chap6_basic_association_analysis PART1 ver2


Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6

Contents: Association Rule Mining

Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication here means co-occurrence, not causality!

Definition: Frequent Itemset

Itemset
- A collection of one or more items. Example: {Milk, Bread, Diaper}
- k-itemset: an itemset that contains k items

Support count (σ)
- The number of transactions that contain a given itemset
- Example: σ({Eggs}) = 1, σ({Milk, Bread, Diaper}) = 2

Support (s)
- The fraction of transactions in which an itemset appears
- Example: s({Eggs}) = 1/5 = 0.2, s({Milk, Bread, Diaper}) = 2/5 = 0.4

Frequent itemset
- An itemset whose support is no less than a given threshold minsup
- With minsup = 0.3 on the five market-basket transactions above, {Eggs} is not frequent, while {Milk, Bread, Diaper} is frequent
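The support definitions above can be checked directly on the five example transactions. A minimal Python sketch (the helper names `support_count` and `support` are illustrative, not from the slides):

```python
# Support count and support over the slide's five market-basket transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset):
    """s(X): fraction of transactions containing X."""
    return support_count(itemset) / len(transactions)

print(support_count({"Eggs"}))               # 1
print(support({"Milk", "Bread", "Diaper"}))  # 0.4
```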

Definition: Association Rule

An association rule is an implication expression of the form X → Y, where X and Y are itemsets.
Example: {Milk, Diaper} → {Beer}

Let T = {t1, t2, ..., tN} be the set of all transactions, and define the support count of an itemset X as
σ(X) = |{ti | X ⊆ ti, ti ∈ T}|

Rule evaluation metrics:
- Support: s(X → Y) = σ(X ∪ Y) / N
  The fraction of transactions that contain both X and Y. It indicates how important and useful the rule is (a rule with low support may be observed purely by chance).
- Confidence: c(X → Y) = σ(X ∪ Y) / σ(X)
  The fraction of transactions containing X that also contain Y. It indicates how reliable the rule is, i.e. how tightly the consequent is tied to the antecedent.

Example: {Milk, Diaper} → {Beer} on the transactions above:
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
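Both rule metrics can be sketched the same way; a minimal Python example for the slide's rule {Milk, Diaper} → {Beer} (function names are illustrative):

```python
# Support and confidence of a rule X -> Y over the slide's five transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_support(X, Y):
    """s(X -> Y) = sigma(X | Y) / N"""
    return sigma(X | Y) / len(transactions)

def rule_confidence(X, Y):
    """c(X -> Y) = sigma(X | Y) / sigma(X)"""
    return sigma(X | Y) / sigma(X)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(rule_support(X, Y))               # 0.4
print(round(rule_confidence(X, Y), 2))  # 0.67
```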

Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup (minimum support threshold)
- confidence ≥ minconf (minimum confidence threshold)

Association rule mining = finding every rule that satisfies both conditions.

Brute-force approach:
- List all possible association rules
- Compute the support and confidence of each rule
- Prune the rules that fail the minsup or minconf threshold
⇒ Computationally prohibitive!

Mining Association Rules

Example rules from the market-basket transactions above:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:
- All of the above rules are binary partitions of the same itemset {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- For this reason, the support and confidence requirements can be handled in separate mining steps

Contents: Frequent Itemset Generation & Apriori Algorithm

Mining Association Rules

Association rules must be extracted from the transaction DB. Two-step approach:

1. Frequent itemset generation
   Find the itemsets that occur frequently in the transaction DB, i.e. all itemsets with support ≥ minsup.
2. Rule generation
   Split each frequent itemset into two itemsets to derive the association rules with confidence ≥ minconf (generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset).

Finding the frequent itemsets in the transaction DB is still computationally expensive.

Frequent Itemset Generation

Lattice-based itemset enumeration: the lattice structure below helps enumerate the list of all possible itemsets, shown here for I = {a, b, c, d, e}. Given d items, there are 2^d possible subsets, i.e. up to 2^d − 1 non-empty candidate itemsets.

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Frequent Itemset Generation

Brute-force approach:
- Every itemset in the lattice is a candidate frequent itemset
- Scan the transaction database and compute the support count of each candidate, matching every transaction against every candidate

Complexity ~ O(NMw) ⇒ expensive, since M = 2^d!
(M: number of candidate itemsets, N: number of transactions, w: maximum transaction width)
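The brute-force scheme can be sketched in a few lines of Python; over the 6 distinct items of the example it enumerates M = 2^6 − 1 = 63 non-empty candidates and scans all N transactions for each (a sketch of the cost, not something to use in practice):

```python
from itertools import combinations

# Brute force: enumerate every non-empty itemset over the d items and
# scan all N transactions for each candidate -> O(N * M * w) work.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))  # d = 6 distinct items

candidates = [frozenset(c)
              for k in range(1, len(items) + 1)
              for c in combinations(items, k)]
print(len(candidates))  # 2^6 - 1 = 63

supports = {c: sum(1 for t in transactions if c <= t) for c in candidates}
print(supports[frozenset({"Bread", "Milk"})])  # 3
```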

Computational Complexity

Given d items:
- Number of possible itemsets: 2^d
- Number of possible association rules:

  R = Σ_{k=1}^{d−1} [ C(d, k) × Σ_{j=1}^{d−k} C(d−k, j) ] = 3^d − 2^(d+1) + 1

If d = 6, R = 602 rules.
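The closed form can be checked numerically against the double sum it came from (a sketch; `num_rules` and `num_rules_sum` are illustrative names):

```python
from math import comb

# Closed form from the slide: R = 3^d - 2^(d+1) + 1
def num_rules(d):
    return 3 ** d - 2 ** (d + 1) + 1

# Direct double sum over antecedent size k and consequent size j,
# matching the derivation term by term.
def num_rules_sum(d):
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

print(num_rules(6))      # 602
print(num_rules_sum(6))  # 602
```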

Frequent Itemset Generation Strategies

Reduce the number of candidates (M)
- Complete search: M = 2^d
- Use pruning techniques to reduce M

Reduce the number of transactions (N)
- Reduce the size of N as the size of the itemsets increases
- E.g. the DHP (Direct Hashing & Pruning) and vertical-based mining algorithms

Reduce the number of comparisons (NM)
- Use efficient data structures to store the candidates or transactions
- No need to match every candidate against every transaction

Reducing the Number of Candidates

Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent.
Example: if {Milk, Bread, Diaper} is a frequent itemset, then its subsets such as {Milk, Bread} and {Bread, Diaper} are also frequent itemsets.

The Apriori principle holds due to the following property of the support measure:
∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of any of its subsets. Equivalently, if a short sub-pattern is known to be infrequent, any super-pattern formed by adding more items to it is infrequent as well. This is because support is anti-monotone: enlarging an itemset can only keep or decrease its support.

Illustrating the Apriori Principle

If the itemset {a, b} is not frequent, then no superset containing it is frequent either, so the entire sub-lattice below AB can be pruned:

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Pruned supersets: every itemset containing {a, b} is removed.

Illustrating the Apriori Principle

Minimum support count = 3

Items (1-itemsets):
Bread 4, Coke 2, Milk 4, Beer 3, Diaper 4, Eggs 1

(1) Coke and Eggs are dropped because their counts fall below the minimum support count of 3.

(2) Four frequent 1-items remain, so C(4,2) = 6 candidate pairs (2-itemsets) are formed:
{Bread, Milk} 3, {Bread, Beer} 2, {Bread, Diaper} 3, {Milk, Beer} 2, {Milk, Diaper} 3, {Beer, Diaper} 3

(3) The two pairs below the threshold are dropped, and the triplet (3-itemset) candidate is formed:
{Bread, Milk, Diaper} 3

If every subset were considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates (brute force).
With support-based pruning this shrinks to 6 + 6 + 1 = 13 (Apriori).

Illustrating the Apriori Principle

Database (sup_min = 2):
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan — C1 with counts: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1 (frequent 1-itemsets): {A}:2, {B}:3, {C}:3, {E}:3

C2 (generated from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan — counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2 (frequent 2-itemsets): {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3 (candidate 3-itemsets): {B,C,E}
3rd scan — count: {B,C,E}:2
L3 (frequent 3-itemset): {B,C,E}:2

Why is {A,C,E} not a candidate? Because {A,E} is not in L2.
{B,C,E} is a candidate because {B,C}, {C,E}, and {B,E} are all in L2.

The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
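The pseudocode translates almost line for line into Python. A minimal sketch, using an absolute min_support count as in the worked example; note the join here merges any two frequent k-itemsets whose union has k+1 items, rather than the prefix-based self-join:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets with support count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_support}
    frequent = set(Lk)
    k = 1
    while Lk:
        # Generate C_{k+1}: join L_k with itself, then prune candidates
        # that have an infrequent k-subset (Apriori principle).
        candidates = set()
        for a in Lk:
            for b in Lk:
                u = a | b
                if len(u) == k + 1 and all(
                        frozenset(s) in Lk for s in combinations(u, k)):
                    candidates.add(u)
        # Count each surviving candidate against every transaction.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        Lk = {c for c, n in counts.items() if n >= min_support}
        frequent |= Lk
        k += 1
    return frequent

# The 4-transaction database from the worked example (sup_min = 2):
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(db, 2)
print(frozenset({"B", "C", "E"}) in result)  # True
```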

Implementation of Apriori

How to generate candidates?
- Step 1: self-joining Lk
- Step 2: pruning

Example of candidate generation:
L3 = {abc, abd, acd, ace, bcd}
Step 1 (self-joining L3 * L3): abcd from abc and abd; acde from acd and ace
Step 2 (pruning): acde is removed because ade is not in L3
C4 = {abcd}
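The join-and-prune example can be checked mechanically. A small Python sketch; it joins any two 3-itemsets whose union has four items, so pruning also has to discard abce, which the slide's prefix-based self-join never generates:

```python
from itertools import combinations

# Candidate generation for C4 from the slide's L3.
L3 = {frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")}

# Step 1: self-join -- merge pairs of 3-itemsets sharing two items.
joined = {a | b for a in L3 for b in L3 if len(a | b) == 4}

# Step 2: prune -- keep a candidate only if all its 3-subsets are in L3.
C4 = {c for c in joined
      if all(frozenset(s) in L3 for s in combinations(c, 3))}

print(sorted("".join(sorted(c)) for c in C4))  # ['abcd']
```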

Reducing the Number of Comparisons

Candidate counting: before an itemset can be declared frequent, the support of each candidate itemset must be computed. Computing the supports requires comparing every candidate against the whole transaction DB → high computational cost → store the candidates in a hash structure for efficient comparison.
- Instead of matching each transaction against every candidate, match it only against the candidates contained in the hashed buckets
- That is, a hash tree structure is used to reduce the number of candidates in C that must be checked for a data sequence

(The N transactions are matched against the k buckets of the hash structure.)

How to Count Supports of Candidates?

Why is counting the supports of candidates a problem?
- The total number of candidates can be very large
- One transaction may contain many candidates

Method: store the candidate itemsets in a hash tree
- A leaf node of the hash tree contains a list of itemsets and counts
- An interior node contains a hash table
- Subset function (hash function): finds all the candidates contained in a transaction

Support Counting

A systematic way to enumerate all 3-itemsets contained in a transaction t = {1, 2, 3, 5, 6}: enumerate the 3-itemsets level by level (level 1 fixes the first item, level 2 the first two items), then compare each against the candidate list; whenever an enumerated itemset matches a candidate, that candidate's support count is incremented.

Level 1 (first item): 1 | 2 | 3
Level 2 (first two items): 12, 13, 15 | 23, 25 | 35
Level 3 (subsets of 3 items): 123, 125, 126, 135, 136, 156, 235, 236, 256, 356
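The level-wise enumeration above produces exactly the C(5,3) = 10 three-item subsets of the transaction, the same list `itertools.combinations` yields:

```python
from itertools import combinations

# All 3-itemsets contained in transaction t = {1, 2, 3, 5, 6}.
t = [1, 2, 3, 5, 6]
subsets = list(combinations(t, 3))

print(len(subsets))  # 10 = C(5, 3)
print(subsets)       # (1, 2, 3), (1, 2, 5), ..., (3, 5, 6)
```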

Support Counting Using a Hash Tree

Given the candidate itemsets, they are partitioned across different buckets and stored in a hash tree. During support counting, the itemsets contained in each transaction are hashed into their appropriate buckets as well; i.e., each itemset enumerated from the transaction is routed by the hash function to a leaf node of the hash tree. As a result, instead of comparing each itemset of the transaction against every candidate itemset, it is compared only against the candidate itemsets in the same bucket (for example, the itemset {2, 3, 4} is compared only against the left leaf node/bucket of the hash tree).

Support Counting Using a Hash Tree

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
- A hash function (here: items 1, 4, 7 → left; 2, 5, 8 → middle; 3, 6, 9 → right)
- A max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node)

The 15 candidate itemsets are distributed over the leaf nodes accordingly.

Support Counting Using a Hash Tree

Every candidate itemset whose first item is 1, 4, or 7 must be assigned to branch A:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6} — 7 itemsets in total.

If each leaf node may hold at most 3 itemsets, the hash tree grows in depth: a subtree B is added under branch A. Within subtree B, branching is on the second item (2, 5, or 8), and the itemsets are then distributed to the left, middle, or right node according to their third item.
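The root-level split can be sketched with the slides' hash function, h(item) = (item − 1) mod 3, which sends 1, 4, 7 to branch 0, 2, 5, 8 to branch 1, and 3, 6, 9 to branch 2 (a sketch of the bucketing only, not of the full tree):

```python
# Root-level bucketing of the 15 candidate 3-itemsets by their first item.
def branch(item):
    """Hash function of the slides: 1,4,7 -> 0; 2,5,8 -> 1; 3,6,9 -> 2."""
    return (item - 1) % 3

candidates = [
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
    (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
    (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8),
]

buckets = {0: [], 1: [], 2: []}
for c in candidates:
    buckets[branch(c[0])].append(c)

print(len(buckets[0]))  # 7 candidates start with 1, 4 or 7 (branch A)
print(len(buckets[1]))  # 2 start with 2, 5 or 8
print(len(buckets[2]))  # 6 start with 3, 6 or 9
```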

Hash Tree Construction

How a given candidate itemset is assigned to a position in the hash tree:
- At level 1, the branch is determined by hashing the first item (1, 4, 7 → left; 2, 5, 8 → middle; 3, 6, 9 → right)
- At level 2, the leaf node is determined by hashing the second item, and so on down the tree

Candidate hash tree over the 15 itemsets (the three original slides show the same tree, hashing on 2, 5 or 8; on 3, 6 or 9; and on 1, 4 or 7):
{1 4 5} {1 3 6} {1 2 4} {1 2 5} {1 5 9} {4 5 7} {4 5 8} {2 3 4} {5 6 7} {3 4 5} {3 5 6} {3 6 7} {3 5 7} {3 6 8} {6 8 9}

Support Counting Using a Hash Tree

Transaction {1 2 3 5 6} is processed as follows. It is first split on its leading items:
1 + {2 3 5 6}, 2 + {3 5 6}, 3 + {5 6}
The leading items 1, 2, and 3 mean visiting the left, middle, and right branches of the tree, respectively (hash function).

Support Counting Using a Hash Tree

At the second level, the prefixes 1 2 +, 1 3 +, and 1 5 + hash their second items 2, 3, and 5 to the middle, right, and middle branches, respectively; likewise, from 3 + {5 6}, item 5 hashes to the middle branch.
At the third level, from 1 2 + {3 5 6}, the items 3, 5, and 6 hash to the right, middle, and right branches; from 1 5 + 6, item 6 hashes to the right; and from 1 + {2 3 5 6}, the prefix 1 3 continues to the right branch via item 3.

Support Counting Using a Hash Tree

Matching the 15 candidate itemsets of length 3 against transaction {1 2 3 5 6}:
- 5 of the 9 leaf nodes are visited
- 9 of the 15 candidate itemsets are compared against the transaction
- Only the candidates actually contained in the transaction, {1 2 5}, {1 3 6}, and {3 5 6}, have their support counts incremented