Chapter 6

6.1 [...]

1. Converge Webpage Ranking Algorithms [...] PageRank [...]
2. Fixed dataset [...] 6.1 [...]
3. Personal website identification [...]

[...] 2 [...]

1. [...] 10 [...] 1~5 [...]
2. [...] 4 [...] 4 [...]
[...] Recall [...] property [...] Precision [...]

Table 6.1 [...]

Strategy                              Ranking algorithm
Basic Strategy                        PageRank (PR)
                                      TimedPageRank (TPR)
                                      HITS
Asynchronous Ranking (Asyn), SA       PageRank (PR)
                                      TimedPageRank (TPR)
                                      HITS
Asynchronous Ranking (Asyn), AS       PageRank (PR)
                                      TimedPageRank (TPR)
                                      HITS
Synchronous Ranking (Syn)             PageRank (PR)
                                      TimedPageRank (TPR)
                                      HITS

6.2 [...]

CPU: Intel Pentium 4 2.4G
[...]: 2G
[...]: 100MB

[...] Data mining, Association rules, Database, Pattern recognition [...] 6.2.4 [...] SIGMOD Record [...]
6.1 [...] PageRank [...]
6.2 [...]

6.2.1 Converge

PageRank [9] [...] 6.1 [...] 322000000 [...] 100 [...] [9] [...] 2900 [...] 0.00001 [...] 6.2 [...] 16 [...] 0.00001 [...] 20 [...] 6.1 [...] 2 [...]
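As an illustration of the convergence criterion discussed in 6.2.1, the following is a minimal sketch of PageRank by power iteration that stops once the total change between successive iterations falls below 0.00001, the threshold quoted above. The damping factor 0.85, the L1 change measure, the cap of 100 iterations, and the four-node citation graph are assumptions made for this example, not details taken from the experiments.

```python
def pagerank(links, damping=0.85, tol=1e-5, max_iter=100):
    """Power iteration; stops when the L1 change between iterations is below tol."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Convergence test: total absolute change across all nodes.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical citation graph: each key cites the papers in its list.
citations = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, iterations = pagerank(citations)
print(iterations, ranks)
```

The function returns both the ranks and the number of iterations used, so the iteration count needed to reach a given threshold can be read off directly for any input graph.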
6.2.2 Fixed Dataset

[...] Data mining, Association rules, Database, Pattern recognition [...] ranking [...] 10 [...]

Table 6.2 [...]

                Data mining        Association rules  Database           Pattern recognition
Method          score  precision   score  precision   score  precision   score  precision
PR              31     0.5         35     0.6         36     0.6         37     0.7
TPR             31     0.4         43     0.9         39     0.8         37     0.7
HITS            17     0           32     0.5         31     0.4         37     0.7
Asyn(SA)+PR     43     0.8         35     0.4         32     0.4         39     0.9
Asyn(SA)+TPR    44     0.8         33     0.4         34     0.5         37     0.7
Asyn(SA)+HITS   14     0           32     0.5         30     0.4         37     0.7
Asyn(AS)+PR     30     0.5         27     0.2         29     0.3         38     0.8
Asyn(AS)+TPR    28     0.4         30     0.3         36     0.4         35     0.5
Asyn(AS)+HITS   17     0           32     0.5         30     0.4         34     0.7
Syn+PR          35     0.6         40     0.6         30     0.6         39     0.9
Syn+TPR         14     0.1         29     0.3         35     0.6         36     0.6
Syn+HITS        17     0           32     0.5         32     0.3         36     0.6

Table 6.3 [...]

                Data mining        Association rules  Database           Pattern recognition
Method          score  precision   score  precision   score  precision   score  precision
PR              36     0.6         39     0.6         38     0.6         44     0.8
TPR             42     0.8         40     0.6         42     0.7         40     0.7
HITS            29     0.3         36     0.6         26     0.2         37     0.5
Asyn(SA)+PR     36     0.6         40     0.6         36     0.5         35     0.4
Asyn(SA)+TPR    36     0.6         40     0.6         38     0.5         34     0.3
Asyn(SA)+HITS   35     0.6         44     0.8         32     0.3         30     0.2
Asyn(AS)+PR     36     0.6         40     0.6         34     0.5         35     0.4
Asyn(AS)+TPR    36     0.6         40     0.6         34     0.5         34     0.3
Asyn(AS)+HITS   36     0.7         41     0.7         31     0.3         36     0.5
Syn+PR          38     0.6         42     0.7         36     0.4         32     0.1
Syn+TPR         37     0.5         37     0.6         35     0.4         41     0.8
Syn+HITS        28     0.3         36     0.7         33     0.4         38     0.6
[...] Tables 6.2, 6.3, 6.4, and 6.5 [...]

Table 6.4 [...]

                Data mining        Association rules  Database           Pattern recognition
Method          score  precision   score  precision   score  precision   score  precision
PR              42     0.8         41     0.7         34     0.4         42     0.8
TPR             38     0.7         41     0.7         31     0.3         37     0.5
HITS            34     0.6         34     0.5         31     0.4         38     0.7
Asyn(SA)+PR     37     0.6         41     0.7         34     0.4         42     0.8
Asyn(SA)+TPR    37     0.7         41     0.7         35     0.3         38     0.6
Asyn(SA)+HITS   34     0.6         34     0.5         32     0.4         40     0.7
Asyn(AS)+PR     41     0.8         40     0.6         39     0.5         31     0.1
Asyn(AS)+TPR    42     0.8         40     0.6         39     0.5         35     0.5
Asyn(AS)+HITS   39     0.7         38     0.6         39     0.6         34     0.4
Syn+PR          43     0.9         46     0.9         37     0.5         35     0.4
Syn+TPR         31     0.5         43     0.8         36     0.6         40     0.8
Syn+HITS        33     0.6         33     0.5         37     0.5         37     0.6

Table 6.5 [...]

                Data mining        Association rules  Database           Pattern recognition
Method          score  precision   score  precision   score  precision   score  precision
PR              39     0.7         41     0.6         35     0.4         40     0.7
TPR             39     0.7         41     0.6         36     0.4         40     0.7
HITS            35     0.5         37     0.5         34     0.4         39     0.7
Asyn(SA)+PR     38     0.6         41     0.6         35     0.4         40     0.7
Asyn(SA)+TPR    39     0.7         41     0.6         36     0.4         40     0.7
Asyn(SA)+HITS   35     0.5         37     0.5         35     0.4         39     0.7
Asyn(AS)+PR     39     0.7         44     0.8         36     0.4         38     0.7
Asyn(AS)+TPR    40     0.8         44     0.8         34     0.3         36     0.5
Asyn(AS)+HITS   39     0.8         42     0.7         34     0.4         34     0.5
Syn+PR          37     0.6         44     0.8         35     0.4         38     0.6
Syn+TPR         35     0.5         39     0.6         36     0.4         37     0.5
Syn+HITS        37     0.7         37     0.5         34     0.3         38     0.6
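One possible reading of the score and precision columns in the tables above can be sketched as follows. It rests on assumptions that the surviving text only partly supports: each of the top 10 results receives a rating from 1 to 5, score is the sum of those ratings, and precision is the fraction of results rated 4 or higher. The function name, the `relevant_at` parameter, and the ratings themselves are hypothetical.

```python
def score_and_precision(ratings, relevant_at=4):
    """Return (sum of ratings, fraction of ratings at or above relevant_at)."""
    score = sum(ratings)
    relevant = sum(1 for r in ratings if r >= relevant_at)
    return score, relevant / len(ratings)


# Hypothetical 1~5 ratings for the top 10 results of one method on one topic.
ratings = [5, 4, 4, 5, 4, 3, 4, 5, 4, 5]
score, precision = score_and_precision(ratings)
print(score, precision)
```

Under this reading a score lies between 10 and 50 and a precision is a multiple of 0.1, which matches the ranges of the values reported in the tables.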
[...] Database [...] Database [...]

1. Data mining: Synchronous Ranking + PageRank
2. Association rules: Synchronous Ranking + PageRank

Table 6.6 [...]

No.  Name                  URL
1    Bing Liu              http://www.cs.uic.edu/~liub/
2    Christos Faloutsos    http://www-2.cs.cmu.edu/~christos/
3    Divesh Srivastava     http://www.research.att.com/~divesh/
4    Hector Garcia-Molina  http://www-db.stanford.edu/people/hector.html
5    Heikki Mannila        http://www.cs.helsinki.fi/~mannila/
6    Jeffrey F. Naughton   http://www.cs.wisc.edu/~naughton/naughton.html
7    Jian Pei              http://www.cse.buffalo.edu/faculty/jianpei/
8    Jiawei Han            http://www.cs.sfu.ca/~han/
9    Johannes Gehrke       http://www.cs.cornell.edu/johannes/
10   Laks V.S. Lakshmanan  http://www.cs.ubc.ca/~laks/
11   Michael J. Carey      http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Carey:Michael J=.html
12   Michael Stonebraker   http://epoch.cs.berkeley.edu:8000/nasa_e2e/mike.html
13   Mohammed Javeed Zaki  http://www.cs.rpi.edu/~zaki/
14   Padhraic Smyth        http://www.ics.uci.edu/~smyth/
15   Philip S. Yu          http://www.research.ibm.com/people/p/psyu/
16   Rakesh Agrawal        http://www.almaden.ibm.com/cs/people/ragrawal/
17   Ramakrishnan Srikant  http://www.almaden.ibm.com/cs/people/srikant/
18   Surajit Chaudhuri     http://research.microsoft.com/~surajitc/
19   Usama M. Fayyad       http://www-aig.jpl.nasa.gov/mls/home/fayyad/
20   Wynne Hsu             http://www.comp.nus.edu.sg/~whsu/
3. Database: Basic Strategy + TimedPageRank
4. Pattern recognition: Basic Strategy + PageRank

6.2.3 Personal Website Identification

[...] 20 [...] Table 6.6 [...] 19 [...] 0.95 [...] 11 [...] DBLP [...]

Table 6.7 Data mining [...]

No.  Paper
1    Fast Algorithms for Mining Association Rules
2    Fast Algorithms for Mining Association Rules in Large Databases
3    High-Dimensional Similarity Joins
4    Information Sharing Across Private Databases
5    Mining Association Rules with Item Constraints
6    Mining Generalized Association Rules
7    Mining Quantitative Association Rules in Large Relational Tables
8    Mining Sequential Patterns
9    Mining Sequential Patterns: Generalizations and Performance Improvements
10   Privacy-Preserving Data Mining

Table 6.8 Data mining [...]

No.  Author
1    Carlo Zaniolo
2    Hector Garcia-Molina
3    Heikki Mannila
4    Jiawei Han
5    Martin C. Rinard
6    Mohammed Javeed Zaki
7    Raghu Ramakrishnan
8    Rakesh Agrawal
9    Ramakrishnan Srikant
10   Serge Abiteboul
6.2.4 Summary of Experiments

[...] Data mining [...] Basic Strategy [...] PageRank [...] Table 6.7 [...] Table 6.8 [...] Table 6.9 [...] Table 6.10 [...]

Table 6.9 Data mining [...]

No.  Conference
1    Very Large Data Bases
2    SIGMOD Conference
3    Knowledge Discovery and Data Mining
4    International Conference on Data Engineering
5    Extending Database Technology
6    Conference on Parallel and Distributed Information Systems
7    International Conference on Information and Knowledge Management
8    IEEE International Conference on Data Mining
9    Research Issues on Data Mining and Knowledge Discovery
10   Symposium on Principles of Database Systems

Table 6.10 Data mining [...]

No.  Journal
1    IEEE Transactions on Knowledge and Data Engineering
2    Data Mining and Knowledge Discovery
3    Information Systems
4    Machine Learning
5    Journal of Parallel and Distributed Computing
6    Communications of the ACM
7    VLDB Journal
8    ACM Transactions on Database Systems
9    Data and Knowledge Engineering
10   SIGMOD Record
(Topic Shift) [...] Data mining [...] 4 [...] 6 [...]

1. [...]
2. Citation [...] [46] [...]
3. [...] Data mining [...] Database [...] Data mining [...] Database [...] SIGMOD Record [...] 1998 [...] 42 [...] 39 [...] 1990 [...] Table 6.11 [...] SIGMOD Record [...] Database [...] Database [...] SIGMOD Record [...] 10 [...] Table 6.11 [...] 4 [...] 11 [...]
[...]

1. SIGMOD Record [...] 1990 [...] 15 [...]
2. SIGMOD Record [...] Database [...] Database [...]

Table 6.11 SIGMOD Record [...]

No.  Year  Paper
1    01    Regular Tree and Regular Hedge Languages over Unranked Alphabets (F. Neven)
2    99    String B-tree: A New Data Structure for String Search in External Memory and its Applications (N. Koudas)
3    98    First-order Queries on Finite Structures over the Reals (L. Libkin)
4    98    Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications (W. Wang)
5    97    Online Aggregation (F. Korn)
6    97    Transactional Client-Server Cache Consistency: Alternatives and Performance (K. Vorugan)
7    97    Outerjoin Simplification and Reordering for Query Optimization (J. Rao)
8    96    Combining Fuzzy Information from Multiple Systems (L. Gravano)
9    96    The space complexity of approximating the frequency moments (M. Garofalakis)
10   95    On the power of languages for the manipulation of complex values (T. Milo)
11   94    Fast algorithms for mining association rules (J. Han)
12   94    Functional Database Query Languages as Typed Lambda Calculi of Fixed Order (D. Suciu)
13   94    Byte-aligned Bitmap Compression (T. Johnson)
14   92    ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging (B. Salzberg)
15   90    Reliable Transaction Management in a Multidatabase System (P. Scheuermann)