wcjang@snu.ac.kr
Acknowledgements: Dr. Roger Peng's Coursera course, https://github.com/rdpeng/courses (Creative Commons Attribution)
10 : SNS (twitter, facebook), (functional data) : (, ),, /Data Science ( )
- (Machine Learning), ( ) ( ), ( )
: - University of Washington Center for Statistics and Social Sciences curriculum : Statistical Machine Learning :
6 : - 9 : - 10 : - 11 : - 12 : - 2015 3 Workshop
University of Washington Center for Statistics and the Social Sciences (CSSS)
- Since 1999
- "The Center for Statistics and the Social Sciences (CSSS) promotes collaborative interdisciplinary research on statistical methods for the social sciences, and teaches a rich menu of courses for social science students"
- 5 core faculty members (Faculty Affiliates)
- Statistical consulting for the social sciences
- Math Camp: http://www.csss.washington.edu/mathcamp/
- Lectures
- Seed Grant
- Weekly seminar
CSSS courses
- Structural Equation Models
- Survey Research Methods
- Sample Survey Techniques
- Analysis of Categorical and Count Data
- Event History Analysis
- Hierarchical Modeling for the Social Sciences
- Bayesian Statistics for the Social Sciences
- Causal Modeling
- Statistical Analysis of Social Networks
- Visualizing Data
( )! Bayesian Statistics Graphical Model (Structural Equation Model) Item Response Theory Multilevel/Hierarchical Modeling Social Network Analysis
New trends
- Computing: R
- Data Visualization
- LASSO
- Independent Component Analysis
- Reproducible Research
- Topic models in text mining
- Exponential Random Graph Models in Social Network Analysis
R Commander/R Studio (Excel/ SAS ) 4 ( ) (..) / code code sharing -
R? "Data Analysts Captivated by R's Power" (NYTimes, 1/6/2009)
trend : http://r4stats.com/2012/05/09/beginning-of-the-end/
R? R. - :,,,,
R Screenshot : http://www.r-project.org
R Commander
R Commander is a GUI (Graphical User Interface) for R, developed by John Fox at McMaster University.
To install R Commander, run the following in the R console:
> install.packages("Rcmdr")
> library(Rcmdr)
R commander Screenshot : http://socserv.mcmaster.ca/jfox/misc/rcmdr/rcmdrscreenshot.html
R Commander
- Drop-down menu
- Toolbar
- Script window: type commands
- Output window: output
- Message window: error messages, warnings
RStudio
- An IDE (Integrated Development Environment) for R
- Web version: beta.rstudio.org (gmail account)
- Download: www.rstudio.com
R Studio Screenshot : http://www.rstudio.com/products/rstudio/features/
Growth in a Time of Debt : http://www.businessweek.com/articles/2013-04-18/economists-spreadsheet-error-upends-the-debt-debate
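Excel error that changed history
- In 2010, Reinhart and Rogoff published "Growth in a Time of Debt", reporting that economic growth drops sharply once government debt exceeds 90% of gross domestic product (the "90% threshold").
- Researchers at UMass re-examined the Reinhart and Rogoff spreadsheet and found that a formula had left 5 countries out of an average.
- In the UMass reanalysis, average growth for countries above the threshold came out at 2.2%, not -0.1%.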
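The effect of averaging over the wrong range can be sketched in a few lines of R. This is only an illustration: the ten growth figures below are made up for the example and are not the Reinhart and Rogoff data, and the variable names are hypothetical.

```r
# Made-up growth rates (%) for 10 hypothetical countries.
# These are NOT the Reinhart-Rogoff figures.
growth <- c(3.0, 2.5, 4.0, 1.5, 3.5, -1.0, 0.5, -2.0, 1.0, -0.5)

# Average over all 10 rows, as intended.
mean_all <- mean(growth)

# Spreadsheet-style slip: the averaged range starts at row 6,
# so the first 5 rows are silently dropped.
mean_truncated <- mean(growth[6:10])

cat("all 10 rows:   ", mean_all, "\n")
cat("rows 6-10 only:", mean_truncated, "\n")
```

In a script the range being averaged is visible and can be reviewed by others, which is harder to do with a cell formula.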
Reproducible Research (, ) code
Repeatability of published microarray gene expression analyses Nature genetics 2005 1 2006 12 2 microarray 56 18 4 / 2
Reproduced in principle: 2; reproduced partially or with some discrepancies: 6; could not be reproduced: 10
Science (2011), special issue "Data Replication & Reproducibility"
- "Reproducible Research in Computational Science", R.D. Peng
- "Improving Validation Practices in 'Omics' Research", J.P.A. Ioannidis and M.J. Khoury
(SAS, SPSS) format (analysis1.r, analysis2.r, final.r, Real-final.R) ( ) supplemental materials
The Art of Programing : http://geekandpoke.typepad.com/geekandpoke/2008/02/the-art-ofprog.html
Markdown Git R package Knitr
Statistical (Machine) Learning
Prediction, Prediction, Prediction!!!!
- Penalized Regression: Lasso, Ridge
- Text Mining: Sentiment Analysis, Topic Model
- Supervised Learning: Support Vector Machine, Neural Network, Tree Method, Logistic Regression
- Unsupervised Learning: Principal Component Analysis, Independent Component Analysis, K-means++
- Other Learning: Semi-supervised Learning, Online Learning, Deep Learning, Active Learning
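As a small illustration of penalized regression, the sketch below fits ridge regression in base R using its closed-form solution, (X'X + lambda I)^(-1) X'y. The simulated data, the helper name `ridge`, and the value lambda = 50 are all assumptions made for the example. There is no intercept and no standardization of the predictors, which a real analysis would normally include. The lasso has no closed form in general and is not shown.

```r
set.seed(1)

# Simulated data: 100 observations, 5 predictors (illustrative only).
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -1, 0, 0, 0.5)
y <- as.vector(X %*% beta_true + rnorm(n))

# Hypothetical helper: closed-form ridge estimate without an intercept.
ridge <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

b_ols   <- ridge(X, y, lambda = 0)   # lambda = 0 gives ordinary least squares
b_ridge <- ridge(X, y, lambda = 50)  # a positive lambda shrinks the coefficients

# Compare the two coefficient vectors side by side.
print(cbind(ols = drop(b_ols), ridge = drop(b_ridge)))
```

With lambda = 0 the function returns the least squares fit, and increasing lambda pulls the coefficient vector toward zero.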
Introduction to Statistical Learning : http://online.stanford.edu/course/statistical-learning-winter-2014 Download PDF at http://www-bcf.usc.edu/~gareth/isl/
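Privacy in Big Data!
Netflix (1st) Competition
- Open competition to improve the recommender system
- About 480,000 users, 17,770 movies, about 100 million ratings, split into a training set and a test set
- The 2nd Netflix competition was cancelled over privacy concerns (linking IMDb and Netflix records)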
Privacy
- Privacy-Preserving Data Mining
- Statistical Disclosure Limitation
- Data Masking/Jittering
- Bias, Variance
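One way to read the jittering and bias/variance items together is sketched below: adding independent zero-mean noise to a sensitive variable leaves its mean roughly unchanged but inflates its variance. The income values, the noise level, and the variable names are invented for the illustration. This is plain additive noise, not a complete disclosure-limitation method.

```r
set.seed(42)

# Made-up "sensitive" variable: 10,000 simulated incomes.
income <- rlnorm(10000, meanlog = 10, sdlog = 0.5)

# Jittering: add independent zero-mean Gaussian noise to every record.
noise_sd <- 5000
masked <- income + rnorm(length(income), mean = 0, sd = noise_sd)

# The mean is nearly preserved, but the variance grows by about noise_sd^2.
cat("mean: original", mean(income), " masked", mean(masked), "\n")
cat("var:  original", var(income), " masked", var(masked), "\n")
```

Analyses of the masked data that depend on the variance, such as regression slopes or correlations, are affected by the added noise unless it is accounted for.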
UN Global Pulse
Sending the Police Before There's Crime (NYTimes, 8/15/2011)
Global Pulse http://www.unglobalpulse.org/ UN global development SNS Job loss prediction disease outbreak prediction
- Selection bias -?
? (multiple comparison) A 250 11 5%.? (John Tukey, 1976)
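If the numbers on this slide are read as 250 tests with 11 of them significant at the 5% level (an interpretation, not something the slide states in full), the arithmetic can be checked with a short simulation. The sketch assumes the 250 tests are independent and that every null hypothesis is true, so the p-values are drawn from a uniform distribution.

```r
set.seed(2015)

m     <- 250    # number of tests (assumed reading of the slide)
alpha <- 0.05   # significance level

# Expected number of false positives when every null hypothesis is true.
expected <- m * alpha

# Chance of at least one false positive among m independent tests.
fwer <- 1 - (1 - alpha)^m

# Simulate one such study: under the null, p-values are uniform on (0, 1).
p <- runif(m)
n_sig  <- sum(p < alpha)
n_bonf <- sum(p.adjust(p, method = "bonferroni") < alpha)

cat("expected false positives:", expected, "\n")
cat("P(at least one false positive):", fwer, "\n")
cat("significant without correction:", n_sig, "\n")
cat("significant after Bonferroni:  ", n_bonf, "\n")
```

Under these assumptions about 12.5 of 250 tests are expected to come out significant by chance alone, so 11 significant results would not be surprising.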
? -, (, 2013/3/ 6) - selection bias :
,, (, ), ( ).