Math. Statistics: Statistics?

1 What is Statistics?
Statistics is the discipline of collecting, summarizing, analyzing, and presenting data to produce information. Typical application areas include surveys, quality control (QC, 6-sigma), data mining (CRM), and econometrics.

1.1 The two aims of statistics
- Describe the characteristics of interest in a population (descriptive statistics).
- Make an inference (estimation and hypothesis testing) about a population based on the information contained in a sample, and provide an associated measure of goodness for the inference (inferential statistics).

Definitions of statistics:
- "Statistics is a theory of information, with inference making as its objective."
- "A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data." (Webster's Dictionary)
- "The branch of the scientific method which deals with data obtained by counting or measuring the properties ..." (Kendall and Stuart)
- "The technology of the scientific method": the design of experiments and investigations, and statistical inference. (Mood et al.)
From phenomenon to knowledge:
Phenomenon → Data collecting → Data (numbers with context: numbers that have a story) → Summarizing, analyzing, and modeling → Prediction / forecasting → Graphical and numerical presenting → Information.
Information: a numerical measure or verbal explanation of the uncertainty about the population of interest. When information is used for understanding or doing something (decision making), it is known as knowledge.
1.2 A brief history of statistics
- Ancient censuses: Tullius (Rome, 6th century BC) and Caesar.
- F. Nightingale (1854): the polar area diagram, a landmark of descriptive statistics.
- Etymology: "census" comes from the Latin censura, and "statistics" from the Latin status (state).
- Foundations of inferential statistics: Fermat & Pascal (the arithmetic triangle), Gauss (the Gaussian distribution), Galton, Gosset (the t-distribution), and Fisher (inference about a population from a sample).
1.3 Statistics and lies
"There are three kinds of lies: lies, damned lies, and statistics." (attributed to Benjamin Disraeli)
"Statistics can prove anything."
A famous joke in statistics concerns the exit poll.
BUT the problem usually lies not in statistics itself but in how figures are selected and reported. For example, the same set of ratings (5 5 4 4 4 3 3 3 … 1) can be reported with different summary figures (3.7 vs. 4.0).
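1.4 Statistical terms
Population and sample
- Population: the entire set of individuals or objects of interest.
- Sample: a subset of the population; ideally a miniature of it.
- Parameter: a numerical characteristic of the population.
- Sampling frame: the list of units from which the sample is drawn. A census examines every unit; a sample survey examines only a sample.

Types of data
- Quantitative (measure, metric): values are numerical measurements.
- Qualitative (classified, categorical, non-metric):
  - Ordinal: categories with a natural order, e.g., grades (A, B, C, ...).
  - Nominal: categories with no natural order.

Sampling error
Suppose a poll of 1,500 respondents finds a support rate of 0.45. The true population rate is generally not exactly 0.45; the discrepancy is the sampling error. Under simple random sampling (SRS), and because $\sqrt{p(1-p)} \le 0.5$ for every $p$, the 95% margin of error is at most
$z_{\alpha/2}\,\frac{0.5}{\sqrt{n}} = 1.96 \times \frac{0.5}{\sqrt{1500}} \approx 0.0253 \approx 2.5\%$.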
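The margin-of-error arithmetic can be checked with a short sketch. It assumes SRS and the normal approximation; the function names are illustrative, and the second function (which plugs in the observed $\hat p$ instead of the worst case 0.5) is an addition for comparison.

```python
import math

def max_margin_of_error(n: int, z: float = 1.96) -> float:
    """Worst-case margin of error for a proportion under SRS.

    Uses sqrt(p(1-p)) <= 0.5, so the bound holds for any true p.
    z = 1.96 is the critical value for 95% confidence.
    """
    return z * 0.5 / math.sqrt(n)

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Margin of error using the observed proportion p_hat."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n = 1500
print(f"worst case : {max_margin_of_error(n):.4f}")        # 0.0253
print(f"p_hat=0.45 : {margin_of_error(0.45, n):.4f}")      # 0.0252
```

Because 0.45 is close to 0.5, the observed-proportion version is only slightly smaller than the worst-case bound.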
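1.4.2 Confirmatory and exploratory data analysis
A statistical analysis expresses the question of interest through the parameters of a model.
[Example 1] Is the mean IQ of a group equal to a specified value? The question is stated as a hypothesis about the parameter, $H_0: \mu = \mu_0$, and $\mu$ is examined with a t-test.
[Example 2] A response is modeled as a function of explanatory variables, $Y = f(X_1, X_2)$.
Confirmatory data analysis proceeds by 1) stating a statistical hypothesis, 2) specifying a model, and 3) assessing significance. This is deductive reasoning, in the spirit of Popper.
In 1977 John W. Tukey proposed exploratory data analysis (EDA), which lets the data suggest hypotheses. Its tools include graphical displays (stem-and-leaf plot, box plot, scatter plot) and re-expression (data transformation, e.g., taking logs). This is inductive reasoning.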
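A minimal sketch of the one-sample t statistic behind [Example 1]. The IQ scores and the null value $\mu_0 = 100$ are hypothetical, chosen only for illustration, and a normal population is assumed.

```python
import math
import statistics

def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom.

    t = (xbar - mu0) / (s / sqrt(n)) follows a t distribution with n - 1
    degrees of freedom when H0: mu = mu0 holds and the population is normal.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical IQ scores and hypothetical null value mu0 = 100.
iq = [130, 114, 98, 121, 105, 117, 109, 126, 101, 112]
t, df = one_sample_t(iq, mu0=100)
print(f"t = {t:.3f} with {df} df")
```

The resulting |t| is then compared with a t-table critical value; for 9 degrees of freedom at the two-sided 5% level that value is 2.262.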
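1.5 Basic concepts
1.5.1 Random variable
A random experiment produces outcomes (elements, or objects) $w$. A random variable $X$ assigns a number to each outcome: $X(w) = x$.

1.5.2 Types of variables
- Discrete or continuous; a discrete variable may take finitely or infinitely many values.
- Quantitative (metric, measurable) or qualitative (non-metric, classified, categorical). Qualitative variables are further divided into ordinal and nominal. For example, IQ is quantitative, while letter grades (A, B, C, D, F) are ordinal.
- In a causal relationship, the explanatory (independent) variable affects the response (dependent) variable.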
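To make $X(w) = x$ concrete, here is a small sketch using a hypothetical experiment (two fair coin tosses) that is not from the slides. The random variable is an ordinary function from outcomes to numbers, and its distribution follows by counting equally likely outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of a hypothetical random experiment: two fair coin tosses.
omega = ["".join(w) for w in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

def X(w: str) -> int:
    """Random variable: maps each outcome w to a number x (here, the number of heads)."""
    return w.count("H")

# Outcomes are equally likely, so P(X = x) = #{w : X(w) = x} / |omega|.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Here X is discrete with a finite set of values {0, 1, 2}.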
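The response is denoted $Y$ and the explanatory variable $X$.

1.5.3 Probability distribution function
A probability distribution function maps each value in its domain (the values of the random variable) to a probability in its range. For a continuous variable this is the probability density function $f(x)$; for a discrete variable it is written $p(x)$.

1.5.4 Parameter and statistic
The population is described by a distribution $f(x;\theta)$, whose constant $\theta$ is called the parameter. A sample $(x_1, x_2, \ldots, x_n)$ is drawn with $x_i \overset{iid}{\sim} f(x;\theta)$.
- Population parameters $\theta$: $\mu$, $\sigma^2$, $p$.
- Sample statistics (estimators $\hat\theta$), computed from $(x_1, x_2, \ldots, x_n)$: $\bar x$, $s^2$, $\hat p$, where
$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ estimates the mean $\mu$, and
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2$ estimates the variance $\sigma^2$.
The value a statistic takes for a particular sample is called an estimate.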
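The estimator formulas above translate directly into code. This sketch uses a small hypothetical data set; note the $n - 1$ divisor in $s^2$, which matches Python's `statistics.variance`.

```python
import statistics

def estimates(sample: list[float]) -> tuple[float, float]:
    """Point estimates (xbar, s^2) of mu and sigma^2 from an iid sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # divisor n - 1, not n
    return xbar, s2

def proportion(responses: list[bool]) -> float:
    """Point estimate p_hat of a population proportion p (share of True)."""
    return sum(responses) / len(responses)

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample
xbar, s2 = estimates(data)
print(f"xbar = {xbar}, s^2 = {s2}, stdlib variance = {statistics.variance(data)}")
print(f"p_hat = {proportion([True, False, True, True])}")
```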
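The rule (formula) that produces an estimate is called an estimator.
Example: For a population of interest, the parameter may be a proportion, $\theta = p$, or a question such as whether $\mathrm{Corr}(X, Y) = 0$. Examining every unit (a census) is usually too costly, so a sample, a subset that serves as a miniature of the population, is examined instead. Suppose 200 respondents are surveyed and the share answering "yes" is computed. This statistic is an estimate of the population proportion $p$.
Point estimate: $\hat\theta = \hat p = \frac{\#\text{ of "yes"}}{n}$.
Interval estimate: $(\theta_L, \theta_U)$.
To build the interval we need the distribution of $\hat p$ over repeated samples, its sampling distribution. For large $n$, $\hat p \overset{app}{\sim} N\!\left(p, \frac{pq}{n}\right)$ with $q = 1 - p$. Replacing $p, q$ by $\hat p, \hat q$ gives the approximate $100(1-\alpha)\%$ confidence interval
$\left(\hat p - z_{1-\alpha/2}\sqrt{\frac{\hat p \hat q}{n}},\ \hat p + z_{1-\alpha/2}\sqrt{\frac{\hat p \hat q}{n}}\right)$.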
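The interval formula can be sketched as follows. The count of 30 "yes" answers out of 200 is hypothetical, and $z_{1-\alpha/2} = 1.96$ is assumed for a 95% interval; this is the large-sample (Wald) interval described above.

```python
import math

def wald_interval(yes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 100(1 - alpha)% CI for p, based on p_hat ~ N(p, p q / n).

    z plays the role of z_{1 - alpha/2}; 1.96 gives a 95% interval.
    """
    p_hat = yes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical survey: 30 "yes" answers out of n = 200.
lo, hi = wald_interval(30, 200)
print(f"p_hat = {30 / 200:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```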
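What does "95%" mean? It does not mean that the fixed value $p$ lies in one computed interval with probability 0.95. Rather, if we drew 100 samples and built an interval from each, about 95 of them would contain $p$. Why 95% rather than 99%? A higher confidence level gives a wider interval; 95% (accepting a 5% risk) is the conventional compromise.

Hypothesis testing: to test $H_0: p = p_0$, use
$Z = \frac{\hat p - p_0}{\sqrt{p_0 q_0 / n}} \overset{app}{\sim} N(0,1)$ under $H_0$.
Rejecting $H_0$ when it is true is a type I error; its probability $\alpha$ is the significance level, usually 5%. The p-value is the probability, under $H_0$, of a test statistic at least as extreme as the observed one; $H_0$ is rejected when the p-value is below $\alpha$. A two-sided test counts extreme values in both tails, and a one-sided p-value is 1/2 of the two-sided one when the statistic falls in the hypothesized direction. Tests based on the t, F, and $\chi^2$ distributions follow the same logic.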
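A sketch of the large-sample test of $H_0: p = p_0$. The data (120 "yes" out of 200) and $p_0 = 0.5$ are hypothetical. The standard normal CDF is written with `math.erfc`, so no third-party package is assumed.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def proportion_z_test(yes: int, n: int, p0: float) -> tuple[float, float]:
    """Large-sample test of H0: p = p0.

    Z = (p_hat - p0) / sqrt(p0 * q0 / n) is approximately N(0, 1) under H0.
    Returns (Z, two-sided p-value).
    """
    p_hat = yes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Two-sided p-value: P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 120 "yes" out of n = 200, testing H0: p = 0.5 at alpha = 0.05.
alpha = 0.05
z, p_value = proportion_z_test(120, 200, 0.5)
one_sided = 1 - normal_cdf(z)  # H1: p > p0; half the two-sided value here
print(f"Z = {z:.3f}, two-sided p = {p_value:.4f}, one-sided p = {one_sided:.4f}")
print(f"reject H0 at alpha = {alpha}: {p_value < alpha}")
```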
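1.5.5 Variables (random variables)
Variables are denoted by capital letters $X, Y, Z$.
EXAMPLE: IQ and gender are recorded for each subject in a group.
- $X$: IQ, with observed values such as 130, 114, ..., written $x_i$ in the sample $(x_1, x_2, \ldots, x_n)$.
- $Y$: gender, coded $Y = 1$ for one category and $Y = 0$ for the other, with observed values $(y_1, y_2, \ldots, y_n)$.

1.5.6 Distributions
A continuous distribution is described by its probability density function $f(x)$: the domain is the set of values $x$ (the horizontal axis), and $f(x)$ takes values in the range (the vertical axis). The cumulative distribution function is $F(x) = P(X \le x)$. Key distributions include the normal distribution (justified by the Central Limit Theorem) and the t-distribution.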
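A small sketch contrasting $f(x)$ and $F(x)$, using Python's `statistics.NormalDist`. The model IQ ~ N(100, 15²) is a hypothetical assumption for illustration only. The last step checks numerically that $F$ is the integral of $f$.

```python
from statistics import NormalDist

# Hypothetical model for illustration only: X ~ N(mu=100, sigma=15).
X = NormalDist(mu=100, sigma=15)

x = 130
density = X.pdf(x)   # f(x): height of the density curve, not a probability
prob_le = X.cdf(x)   # F(x) = P(X <= x)
print(f"f({x}) = {density:.5f}, F({x}) = {prob_le:.4f}, P(X > {x}) = {1 - prob_le:.4f}")

# F is the integral of f: a midpoint Riemann sum of f over [a, b]
# should closely match F(b) - F(a).
a, b, steps = 10.0, 130.0, 12_000
h = (b - a) / steps
area = h * sum(X.pdf(a + (k + 0.5) * h) for k in range(steps))
print(f"integral of f over [{a}, {b}] = {area:.6f}, F(b) - F(a) = {X.cdf(b) - X.cdf(a):.6f}")
```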
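1.5.7 Random sample and sampling distribution
A sample drawn at random from the population is a random sample: $x_1, x_2, \ldots, x_n \overset{iid}{\sim} f(x)$, where iid means independently and identically distributed.
- Independently: the value of one observation does not affect the others.
- Identically: every $x_i$ has the same distribution $f(x)$.
The distribution of a statistic over repeated samples is its sampling distribution. For a sample $(x_1, x_2, \ldots, x_n)$ from a normal population $f(x;\theta)$ with mean $\mu$,
$\frac{\bar x - \mu}{s/\sqrt{n}} \sim t_{n-1}$.
The t-, $\chi^2$-, and F-distributions are the standard sampling distributions used for inference about means, variances, and ratios of variances.
By contrast, the distribution of the observed values $(x_1, x_2, \ldots, x_n)$ themselves is the sample distribution; graphical displays of it reveal its shape and any outliers.
CLT: for large $n$, $\bar x \overset{app}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)$.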
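The CLT statement can be checked by simulation. This sketch draws repeated samples from a hypothetical, strongly skewed population, Exponential(rate 1) with $\mu = 1$ and $\sigma^2 = 1$, and compares the mean and variance of $\bar x$ with the CLT values $\mu$ and $\sigma^2/n$. The sample size, repetition count, and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Means of `reps` iid samples of size n from Exponential(rate=1).

    The population (mu = 1, sigma^2 = 1) is skewed, yet by the CLT
    xbar is approximately Normal(mu, sigma^2 / n) for large n.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 30
means = sample_means(n, reps=5000)
print(f"mean of xbar     = {statistics.fmean(means):.3f}  (theory: 1)")
print(f"variance of xbar = {statistics.variance(means):.4f}  (theory: 1/{n} = {1 / n:.4f})")
```

A histogram of `means` would look close to a bell curve even though individual exponential draws are far from normal.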