Probability and Distributions

Changyi Park (박창이)
Department of Statistics, University of Seoul

Contents

Conditional probability
Bar chart
Histogram
Line chart
Scatter plot
Reference

Conditional Probability I-II

Simulation of the probability that both children are girls given that the older child is a girl (1/2), and of the probability that both are girls given that at least one of the two is a girl (1/3).

>>> from collections import Counter
>>> import math, random
>>> from matplotlib import pyplot as plt
>>> def random_kid():
...     return random.choice(["boy", "girl"])
...
>>> both_girls = 0
>>> older_girl = 0
>>> either_girl = 0
>>> random.seed(0)
>>> for _ in range(10000):
...     younger = random_kid()
...     older = random_kid()
...     if older == "girl":
...         older_girl += 1
...     if older == "girl" and younger == "girl":
...         both_girls += 1
...     if older == "girl" or younger == "girl":
...         either_girl += 1
...
>>> print("P(both | older):", both_girls / older_girl)    # ~ 1/2
P(both | older): 0.5007089325501317
>>> print("P(both | either):", both_girls / either_girl)  # ~ 1/3
P(both | either): 0.3311897106109325

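The simulated ratios can be checked against exact values by enumerating the four equally likely families. This is a minimal sketch, assuming each child is independently a boy or a girl with probability 1/2; the helper names (`conditional_probability`, `older_is_girl`, and so on) are illustrative and do not appear in the slides.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (older, younger) outcomes.
families = list(product(["boy", "girl"], repeat=2))

def conditional_probability(event, given):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioning = [f for f in families if given(f)]
    favorable = [f for f in conditioning if event(f)]
    return Fraction(len(favorable), len(conditioning))

def both_girls(family):
    return family == ("girl", "girl")

def older_is_girl(family):
    return family[0] == "girl"

def either_is_girl(family):
    return "girl" in family

p_both_given_older = conditional_probability(both_girls, older_is_girl)
p_both_given_either = conditional_probability(both_girls, either_is_girl)
print("P(both | older): ", p_both_given_older)    # 1/2
print("P(both | either):", p_both_given_either)   # 1/3
```

Conditioning on "at least one girl" leaves three outcomes rather than two, which is why the second probability drops to 1/3.
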
Probability Distributions I

Uniform distribution

>>> def uniform_pdf(x):
...     return 1 if x >= 0 and x < 1 else 0
...
>>> def uniform_cdf(x):
...     "returns the probability that a uniform random variable is less than x"
...     if x < 0:
...         return 0    # uniform random is never less than 0
...     elif x < 1:
...         return x    # e.g. P(X < 0.4) = 0.4
...     else:
...         return 1    # uniform random is always less than 1

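The link between the two functions is that the distribution function is the integral of the density. The sketch below checks this numerically with a midpoint Riemann sum; `integrate_pdf` and its `lower` and `steps` parameters are assumptions introduced for illustration, not part of the slides.

```python
def uniform_pdf(x):
    return 1 if x >= 0 and x < 1 else 0

def uniform_cdf(x):
    if x < 0:
        return 0
    elif x < 1:
        return x
    else:
        return 1

def integrate_pdf(pdf, upper, lower=-1.0, steps=20000):
    """Midpoint Riemann sum of pdf over [lower, upper] (hypothetical helper)."""
    width = (upper - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

# The numerical integral of the density should track the closed-form CDF.
for x in [-0.5, 0.25, 0.4, 0.9, 1.5]:
    approx = integrate_pdf(uniform_pdf, x)
    print(f"x={x}: integral of pdf={approx:.4f}, uniform_cdf={uniform_cdf(x)}")
```

The lower limit of -1 is arbitrary; any value at or below 0 works, because the density is zero there.
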
Probability Distributions II

Probability density function of the normal distribution

>>> def normal_pdf(x, mu=0, sigma=1):
...     sqrt_two_pi = math.sqrt(2 * math.pi)
...     return (math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma))
...
>>> xs = [x / 10.0 for x in range(-50, 50)]
>>> plt.plot(xs, [normal_pdf(x, sigma=1) for x in xs], '-', label='mu=0,sigma=1')
>>> plt.plot(xs, [normal_pdf(x, sigma=2) for x in xs], '--', label='mu=0,sigma=2')
>>> plt.plot(xs, [normal_pdf(x, sigma=0.5) for x in xs], ':', label='mu=0,sigma=0.5')
>>> plt.plot(xs, [normal_pdf(x, mu=-1) for x in xs], '-.', label='mu=-1,sigma=1')
>>> plt.legend(loc=4)    # bottom right
>>> plt.show()

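Probability Distributions III

[Figure: normal probability density functions for mu=0,sigma=1; mu=0,sigma=2; mu=0,sigma=0.5; and mu=-1,sigma=1, as plotted by the code on the preceding slide]
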
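Two properties of the density can be checked without plotting: the area under each curve is 1, and the peak height is 1 / (sigma * sqrt(2 * pi)), so a smaller sigma gives a taller, narrower curve. The sketch below is one way to confirm this numerically; `total_area` is a hypothetical helper, and `statistics.NormalDist` from the standard library (Python 3.8+) serves only as an independent reference.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma)

def total_area(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint Riemann sum of normal_pdf over mu +/- half_width * sigma."""
    lower = mu - half_width * sigma
    width = 2 * half_width * sigma / steps
    return sum(normal_pdf(lower + (i + 0.5) * width, mu, sigma)
               for i in range(steps)) * width

# The same four parameter settings that the slide plots.
params = [(0, 1), (0, 2), (0, 0.5), (-1, 1)]
for mu, sigma in params:
    peak = normal_pdf(mu, mu, sigma)
    # Largest disagreement with the standard-library density on a small grid.
    max_diff = max(abs(normal_pdf(x / 10, mu, sigma) - NormalDist(mu, sigma).pdf(x / 10))
                   for x in range(-50, 50))
    print(f"mu={mu}, sigma={sigma}: area={total_area(mu, sigma):.6f}, "
          f"peak={peak:.4f}, max diff vs NormalDist={max_diff:.2e}")
```
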
Probability Distributions IV

Cumulative distribution function

>>> def normal_cdf(x, mu=0, sigma=1):
...     return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2
...
>>> xs = [x / 10.0 for x in range(-50, 50)]
>>> plt.plot(xs, [normal_cdf(x, sigma=1) for x in xs], '-', label='mu=0,sigma=1')
>>> plt.plot(xs, [normal_cdf(x, sigma=2) for x in xs], '--', label='mu=0,sigma=2')
>>> plt.plot(xs, [normal_cdf(x, sigma=0.5) for x in xs], ':', label='mu=0,sigma=0.5')
>>> plt.plot(xs, [normal_cdf(x, mu=-1) for x in xs], '-.', label='mu=-1,sigma=1')
>>> plt.legend(loc=4)    # bottom right
>>> plt.show()

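Probability Distributions V

[Figure: normal cumulative distribution functions for mu=0,sigma=1; mu=0,sigma=2; mu=0,sigma=0.5; and mu=-1,sigma=1, as plotted by the code on the preceding slide]
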
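A common use of the distribution function is to compute the probability of an interval as a difference of two CDF values. As a small sketch, the helper `probability_within` (a name introduced here, not in the slides) reproduces the familiar 68-95-99.7 rule for one, two, and three standard deviations.

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def probability_within(k, mu=0, sigma=1):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal random variable X."""
    return (normal_cdf(mu + k * sigma, mu, sigma)
            - normal_cdf(mu - k * sigma, mu, sigma))

for k in (1, 2, 3):
    print(k, round(probability_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on mu or sigma, since the interval is expressed in units of standard deviations around the mean.
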
Probability Distributions VI-VII

Quantile function

>>> def inverse_normal_cdf(p, mu=0, sigma=1, tolerance=0.00001):
...     """find approximate inverse using binary search"""
...     # if not standard, compute standard and rescale
...     if mu != 0 or sigma != 1:
...         return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
...     low_z, low_p = -10.0, 0    # normal_cdf(-10) ~ 0
...     hi_z, hi_p = 10.0, 1       # normal_cdf(10) ~ 1
...     while hi_z - low_z > tolerance:
...         mid_z = (low_z + hi_z) / 2    # consider the midpoint
...         mid_p = normal_cdf(mid_z)     # and the cdf's value there
...         if mid_p < p:
...             # midpoint is still too low, search above it
...             low_z, low_p = mid_z, mid_p
...         elif mid_p > p:
...             # midpoint is still too high, search below it
...             hi_z, hi_p = mid_z, mid_p
...         else:
...             break
...     return mid_z

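To see the binary search in use, the sketch below recomputes a few quantiles and compares them with `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8+), which is brought in here only as a reference and is not mentioned in the slides. The function bodies repeat the slide definitions, minus the unused `low_p` and `hi_p` bookkeeping, so that the block runs on its own.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def inverse_normal_cdf(p, mu=0, sigma=1, tolerance=0.00001):
    """Approximate inverse of normal_cdf by binary search."""
    if mu != 0 or sigma != 1:
        # Solve the standard problem, then rescale.
        return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
    low_z, hi_z = -10.0, 10.0
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        mid_p = normal_cdf(mid_z)
        if mid_p < p:
            low_z = mid_z      # quantile lies above the midpoint
        elif mid_p > p:
            hi_z = mid_z       # quantile lies below the midpoint
        else:
            break
    return mid_z

for p in (0.025, 0.5, 0.9, 0.975):
    z = inverse_normal_cdf(p)
    reference = NormalDist().inv_cdf(p)
    print(f"p={p}: bisection={z:.5f}, NormalDist={reference:.5f}, "
          f"round trip={normal_cdf(z):.5f}")
```

Each iteration halves the bracketing interval, so starting from a width of 20 the search needs about 21 iterations to reach the default tolerance of 0.00001.
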
Central Limit Theorem I-II

>>> def bernoulli_trial(p):
...     return 1 if random.random() < p else 0
...
>>> def binomial(p, n):
...     return sum(bernoulli_trial(p) for _ in range(n))
...
>>> def make_hist(p, n, num_points):
...     data = [binomial(p, n) for _ in range(num_points)]
...     # use a bar chart to show the actual binomial samples
...     histogram = Counter(data)
...     # align='edge' makes each bar span [x - 0.4, x + 0.4], centered on x
...     plt.bar([x - 0.4 for x in histogram.keys()],
...             [v / num_points for v in histogram.values()],
...             0.8, color='0.75', align='edge')
...     mu = p * n
...     sigma = math.sqrt(n * p * (1 - p))
...     # use a line chart to show the normal approximation
...     xs = range(min(data), max(data) + 1)
...     ys = [normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma)
...           for i in xs]
...     plt.plot(xs, ys)
...     plt.show()
...
>>> make_hist(0.75, 100, 10000)

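Central Limit Theorem III

[Figure: histogram of the binomial samples with the normal approximation overlaid, as produced by make_hist(0.75, 100, 10000)]
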
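The quality of the normal approximation can also be measured without sampling, by comparing the exact binomial probabilities with the continuity-corrected normal values used for the line in `make_hist`. This is a sketch under the same settings (n = 100, p = 0.75); `binomial_pmf` and `normal_approx` are illustrative helpers that do not appear in the slides, and `math.comb` requires Python 3.8 or later.

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_approx(k, n, p):
    """Normal approximation to P(X = k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

n, p = 100, 0.75
gaps = [abs(binomial_pmf(k, n, p) - normal_approx(k, n, p)) for k in range(n + 1)]
max_gap = max(gaps)
print(f"largest absolute gap over k = 0..{n}: {max_gap:.5f}")
print(f"exact P(X = 75) = {binomial_pmf(75, n, p):.5f}, "
      f"approximation = {normal_approx(75, n, p):.5f}")
```

The gap shrinks as n grows and is larger when p is close to 0 or 1, where the binomial distribution is more skewed.
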
Reference

scipy.stats: implements the probability density functions and cumulative distribution functions of many probability distributions.

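As a rough illustration of how the from-scratch functions correspond to scipy.stats, the sketch below compares them with `scipy.stats.norm`, whose `pdf`, `cdf`, and `ppf` methods take the mean and standard deviation as `loc` and `scale`. SciPy is a third-party package, so the block skips the comparison if it is not installed; the values of `x`, `mu`, and `sigma` are arbitrary choices for the example.

```python
import math

def normal_pdf(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma)

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

try:
    from scipy.stats import norm   # third-party; may not be installed
except ImportError:
    norm = None

x, mu, sigma = 0.4, -1, 2          # arbitrary example values
if norm is not None:
    print("pdf:", norm.pdf(x, loc=mu, scale=sigma), normal_pdf(x, mu, sigma))
    print("cdf:", norm.cdf(x, loc=mu, scale=sigma), normal_cdf(x, mu, sigma))
    # ppf is the quantile function, i.e. the inverse of cdf
    print("97.5% quantile:", norm.ppf(0.975, loc=mu, scale=sigma))
else:
    print("scipy is not installed; skipping the comparison")
```
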