(, statistical disclosure control) - (Risk) and (Utility) (Synthetic Data) 4. 5.


1 (, ), ( )

2 1. 2. (, statistical disclosure control) - (Risk) and (Utility) - - 3. (Synthetic Data) 4. 5.

3 1.

+ 4 1. 2.,. 3. K

+ [ ] 5 ' ', " ", " ". (SNS), '. K KT,, KG (PG), 'CSS'(Credit Scoring System)....,,,.

+ 6? 1. ( ): 2. ( ) ( ). ( 2 2 ) 2. : 1. " ", ( ).( 2 1 ) 3. : 6. " " ( ). ( 2 6 )

+ k- 7 (2016.06.09) [ ]

+ 8

9 / / :. (2013) 2,.

10 ( DB)/ KCB- : DB

3 KCB 1 ( 31 ) KCB KCB DB * DB ( ) ( ) [ : ] 11

12 / - KCB DB/.

3. K- 13

+ 1: (disclosure Risk) 14 k-, l-, t- ( )

+ k-, l-, t- : 15 [ (2016.06.09) [ ],.]

+ : k- l- 16

obs.no | Key 1 | Key 2 |    | k- | l-
1      | 1     | 1     | 50 | 3  | 2
2      | 1     | 1     | 50 | 3  | 2
3      | 1     | 1     | 42 | 3  | 2
4      | 1     | 2     | 42 | 1  | 1
5      | 2     | 2     | 62 | 2  | 1
6      | 2     | 2     | 62 | 2  | 1
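The k and l columns of the six-record example can be reproduced mechanically. The following is a minimal sketch, assuming the unlabeled third data column is the sensitive attribute; the function name `k_and_l` is hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Rows from the slide's example: (Key 1, Key 2, sensitive value).
# Treating the third column as the sensitive attribute is an assumption.
records = [
    (1, 1, 50),
    (1, 1, 50),
    (1, 1, 42),
    (1, 2, 42),
    (2, 2, 62),
    (2, 2, 62),
]


def k_and_l(rows):
    """Return one (k, l) pair per row.

    k is the number of rows sharing the row's key combination;
    l is the number of distinct sensitive values within that group.
    """
    groups = defaultdict(list)
    for key1, key2, sensitive in rows:
        groups[(key1, key2)].append(sensitive)
    return [
        (len(groups[(key1, key2)]), len(set(groups[(key1, key2)])))
        for key1, key2, _ in rows
    ]


per_row = k_and_l(records)
# The data set as a whole is only as protected as its weakest group.
dataset_k = min(k for k, _ in per_row)
dataset_l = min(l for _, l in per_row)
print(per_row, dataset_k, dataset_l)
```

Record 4 is unique on its keys, so the table as a whole satisfies only 1-anonymity even though the first three records sit in a group of size 3.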

+ (general) 17 : : (intruder) k- : =

+ 2: 18

1. (recoding): . 5. / (top-down coding). 100 1000 (top coding)
2. (local suppression): - .
3. (micro-aggregation): k. .
19
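Two of the masking techniques listed above can be illustrated in a few lines. This is a simplified sketch, not the sdcMicro implementation: the cap of 100, the example values, and both function names are hypothetical, and the suppression rule here looks at a single variable rather than at combinations of key variables.

```python
from collections import Counter


def top_code(values, cap):
    """Top coding: replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def suppress_rare(values, min_count):
    """Simplified local suppression: replace a category with None when it
    occurs fewer than `min_count` times in the column."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else None for v in values]


# Hypothetical inputs, chosen only to show the effect of each function.
incomes = [10, 150, 1000, 80]
purposes = ["new car", "new car", "business", "new car"]

print(top_code(incomes, cap=100))
print(suppress_rare(purposes, min_count=2))
```

Top coding removes the extreme values that make a record stand out, while suppression turns rare categories into missing values.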

+ ( ) k- 20 K-.. On k-anonymity and the Curse of Dimensionality VLDB 2005, By Charu C. Aggarwal [G-citation number 572]

+ 1: k,, 21
: German Credit Data
: 20, : 1000
: Random Forest
X1 : account balance (4 levels), X2 : credit history (5 levels), X3 : purpose (10 levels), X4 : savings account (5 levels)
R sdcMicro local suppression k- ( ) recoding k-

+ k=2, key = X1~X5: 5 k- 22

obs.no | X1 | X2 | X3 | X4
1 | <0DM | critical account | NA | no savings account
2 | 0<=...<200DM | credits paid back till now | radio/television | <100DM
3 | no account | critical account | NA | <100DM
4 | <0DM | credits paid back till now | furniture/equipment | <100DM
5 | <0DM | delay in paying off | NA | <100DM
6 | no account | credits paid back till now | NA | no savings account
7 | no account | credits paid back till now | NA | 500<=...<1000 DM
8 | 0<=...<200DM | credits paid back till now | used car | <100DM
9 | no account | credits paid back till now | NA | >=1000DM
10 | 0<=...<200DM | critical account | new car | <100DM
11 | 0<=...<200DM | credits paid back till now | new car | <100DM
12 | <0DM | credits paid back till now | business | <100DM
13 | 0<=...<200DM | credits paid back till now | radio/television | <100DM
14 | <0DM | critical account | new car | <100DM
15 | <0DM | credits paid back till now | new car | <100DM
16 | <0DM | credits paid back till now | NA | 100<=...<500DM
17 | no account | critical account | radio/television | no savings account
18 | <0DM | no credit taken | business | NA
19 | 0<=...<200DM | credits paid back till now | used car | <100DM
20 | no account | credits paid back till now | radio/television | 500<=...<1000 DM

+ k=5, key = X1~X20: 20 k- 23

obs.no | X1 | X2 | X3 | X4
1 | <0DM | critical account | radio/television | no savings account
2 | 0<=...<200DM | credits paid back till now | NA | <100DM
3 | NA | critical account | NA | <100DM
4 | <0DM | credits paid back till now | furniture/equipment | <100DM
5 | <0DM | NA | NA | <100DM
6 | no account | credits paid back till now | NA | no savings account
7 | no account | credits paid back till now | NA | 500<=...<1000 DM
8 | NA | credits paid back till now | used car | <100DM
9 | no account | credits paid back till now | radio/television | NA
10 | NA | critical account | NA | <100DM
11 | 0<=...<200DM | credits paid back till now | new car | NA
12 | <0DM | credits paid back till now | NA | <100DM
13 | 0<=...<200DM | credits paid back till now | radio/television | <100DM
14 | <0DM | critical account | NA | NA
15 | <0DM | credits paid back till now | new car | <100DM
16 | <0DM | credits paid back till now | NA | NA
17 | no account | critical account | radio/television | no savings account
18 | NA | NA | NA | NA
19 | 0<=...<200DM | credits paid back till now | NA | <100DM
20 | no account | credits paid back till now | radio/television | 500<=...<1000 DM

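+ Summary on missing rates: k- 24

Key | k | X1 | X2 | | (%)
X1-X5 | 2 | 1 | 33 | 177 | 0.9
X1-X5 | 3 | 3 | 67 | 273 | 1.5
X1-X5 | 5 | 3 | 125 | 373 | 2.4
X1-X10 | 2 | 102 | 151 | 754 | 5.7
X1-X10 | 3 | 154 | 239 | 905 | 8.1
X1-X10 | 5 | 232 | 349 | 978 | 11.3
X1-X15 | 2 | 154 | 157 | 825 | 9.0
X1-X15 | 3 | 222 | 224 | 959 | 12.4
X1-X15 | 5 | 333 | 325 | 1000 | 16.8
X1-X20 | 2 | 183 | 132 | 935 | 15.6
X1-X20 | 3 | 222 | 209 | 1000 | 21.6
X1-X20 | 5 | 310 | 279 | 1000 | 27.2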

+ 2: k-, (microaggregation) 25
R sdcMicro microaggregation
: German credit data (3 ) simulated data
- X1, X2, X3 : German credit data (X1 : duration, X2 : credit amount, X3 : age)
- X4, X5, ..., X20 : [1,10] uniform
: 1000

k=10, key = X1~X20

obs.no | Du | CA | Age | 10-: Du | CA | Age
1 | 6.0 | 1,169.0 | 67.0 | 16.4 | 1,703.4 | 50.0
2 | 48.0 | 5,951.0 | 22.0 | 47.3 | 7,027.0 | 30.5
3 | 12.0 | 2,096.0 | 49.0 | 15.7 | 2,314.4 | 39.1
4 | 42.0 | 7,882.0 | 45.0 | 37.2 | 6,317.5 | 32.7
5 | 24.0 | 4,870.0 | 53.0 | 39.0 | 4,185.2 | 47.6
6 | 36.0 | 9,055.0 | 35.0 | 15.7 | 2,314.4 | 39.1
7 | 24.0 | 2,835.0 | 53.0 | 17.2 | 1,754.2 | 42.3
8 | 36.0 | 6,948.0 | 35.0 | 27.0 | 4,561.5 | 26.1
9 | 12.0 | 3,059.0 | 61.0 | 13.0 | 2,293.3 | 59.3
10 | 30.0 | 5,234.0 | 28.0 | 24.3 | 5,059.7 | 39.1
11 | 12.0 | 1,295.0 | 25.0 | 14.1 | 1,821.5 | 31.4
12 | 48.0 | 4,308.0 | 24.0 | 48.6 | 5,711.0 | 29.5
13 | 12.0 | 1,567.0 | 22.0 | 14.7 | 2,174.4 | 28.6
14 | 24.0 | 1,199.0 | 60.0 | 24.0 | 2,931.5 | 47.0
15 | 15.0 | 1,403.0 | 28.0 | 16.8 | 2,169.0 | 32.2
16 | 24.0 | 1,282.0 | 32.0 | 19.4 | 3,346.7 | 30.2
17 | 24.0 | 2,424.0 | 53.0 | 23.4 | 4,554.7 | 31.7
18 | 30.0 | 8,072.0 | 25.0 | 43.2 | 10,447.9 | 30.1
19 | 24.0 | 12,579.0 | 44.0 | 19.2 | 9,100.5 | 50.8
20 | 24.0 | 3,430.0 | 31.0 | 16.6 | 2,512.8 | 32.5

10-

obs.no | Du | CA | Age | 10-: Du | CA | Age
3 | 12.0 | 2,096.0 | 49.0 | 15.7 | 2,314.4 | 39.1
6 | 36.0 | 9,055.0 | 35.0 | 15.7 | 2,314.4 | 39.1
148 | 12.0 | 682.0 | 51.0 | 15.7 | 2,314.4 | 39.1
306 | 6.0 | 1,543.0 | 33.0 | 15.7 | 2,314.4 | 39.1
415 | 24.0 | 1,381.0 | 35.0 | 15.7 | 2,314.4 | 39.1
552 | 6.0 | 1,750.0 | 45.0 | 15.7 | 2,314.4 | 39.1
601 | 7.0 | 2,329.0 | 45.0 | 15.7 | 2,314.4 | 39.1
622 | 18.0 | 1,530.0 | 32.0 | 15.7 | 2,314.4 | 39.1
639 | 12.0 | 1,493.0 | 34.0 | 15.7 | 2,314.4 | 39.1
756 | 24.0 | 1,285.0 | 32.0 | 15.7 | 2,314.4 | 39.1
 | 15.7 | 2,314.4 | 39.1 | | |
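The ten-record group above shows the averaging step of microaggregation: every record in a group receives the group mean of each variable. The sketch below reproduces only that step for the records listed on the slide. How the groups themselves are formed is not recoverable from the slide and is not modelled here; the function name `microaggregate` is hypothetical.

```python
# obs.no -> (duration, credit amount, age), copied from the slide's group.
group = {
    3: (12.0, 2096.0, 49.0),
    6: (36.0, 9055.0, 35.0),
    148: (12.0, 682.0, 51.0),
    306: (6.0, 1543.0, 33.0),
    415: (24.0, 1381.0, 35.0),
    552: (6.0, 1750.0, 45.0),
    601: (7.0, 2329.0, 45.0),
    622: (18.0, 1530.0, 32.0),
    639: (12.0, 1493.0, 34.0),
    756: (24.0, 1285.0, 32.0),
}


def microaggregate(rows):
    """Replace every record in one group with the column-wise group mean,
    rounded to one decimal as on the slide."""
    n = len(rows)
    means = tuple(round(sum(column) / n, 1) for column in zip(*rows.values()))
    return {obs_no: means for obs_no in rows}


masked = microaggregate(group)
print(masked[3])
```

After this step the ten records are indistinguishable on the three variables, which is what makes the group 10-anonymous with respect to them.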

( ) k=10 169, 726, 916

30 4. (synthetic data)

31 Synthetic Data (synthetic data, ), - (partially synthetic data) - (fully synthetic data)

(synthetic): =, =.[www.imbc.com ] 32

-, key- synthesize - 33

34 Synthetic Data : 1) (re-identification).. 2) [ ] 3).

:. (. ) -., Synthetic data. 35

36 Synthetic data 1. 2. SSB (SIPP Synthetic Beta): Federal Privacy Council, U.S. Census Bureau. U.S. Census Bureau Survey of Income and Program Participation (SIPP), Social Security Administration (SSA)/Internal Revenue Service (IRS)

: https://www.census.gov/programs-surveys/sipp/ 3. U.S. Census Bureau, / 37

Synthetic SIPP data
1. Application form,, SIPP (123 )
2. 5
3. account SAS Stata SSB data
4. (SAS Stata code)
38

39 Synthetic Data : German Credit Data
Data : German credit data ( =1000)
- 900 : training data
- 100 : test data
Synthetic variable
- y (credit status : good, bad)
- 3 (duration, credit amount, age)
Models for synthesis
- f(y ): logistic regression
- f(c1 ), f(c2 ), f(c3 ): linear regression

Synthetic data
1. 900 training synthesis
2. training synthetic data set
3. Synthetic data set y (logistic regression)
4. 3 model original training data Testing (100 )
40
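The regression-based synthesis step can be sketched for one continuous variable. This is a simplified illustration under stated assumptions, not the procedure used for the slides: it synthesizes credit amount from a linear regression on duration only (the slides' actual conditioning variables are not recoverable), it uses the nine training rows printed on the slide as toy input, and the names `fit_line` and `synthesize` are hypothetical.

```python
import random
import statistics


def fit_line(z, x):
    """Ordinary least squares for x = a + b*z.

    Returns (a, b, residual standard deviation)."""
    z_bar, x_bar = statistics.fmean(z), statistics.fmean(x)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / sxx
    a = x_bar - b * z_bar
    residuals = [xi - (a + b * zi) for zi, xi in zip(z, x)]
    sd = (sum(r * r for r in residuals) / (len(z) - 2)) ** 0.5
    return a, b, sd


def synthesize(z, x, rng):
    """Draw one synthetic value per record: fitted mean plus Gaussian noise
    with the estimated residual standard deviation."""
    a, b, sd = fit_line(z, x)
    return [a + b * zi + rng.gauss(0.0, sd) for zi in z]


# Toy input: duration and credit amount of the nine rows shown on the slide.
duration = [6, 48, 12, 42, 36, 24, 36, 12, 30]
credit_amount = [1169, 5951, 2096, 7882, 9055, 2835, 6948, 3059, 5234]

synthetic_amount = synthesize(duration, credit_amount, random.Random(0))
print([round(v) for v in synthetic_amount])
```

The synthetic values keep the fitted relationship between the two variables while no released value is an original record's value.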

Synthetic data : Original Training Data 41

obs.no | y | duration | credit.amount | age | account balance | Credit history | purpose
1 | good | 6 | 1,169 | 67 | <0 DM | critical account | radio/television
2 | bad | 48 | 5,951 | 22 | 0<=...<200 DM | credits paid back till now | radio/television
3 | good | 12 | 2,096 | 49 | no account | critical account | education
4 | good | 42 | 7,882 | 45 | <0 DM | credits paid back till now | furniture/equipment
6 | good | 36 | 9,055 | 35 | no account | credits paid back till now | education
7 | good | 24 | 2,835 | 53 | no account | credits paid back till now | furniture/equipment
8 | good | 36 | 6,948 | 35 | 0<=...<200 DM | credits paid back till now | used car
9 | good | 12 | 3,059 | 61 | no account | credits paid back till now | radio/television
10 | bad | 30 | 5,234 | 28 | 0<=...<200 DM | critical account | new car

Synthetic Data 42

obs.no | y | duration | credit.amount | age | account balance | Credit history | purpose
1 | good | 33 | 2,672 | 43 | <0 DM | critical account | radio/television
2 | bad | 32 | 4,751 | 39 | 0<=...<200 DM | credits paid back till now | radio/television
3 | good | 10 | 839 | 46 | no account | critical account | education
4 | good | 41 | 4,753 | 23 | <0 DM | credits paid back till now | furniture/equipment
6 | good | 24 | 8,606 | 31 | no account | credits paid back till now | education
7 | good | 9 | 2,957 | 46 | no account | credits paid back till now | furniture/equipment
8 | good | 29 | 5,420 | 42 | 0<=...<200 DM | credits paid back till now | used car
9 | good | 32 | 3,685 | 41 | no account | credits paid back till now | radio/television
10 | bad | 19 | 3,407 | 33 | 0<=...<200 DM | critical account | new car

Synthetic data 43
Synthetic 100 set, ( ) (confusion matrix)

model | original data: true bad | true good | synthetic data: true bad | true good
pred. bad | 56 | 15 | 55.87 (0.418) | 14.34 (0.497)
pred. good | 13 | 16 | 13.13 (0.418) | 16.66 (0.497)

model | original data | synthetic data
 | 0.28 | 0.275
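The two figures 0.28 and 0.275 are consistent with the misclassification rate of each confusion matrix, that is, the off-diagonal share of the 100 test cases. The check below is a sketch under that reading; the label of the row was lost in extraction, and the function name `error_rate` is hypothetical.

```python
def error_rate(matrix):
    """Misclassification rate of a 2x2 confusion matrix laid out as
    matrix[predicted][true], with class order (bad, good)."""
    total = sum(sum(row) for row in matrix)
    wrong = matrix[0][1] + matrix[1][0]
    return wrong / total


# Cell values copied from the slide (synthetic cells are averages over 100 sets).
original = [[56, 15], [13, 16]]
synthetic = [[55.87, 14.34], [13.13, 16.66]]

print(round(error_rate(original), 3), round(error_rate(synthetic), 3))
```

Under this reading, the model trained on synthetic data classifies the held-out cases about as well as the model trained on the original data.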

44 K- Synthetic Data K- 1. 2. 3.

45 5. (Differential Privacy)

+ (Differential Privacy) 46 1. 2. : Laplace mechanism 3. Local differential privacy 4. 5.

+ 47 : 1. ( ) (Q) 2. R

+ 48

A,, ( ). : NOISE NOISE. 49
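Adding noise to a query answer is most commonly done with the Laplace mechanism named in the outline of this section. The sketch below assumes a numeric query with known sensitivity (the largest change in the answer when one record changes); the function name, the example count of 100, and the parameter values are hypothetical.

```python
import random


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return true_answer plus Laplace(0, sensitivity / epsilon) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise


rng = random.Random(1)
# Hypothetical counting query: true count 100, sensitivity 1, epsilon 1.
noisy_counts = [laplace_mechanism(100.0, 1.0, 1.0, rng) for _ in range(20000)]
print(sum(noisy_counts) / len(noisy_counts))
```

A smaller epsilon means a larger noise scale: stronger privacy at the cost of less accurate answers.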

+ 1: 50

: 2: : 51

52

Post-processing Invariance: ε-DP Algorithm Algorithm Output ε-DP [ K. Chaudhuri A.D. Sarwate ] 53

Composition: A_1 ( , ) ε_1-DP, A_2 ( , ) ε_2-DP, A_1(D) A_2(D) [ K. Chaudhuri A.D. Sarwate ] 54
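For reference, the two properties named on these slides are usually stated as follows. This is the standard textbook formulation, not a reconstruction of the slide text, whose conclusion was lost in extraction; the symbols $S$, $D'$, and $g$ are introduced here for the statement.

```latex
% Definition: A is \varepsilon-DP if, for all neighbouring data sets D, D'
% (differing in one record) and all measurable output sets S,
\Pr[A(D) \in S] \le e^{\varepsilon} \, \Pr[A(D') \in S].

% Post-processing invariance: for any (possibly randomized) map g that
% does not access the data,
A \text{ is } \varepsilon\text{-DP} \;\Longrightarrow\; g \circ A \text{ is } \varepsilon\text{-DP}.

% Sequential composition: releasing both outputs on the same data D,
A_1 \text{ is } \varepsilon_1\text{-DP},\; A_2 \text{ is } \varepsilon_2\text{-DP}
\;\Longrightarrow\; D \mapsto \bigl(A_1(D), A_2(D)\bigr) \text{ is } (\varepsilon_1 + \varepsilon_2)\text{-DP}.
```

In practice, composition means that the privacy budget is spent additively across every release made from the same data.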

Dependency to :.,, (classifier),. 55

+ Local Differential Privacy 56 DP( ) central. Epsilon noise. Local DP central noise central.

+ 3: randomized response 57 Randomized response: : / O/X. O. Google Chrome : /.
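Randomized response can be sketched with the classic two-coin scheme. This is an assumption about which variant is meant: the slide's own description was lost in extraction, and the function names and the true rate of 0.3 are hypothetical. Each respondent answers truthfully with probability 1/2 and otherwise gives a uniformly random yes/no answer, so no single answer reveals the truth, yet the population rate can still be estimated.

```python
import random


def randomized_response(truth, rng):
    """First coin heads: report the truth. Tails: report a second coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_yes_rate(responses):
    """Invert P(report yes) = 0.5 * p + 0.25 to recover the true rate p."""
    observed = sum(responses) / len(responses)
    return 2.0 * observed - 0.5


rng = random.Random(42)
# Hypothetical population in which 30% truly answer yes.
truths = [rng.random() < 0.3 for _ in range(100000)]
reports = [randomized_response(t, rng) for t in truths]
print(estimate_yes_rate(reports))
```

A true "yes" is reported as yes with probability 0.75 and a true "no" with probability 0.25, so the likelihood ratio is 3 and the scheme satisfies local differential privacy with ε = ln 3.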

+ 4: NYC taxicab data set 58 Riding with the stars, from research.neustar.biz

+ 59 2013 NY taxicab dataset : pickup and drop off times, locations, fare and tip amounts de-anonymize, (Driver privacy) privacy (Passenger privacy in the NYC taxicab dataset). : the picture, some information from celebrity gossip blogs

+ 60 Jessica Alba,, ($9), ($0)

+ Differentially Privatized Trip (drop-off) 61 Total Fare: $25 - $30 Tip Amount: $6 - $8

+ NYC taxicab data: local DP 62

+ K- 63 Differentially Privatized Synthetic Data: 1. DB Local DP ε-LDP /

2. LDP. / / ε-LDP / ε-LDP / ε-LDP / 3. DP/local DP procedure 64

65 6.

66 1.. 2. - - / 3... K- 4.,

67-5. / - /

Thank you 68