Identification of genetic basis for Type 2 Diabetes in East Asian populations
Yoon Shin Cho
Hallym University, Department of Biomedical Science
Risk factors for complex diseases and traits (correlation between genetic and environmental factors)
[Figure: diseases placed along two opposing scales, environmental factors and genetic factors, each running from 0% to 100%. Diseases in the order shown: Injury, Infectious diseases, General cancers, Atherosclerosis, Hypertension, Diabetes, Depression, Schizophrenia, Hereditary cancers, Genetic diseases]
Ways to identify genetic factors for diseases
Disease Genomics: study of human diseases based on genetic information in the human genome
Number of disease determinants by effect (From Peltonen and McKusick 2001)
DNA genetic variation
- Insertions/Deletions, Translocation
- Copy number variation (CNV)
- Single Nucleotide Polymorphism (SNP)
  - 0.1% difference between individuals
  - No. of SNPs = ~23,650,000 (based on dbSNP build 131, 2011.5)
Genetic variations in homologous chromosomes
[Figure: a pair of homologous chromosomes carrying alleles P, a, B and P, a, b at three genomic loci (SNPs), with labels for a risk allele and a protective allele]
Genotype:
- PP: homozygous for the risk allele
- aa: homozygous for the protective allele
- Bb: heterozygous
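The genotype labels above map onto the numeric coding that association tests often use. The following is a minimal sketch, assuming two-letter genotype strings as written on the slide; the function names are hypothetical, and treating B as the risk allele in the last example is an assumption for illustration only.

```python
def zygosity(genotype):
    """Classify a two-allele genotype string such as 'PP' or 'Bb'."""
    first, second = genotype  # raises ValueError unless exactly two alleles
    return "homozygous" if first == second else "heterozygous"


def risk_allele_count(genotype, risk_allele):
    """Count copies (0, 1, or 2) of the given risk allele in a genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype such as 'Bb'")
    return sum(1 for allele in genotype if allele == risk_allele)


# Genotypes from the slide; which of B/b is the risk allele is assumed here.
print(zygosity("PP"), risk_allele_count("PP", "P"))
print(zygosity("aa"), risk_allele_count("aa", "P"))
print(zygosity("Bb"), risk_allele_count("Bb", "B"))
```

The comparison is case-sensitive on purpose, so that B and b are treated as different alleles.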
Testing for disease-marker (SNP) association
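The slide does not say which test is used. One common choice is an allelic chi-square test on a 2x2 table of allele counts in cases and controls, sketched below. The allele counts are hypothetical, and the function name is an assumption, not something taken from the talk.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (odds_ratio, chi_square, p_value).
    """
    n = case_risk + case_other + ctrl_risk + ctrl_other
    # Pearson chi-square for a 2x2 table, without continuity correction.
    numerator = n * (case_risk * ctrl_other - case_other * ctrl_risk) ** 2
    denominator = (
        (case_risk + case_other)
        * (ctrl_risk + ctrl_other)
        * (case_risk + ctrl_risk)
        * (case_other + ctrl_other)
    )
    chi_square = numerator / denominator
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 1,000 case alleles and 1,000 control alleles.
odds_ratio, chi_square, p_value = allelic_association(530, 470, 460, 540)
print(f"OR={odds_ratio:.2f} chi2={chi_square:.2f} p={p_value:.4f}")
```

Real analyses usually adjust for covariates such as age and sex with logistic regression; the 2x2 test is only the simplest form of the idea.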
Genome Wide Association Study (GWAS)
Association analysis between genetic markers (SNPs) across the entire genome and phenotypes in a large number of samples
[Figure: markers along the genome, Marker 1, Marker 2, Marker 3, ... Marker 500,000, ... Marker 1,000,000]
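Testing on the order of a million markers means the per-marker significance level has to be far stricter than 0.05. A minimal sketch of a Bonferroni correction follows, assuming one million independent tests; this matches the 5 x 10^-8 genome-wide threshold that appears later in the deck. The function name and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical p-values for three markers.
p_values = {"marker_1": 3.0e-9, "marker_2": 2.0e-6, "marker_3": 0.04}
significant = [name for name, p in p_values.items() if p < threshold]
print(threshold, significant)
```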
Overview of the general design and workflow of GWAS (Kingsmore et al., Nature Reviews Drug Discovery. 2008)
GWAS publications (by Dec 30, 2010)
[Figure: number of Genome Wide Association Study (GWAS) publications by year; modified data from HuGE Navigator]
Advances for GWA analysis
Advances leading to GWAS:
- Advances in genotyping technology
- Better understanding of patterns of human sequence variation
- Sample collections of adequate size
Why a Genetic Study of T2D?
- T2D is strongly familial
- T2D is a huge, growing public health problem worldwide
  - Risk factor for renal failure, retinopathy, peripheral neuropathy, and cardiovascular diseases
  - Death by diabetes: 22.9/100,000 in Korea in 2007 (ranked 5th)
  - Prevalence rate growing rapidly
[Figure: diabetes prevalence. Korea: 8.6% (2001), 9.2% (2005), 9.5% (2007). US: 8.7% (2002), 10.7% (2007). Criteria: age above 30 years and FPG > 126 mg/dL or diabetes medication treatment (2008 Korea National Health and Nutrition Examination Survey)]
Pathobiology of T2D
- Glucose-stimulated secretion of insulin: defect in insulin secretion in pancreatic beta cells
- Insulin-mediated glucose uptake (adipocytes/skeletal muscles): insulin resistance in adipocytes and skeletal muscle cells
Efforts to identify susceptibility loci for T2D
- 2000: PPARG
- 2003: KCNJ11
- 2006: TCF7L2
- 2007 to 2011: FTO, SLC30A8, HHEX/IDE, WFS1, CDKAL1, IGF2BP2, 9p21, HNF1B, MTNR1B, KCNQ1, TSPAN8, THADA, ADAMTS9, NOTCH2, CAMK1D, JAZF1, TP53INP1, KLF14, ZBED3, HMGA2, BCL11A, CHCHD9, HNF1A, IRS1, DGKB, GCKR, GCK, CENTD2, ADCY5, PROX1, DUSP8, SRR, DUSP9, ZFAND6, PRC1, PTPRD, UBE2E2, C2CD4A-C2CD4B, SPRY2, CDC123/CAMK1D, GRB14, ST6GAL1, VPS26A, HMG20A, AP3S2, HNF4A
So far, 49 T2D loci have been identified by candidate gene studies, GWAS for T2D, GWAS of related traits, and GWA meta-analysis for T2D.
Physiological mechanisms implied from European T2D loci (Adapted from McCarthy)
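Why GWAS in Korean (or Asian) population?
- Most genome-wide association studies have been conducted in Europeans
- A given genetic variant may affect a particular disease or trait differently in different populations
  1. Differences in allele frequency
  2. Differences in LD structure
  3. Environmental differences (e.g., GGI, GEI, selection, drift)
  4. New mutations that have arisen in each population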
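KARE Project: the 1st Korea GWAS of population-based cohorts
Objectives of the KARE (Korea Association REsource) project
1. Through genome-wide association studies (GWAS):
  - Identify genetic factors affecting quantitative traits (QTs)
  - Identify genetic factors affecting lifestyle-related diseases (e.g., type 2 diabetes)
2. Establish a foundation for genome research in Korea:
  - Obtain data on 500,000 SNPs for about 10,000 Koreans
  - Build a database of the genomic information and provide it to researchers in Korea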
KARE phenotypes
Phenotypes for KARE study
Diseases
- T2D
- Hypertension
- Osteoporosis (bone mineral density)
- Obesity
- Metabolic syndrome
- Dyslipidemia
Health related quantitative traits
- T2D related traits: Fasting plasma glucose/insulin, Glucose/insulin OGTT 60, Glucose/insulin OGTT 120, HbA1c, HOMA-B, HOMA-IR
- Plasma lipids: LDLC, HDLC, TG, TCHL
- Blood pressure: SBP, DBP
- Liver enzymes: γ-GTP, AST, ALT
- Kidney function related traits: Creatinine, eGFR, Plasma albumin, BUN
- Obesity related traits: Weight, Waist, BMI, WHR, Height
- Pulse rate: Pulse rate
- Hematological traits: Hemoglobin, Hematocrit, Red blood cell count, White blood cell count
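Two of the T2D related traits, HOMA-IR and HOMA-B, are derived from fasting glucose and fasting insulin rather than measured directly. The sketch below uses the widely cited homeostasis model assessment approximations. It is an assumption that KARE computed the indices this way; the function names and the rounded conversion factor of 18 mg/dL per mmol/L are illustrative choices.

```python
def mg_dl_to_mmol_l(glucose_mg_dl):
    """Convert glucose from mg/dL to mmol/L (approximate factor of 18)."""
    return glucose_mg_dl / 18.0


def homa_ir(glucose_mmol_l, insulin_uu_ml):
    """Insulin resistance index: glucose (mmol/L) x insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def homa_b(glucose_mmol_l, insulin_uu_ml):
    """Beta-cell function index (%): 20 x insulin / (glucose - 3.5)."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-B is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (glucose_mmol_l - 3.5)


# Hypothetical fasting values: glucose 90 mg/dL, insulin 9 uU/mL.
glucose = mg_dl_to_mmol_l(90.0)
print(homa_ir(glucose, 9.0), homa_b(glucose, 9.0))
```

A higher HOMA-IR suggests more insulin resistance, and a lower HOMA-B suggests poorer beta-cell function.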
Korea GWAS for Type 2 Diabetes (KARE study)
1. Korean Genome Epidemiology Study (KoGES) (10,038 subjects)
2. Genotyping with Affymetrix 5.0 (500,568 SNPs)
3. Data filtering (8,842 subjects/352,228 SNPs)
4. Association analyses (T2D case-control) (1,042/2,943)
5. Identification of significant SNPs
6. Replication study from independent population (Health2) (1,216/1,352)
Results of GWAS for T2D in KARE study
Sample sizes (cases/controls): Stage 1 (KARE-GWAS) 1042/2943; Stage 2 (replication) 1216/1352; All Korean (stage 1 + stage 2) 2258/4295

SNP ID | CHR | Gene | Stage 1 MAF | Stage 1 OR (95% CI) | Stage 1 P-value | Stage 2 MAF | Stage 2 OR (95% CI) | Stage 2 P-value | All Korean OR (95% CI) | All Korean P-value

SNPs showing strong evidence of association
rs7754840 | 6 | CDKAL1 | 0.53/0.46 | 1.30 (1.17-1.45) | 1.06E-06 | 0.51/0.46 | 1.22 (1.08-1.38) | 1.68E-03 | 1.26 (1.17-1.37) | 5.04E-09
rs10811661 | 9 | CDKN2A/B | 0.40/0.45 | 0.79 (0.71-0.88) | 1.41E-05 | 0.39/0.45 | 0.78 (0.68-0.89) | 2.24E-04 | 0.79 (0.72-0.86) | 2.07E-08
rs1106xxxx | 12 | Gene 1 | 0.15/0.19 | 0.70 (0.61-0.81) | 1.96E-06 | 0.16/0.19 | 0.74 (0.62-0.88) | 4.73E-04 | 0.72 (0.64-0.80) | 6.68E-09
rs207xxxx | 12 | Gene 2 | 0.12/0.16 | 0.66 (0.57-0.78) | 3.81E-07 | 0.12/0.16 | 0.73 (0.61-0.88) | 1.08E-03 | 0.69 (0.61-0.78) | 3.04E-09

SNPs showing moderate evidence of association
rs4376068 | 3 | IGF2BP2 | 0.32/0.28 | 1.26 (1.12-1.41) | 7.42E-05 | 0.30/0.26 | 1.20 (1.04-1.38) | 1.36E-02 | 1.23 (1.13-1.35) | 2.47E-06
rs3821964 | 4 | BMPR1B | 0.45/0.50 | 0.80 (0.72-0.89) | 4.19E-05 | 0.47/0.49 | 0.87 (0.76-0.99) | 2.96E-02 | 0.83 (0.76-0.90) | 8.41E-06
rs6882351 | 5 | 5p13.1b | 0.33/0.39 | 0.78 (0.69-0.87) | 1.31E-05 | 0.36/0.39 | 0.85 (0.75-0.97) | 1.63E-02 | 0.81 (0.74-0.88) | 1.30E-06
rs10258075 | 7 | INSIG1 | 0.16/0.12 | 1.39 (1.19-1.61) | 2.30E-05 | 0.13/0.11 | 1.22 (1.00-1.49) | 4.72E-02 | 1.32 (1.17-1.49) | 4.92E-06
rs2868088 | 20 | HNF4A | 0.41/0.46 | 0.80 (0.72-0.89) | 3.55E-05 | 0.42/0.44 | 0.87 (0.76-0.99) | 4.14E-02 | 0.82 (0.76-0.90) | 7.39E-06
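The slide does not state how the stage 1 and stage 2 estimates were combined. One standard approach is an inverse-variance fixed-effect meta-analysis on the log odds ratio scale, sketched below with standard errors recovered from the reported 95% confidence intervals. This is an illustration under that assumption, not necessarily the method behind the table; the function names are hypothetical.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


def fixed_effect_meta(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (beta, se) pairs.

    Returns (pooled_or, ci_low, ci_high, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
        p_value,
    )


# rs7754840 (CDKAL1): stage 1 and stage 2 values taken from the table above.
stage1 = log_or_and_se(1.30, 1.17, 1.45)
stage2 = log_or_and_se(1.22, 1.08, 1.38)
pooled_or, low, high, p = fixed_effect_meta([stage1, stage2])
print(f"{pooled_or:.2f} ({low:.2f}-{high:.2f}) p={p:.2e}")
```

For this SNP the pooled odds ratio and interval come out close to the reported combined estimate of 1.26 (1.17-1.37). The p-value need not match the table exactly, since the intervals are rounded and the original analysis may have pooled individual-level data.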
E Asian T2D GWA meta-analysis to identify more T2D loci

Stage | Representative | Study | Ethnic group | Cases | Controls | Total
Stage 1 | KNIH | KARE | Korean | 1042 | 2943 | 3985
Stage 1 | NUS | SP1 | Chinese | 1082 | 1006 | 2088
Stage 1 | NUS | SP2 | Chinese | 928 | 939 | 1867
Stage 1 | NUS | SiMES | Malay | 794 | 1240 | 2034
Stage 1 | IMCJ | IMCJ | Japanese | 931 | 1404 | 2335
Stage 1 | Vanderbilt U | Shanghai | Chinese | 1019 | 1710 | 2729
Stage 1 | Taiwan | Taiwan | Chinese | 997 | 999 | 1996
Stage 1 | UNC | CLHNS | Filipino | 159 | 1624 | 1783
Stage 1 | total | | | 6952 | 11865 | 18817
Stage 2 | RIKEN/Tokyo U | BBJ | Japanese | 4885 | 3779 | 8664
Stage 2 | KNIH | Health2 | Korean | 1183 | 1305 | 2488
Stage 2 | SJTU | Shanghai | Chinese | 190 | 198 | 388
Stage 2 | total | | | 6258 | 5282 | 11540
Stage 3 | IMCJ | IMCJ | Japanese | 5253 | 6160 | 11413
Stage 3 | SJTU | Shanghai | Chinese | 3410 | 3412 | 6822
Stage 3 | CUHK | CUHK | Chinese | 1500 | 1500 | 3000
Stage 3 | NTUH | NTUH | Chinese | 1500 | 1500 | 3000
Stage 3 | SNU | SNU | Korean | 600 | 700 | 1300
Stage 3 | total | | | 12263 | 13272 | 25535
Overall | AGEN | AGEN-T2D | East Asian | 25473 | 30419 | 55892
Stage 1: Discovery
- GWA meta-analysis combining 8 T2D GWA studies (6,952 cases vs. 11,865 controls)
- Selection threshold: P < 5 × 10^-4
Stage 2: in silico replication
- Validation of 4,014 SNPs selected from Stage 1 (303 lead SNPs + their proxy SNPs) in 3 T2D GWA studies (6,258 cases vs. 5,282 controls)
- Combined meta-analysis (Stages 1+2): P < 10^-5
Stage 3: de novo replication
- Validation of 19 SNPs selected from Stage 2 in 5 T2D studies (12,263 cases vs. 13,272 controls)
- Combined meta-analysis (Stages 1+2+3): P < 5 × 10^-8
Result: 8 novel T2D SNPs
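The three-stage design amounts to a sequence of p-value filters. The sketch below encodes the three thresholds stated on the slide; the function name, the returned labels, and the example p-values are hypothetical, and the real study also selected proxy SNPs, which is not modeled here.

```python
STAGE1_P = 5e-4        # discovery meta-analysis threshold
STAGE12_P = 1e-5       # combined stages 1+2 threshold
GENOME_WIDE_P = 5e-8   # combined stages 1+2+3 threshold


def classify_snp(p_stage1, p_stage12=None, p_stage123=None):
    """Report how far a SNP progresses through the staged design."""
    if p_stage1 >= STAGE1_P:
        return "not selected after stage 1"
    if p_stage12 is None or p_stage12 >= STAGE12_P:
        return "not selected after stage 2"
    if p_stage123 is None or p_stage123 >= GENOME_WIDE_P:
        return "not genome-wide significant"
    return "genome-wide significant"


# Hypothetical p-values for three SNPs.
print(classify_snp(2e-3))
print(classify_snp(1e-4, 3e-5))
print(classify_snp(1e-5, 2e-7, 1e-9))
```

Staging in this way limits de novo genotyping to a small set of SNPs (19 in this study) while the final threshold still applies to the combined data.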
Principal Component Analysis (PCA) in all individuals from each stage 1 component study and 270 individuals from HapMap data
Figure panels:
(A) SDCS/SP2 (1), Chinese
(B) SDCS/SP2 (2), Chinese
(C) SiMES, Malays
(D) KARE, Korean
(E) SDGS, Chinese
(F) TDS, Taiwanese
(G) CAGE, Japanese
(H) CLHNS, Filipino
Summary of GWA meta-analysis for T2D
Combined (stage 1+2+3): up to 25,079 cases and 29,611 controls. DIAGRAM+: up to 8,130 cases and 38,987 controls.

Candidate gene | Risk allele | Other allele | Combined RAF (HapMap JPT/CHB) | Combined OR (CI) | Combined P-value | DIAGRAM+ RAF (HapMap CEU) | DIAGRAM+ OR (CI) | DIAGRAM+ P-value | DIAGRAM+ power

Loci showing strong evidence of association with T2D
MAEA | c | g | 0.58 | 1.13 (1.10-1.16) | 1.57E-20 | NA | NA | NA | NA
GLIS3 | a | g | 0.41 | 1.10 (1.07-1.13) | 1.99E-14 | 0.54 | 1.04 (1.00-1.08) | 6.43E-02 | 0.62
HNF4A | g | t | 0.48 | 1.09 (1.07-1.12) | 1.12E-11 | 0.18 | 1.07 (1.01-1.13) | 1.47E-02 | 0.86
GCC1 | g | a | 0.79 | 1.11 (1.07-1.14) | 4.96E-11 | 0.56 | 0.99 (0.95-1.03) | 4.89E-01 | 0.09
PSMD6 | c | t | 0.61 | 1.09 (1.06-1.12) | 8.41E-11 | 0.76 | 1.02 (0.97-1.07) | 4.45E-01 | 0.16
ZFAND3 | c | t | 0.27 | 1.12 (1.08-1.16) | 2.06E-10 | 0.14 | 0.97 (0.90-1.04) | 4.00E-01 | 0.23
PEPD | a | g | 0.56 | 1.10 (1.07-1.14) | 1.30E-08 | 0.6 | 1.02 (0.98-1.06) | 3.61E-01 | 0.2
KCNK16 | t | g | 0.42 | 1.08 (1.05-1.11) | 2.30E-08 | 0.47 | NA | NA | NA

Loci showing moderate evidence of association with T2D
CMIP | c | t | 0.8 | 1.08 (1.05-1.12) | 2.84E-07 | 0.99 | 1.20 (1.01-1.42) | 3.33E-02 | 0.52
WWOX | t | c | 0.32 | 1.08 (1.05-1.12) | 9.49E-07 | 0.02 | 1.20 (0.95-1.52) | 1.22E-01 | 0.87

The power was estimated given the 8,130 cases/38,987 controls, DIAGRAM+ ORs, T2D prevalence of 10%, and RAF in HapMap (CEU) for α=0.05.
Pathway enrichments
- GRAIL (PubMed abstract mining)
- Some connectivity, but no clear hits to latent mechanisms
Physiological mechanisms of E Asian T2D loci
[Figure: scatter plot of T2D loci. x-axis: HOMA-B index of beta-cell function (worse to better); y-axis: HOMA-IR index (sensitive to resistance). Labeled regions: insulin resistance, beta-cell dysfunction. Loci shown: PPARG, WFS1, GCKR, IGF2BP2, PSMD6, CDC123/CAMK1D, KCNQ1, PROX1, C2CD4A-C2CD4B, TCF7L2, ZFAND3, VPS26A, KLF14, CHCHD9, IRS1, CMIP, SRR, ADAMTS9, TP53INP1, SPRY2, MTNR1B, UBE2E2, FTO, GCC1-PAX4, SLC30A8, AP3S2, HMGA2, TSPAN8/LGR5, CDKN2A/B, KCNK16, CDKAL1, RBMS1, INS/IGF2, DGKB, JAZF1, BCL11A, HMG20A, GCK, MAEA, NOTCH2]
GWA meta-analysis for T2D in E Asian populations Cho et al. Nature Genetics, 2012
Limitation in GWAS
Estimation of heritability and number of loci for several complex traits (Lander, Nature 2011)
Atlas of susceptibility (Adapted from McCarthy)
[Figure: effect size (Low, Modest, Intermediate, High) plotted against allele frequency (Very rare, Rare, Uncommon, Common; ticks at 0.001, 0.01, 0.1). Labeled regions:]
- Rare alleles causing Mendelian disease
- Low-frequency variants with intermediate penetrance
- Most common variants implicated in common disease by GWA
- Rare examples of high-penetrance common variants influencing common disease
- Rare variants of small effect, very hard to identify by genetic means
Thus, future studies to explain missing heritability
1. GWA meta-analysis & ethnic specific GWAS
  - More common variants
  - Ethnic specific variants
2. Fine mapping of candidate T2D loci or exome sequencing
  - Rare variants
  - Causal variants
3. Structural variants
  - CNVs
  - Indels
4. Others
  - G×G interaction, G×E interaction
  - Epigenetic modifications
The path for disease genomics

Year | Method | Technology | Main purpose
2005 | Candidate Gene Approach for Association Analysis | Genotyping | Identification of disease associated loci (1~1000 SNPs); identification of causal variation for disease
2007 | Genome Wide Association Study (GWAS) | High throughput genotyping | Identification of disease associated loci (> 500K SNPs); identification of causal variation for disease (rarely)
2011 | Genome-Wide Association Meta-Analysis (GWA MA) | Imputation/meta-analysis | Identification of disease associated loci (> 1.5~3 M imputed/genotyped SNPs); identification of causal variation for disease (rarely)
 | Targeted Resequencing for Disease Loci | NGS (Next Generation Sequencing) | Identification of causal variation for disease
2012 | Exome Sequencing | NGS | Identification of causal variation for disease
future | Whole Genome Sequencing | NGS | Identification of causal variation for disease
Acknowledgements
- KNIH: Young Jin Kim, Min Jin Go, Ji Hee Oh, Jong-Young Lee, Bok-Ghee Han
- Ajou Univ: Nam H. Cho
- Korea Univ: Chol Shin
- Shanghai Jiao Tong U: Cheng Hu, Weiping Jia
- WTCCC: Mark McCarthy, Andrew Morris
- IMCJ: Norihiro Kato
- RIKEN: Shiro Maeda, Naoyuki Kamatani
- Univ of Tokyo: Takashi Kadowaki
- All AGEN members
Thank you!