Microsoft PowerPoint - bioinfo_09lect12_shpark_microarray.ppt [Compatibility Mode]


Introduction to Bioinformatics
Chapter 5. DNA Microarray Data Analysis
Sung Hee Park (shpark@ssu.ac.kr)

Contents
- Principles of DNA microarray experiments
- Microarray data preprocessing
  - Image processing (image preprocessing)
  - Microarray data normalization
- Bioinformatic analysis of microarray data
  - Clustering: hierarchical clustering, K-means clustering
  - Classification

Soongsil University, Department of Bioinformatics, 8-5-8 (c) Sung Hee Park, Introduction to Bioinformatics

What is a DNA chip?
A device in which thousands to tens of thousands of different DNA species are attached at high density to a very small metal or glass surface, so that the genes that hybridize to these DNAs can be analyzed at very high speed.

One High-Throughput Method: Microarrays
- A DNA microarray performs large-scale gene expression analysis: it measures the expression of a large number of genes in a single experiment.
- What is varied: individuals, strains, cell types, environmental conditions, disease states, etc.
- What is measured: RNA quantities for thousands of genes, exons, or other transcribed sequences.

Microarray data
Microarray data are represented as a two-dimensional matrix:
- Rows: genes, proteins, etc.
- Columns: individuals, strains, cell types, etc.

Goals of microarray analysis
- How active are various genes in different cell/tissue types?
- How does the activity level of various genes change under different conditions (stages of a cell cycle, environmental conditions, disease states)?
- Which genes seem to be regulated together?
- Find the genes that change expression between experimental and control samples.
- Classify samples based on a gene expression profile.
- Find patterns: groups of biologically related genes that change expression together across samples/treatments.

[Figure: pool of cell lines vs. tumor sample]
Sources of variation between the two channels:
- Different amounts of starting material
- Differential labeling efficiency of the dyes
- Different amounts of RNA in each channel
- Differential scanning efficiency in each channel
- Differential hybridization efficiency over the slide surface

Microarrays (also called DNA chips, gene chips, or DNA arrays)
Classification by what is placed on each spot:
- cDNA microarray chip (Pat Brown, Stanford Univ.): already-identified ORFs (open reading frames) are arrayed on the chip. When cDNA synthesized from in vivo mRNA with reverse transcriptase is hybridized to these ORFs, the signal strength changes with the expression level. Used to analyze the expression level of specific genes.
- Oligonucleotide chip (Affymetrix Inc.): short DNA probes (about 25 nucleotides long) are arrayed on the chip. When sample DNA is hybridized to probes built from combinations of the four nucleotides (A, G, C, T), the signal strength changes with the degree of sequence match between them. Used to analyze DNA sequences and mutated sequences.
Underlying principles: anchoring pieces of DNA to glass/silicon slides; complementary hybridization.

Principle of microarray experiments: pin microarray
- Based on complementary hybridization
- Developed in 1995 at Stanford University, USA (Dr. Pat Brown)
- Pre-synthesized oligonucleotides or cDNAs are spotted onto the chip with pins
- About 3,000 genes can be arrayed on a 1 cm chip

[Figure: probe extraction and labeling (wild type)]

[Figure: hybridization and scanning. Total RNA (15 µg) is extracted from cells or tissues and quality-checked by its 28S/18S rRNA bands; cDNA is synthesized with reverse transcriptase, the wild-type cDNA is labeled with Cy3 and the mutant cDNA with Cy5, both are hybridized to a 7.5k cDNA chip, and the chip is scanned by laser excitation of each dye and a detector.]

[Figure: intensity dependence comparison. Log ratio plotted against 0.5*(log(G) + log(R)) for Slide3 and Slide7, each with a polynomial fit.]

Analysis workflow: image processing → data normalization → log(R/G) → differential gene expression, clustering, pathway analysis

Image processing
- Gridding: locating the position of each spot. During the experiment, spots can be displaced by the cover glass.
- Segmentation: determining the brightness of each located spot by separating its foreground from the background.

[Figure: gridding and segmentation examples]
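A minimal sketch of the segmentation step, assuming the spot centre and radius have already been found by gridding. It uses fixed-circle segmentation (pixels inside a circle are foreground, the rest of the patch is background), which is only one common approach; the lecture does not say which method it uses. The function name `quantify_spot` and the use of medians are illustrative assumptions.

```python
import numpy as np


def quantify_spot(patch, center, radius):
    """Fixed-circle segmentation of one spot (hypothetical helper).

    Pixels within `radius` of `center` are foreground; all other pixels of
    the patch are background. Returns (foreground median, background median,
    background-subtracted signal clipped at 0)."""
    rows, cols = np.indices(patch.shape)
    cy, cx = center
    inside = (rows - cy) ** 2 + (cols - cx) ** 2 <= radius ** 2
    fg = float(np.median(patch[inside]))
    bg = float(np.median(patch[~inside]))
    return fg, bg, max(fg - bg, 0.0)


# Toy 7x7 patch: background 100, circular spot of intensity 900 at (3, 3)
patch = np.full((7, 7), 100.0)
rr, cc = np.indices(patch.shape)
patch[(rr - 3) ** 2 + (cc - 3) ** 2 <= 4] = 900.0

fg, bg, signal = quantify_spot(patch, center=(3, 3), radius=2)
print(fg, bg, signal)  # 900.0 100.0 800.0
```

In a two-channel experiment the same mask would be applied to the Cy3 and Cy5 images of the spot separately.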

Processing of array data
- A scanned spot is a set of pixels; each pixel has a Cy3 and a Cy5 intensity.
- For each spot, the mean or median of its pixels is computed separately for the Cy5 and Cy3 channels.
- The expression value of the spot is the ratio Cy5/Cy3.
- Log-transformed intensities are approximately normally distributed.

[Figure: scatter plot of log Cy3 vs. log Cy5 intensities, with log(Cy5/Cy3) indicating which genes are of interest]

Computational analysis
- Identifying differential expression: which genes have different expression levels across two groups?
- Clustering genes: which genes seem to be regulated together?
- Clustering samples: which treatments/individuals have similar profiles?
- Classifying genes: to which functional class does a given gene belong?
- Classifying samples: to which class does a given sample belong?
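The spot-level processing described above (per-channel summary, Cy5/Cy3 ratio, log transform, normalization) can be sketched as follows. Assumptions, none of them stated in the lecture: base-2 logarithms, R = Cy5 and G = Cy3, a global median shift as a simple stand-in for normalization (the intensity-dependent polynomial fits in the figure are more elaborate), and a 2-fold cutoff as a crude first pass at identifying differential expression. All function names are hypothetical.

```python
import numpy as np


def log_ratios(cy5, cy3):
    """Per-spot M = log2(Cy5/Cy3) and A = 0.5 * (log2 Cy5 + log2 Cy3)."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    m = np.log2(cy5 / cy3)
    a = 0.5 * (np.log2(cy5) + np.log2(cy3))  # x-axis of an intensity-dependence plot
    return m, a


def median_center(m):
    """Global normalization: shift M so its median is 0.
    Assumes most genes are not differentially expressed."""
    return m - np.median(m)


def flag_changed(m, min_fold=2.0):
    """Flag spots whose |M| is at least log2(min_fold)."""
    return np.abs(m) >= np.log2(min_fold)


# Toy background-subtracted spot medians for 5 genes
cy5 = [800, 800, 400, 3200, 800]
cy3 = [400, 400, 400, 400, 800]

m, a = log_ratios(cy5, cy3)
mc = median_center(m)
flags = flag_changed(mc)
print(flags.tolist())  # [False, False, True, True, True]
```

Here the median ratio is 2, so median-centering treats a 2-fold Cy5 excess as dye bias; after the shift, the first two spots count as unchanged and the last three are flagged.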

Clustering
Clustering is the organization of a collection of unlabeled patterns into clusters based on similarity: patterns within the same cluster are more similar to each other than they are to a pattern belonging to a different cluster.

Clustering gene expression data
- Clustering genes: group together the genes that share a similar expression pattern across a data set (for example, gene expression across several treatments). Genes involved in the same biological process are likely to be co-regulated.
- Clustering arrays: group arrays that show similar gene expression profiles in order to discover sample groups.
- Hypothesis: genes with similar function have similar expression profiles.

[Figure: example expression clusters, e.g. putative mitochondrial carrier and chlorophyll binding protein genes]

Clustering methods
- Hierarchical clustering: agglomerative, with single, complete, or average linkage
- k-means or k-medoids
- SOMs (self-organizing maps)
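Of the methods listed, k-means is the simplest to sketch. Below is plain Lloyd's k-means on a genes × conditions matrix, assuming the number of clusters k is fixed in advance (as the method requires) and, for a deterministic example, that the initial centers are chosen by hand; in practice they are usually picked at random or with k-means++. The function and the toy data are illustrative.

```python
import numpy as np


def kmeans(X, init_centers, n_iter=100):
    """Lloyd's k-means. X: genes x conditions; init_centers: k x conditions.
    Returns (cluster label per gene, final centers)."""
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its genes (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy log-ratio profiles: two induced genes, two repressed genes, 3 conditions
X = np.array([
    [2.0, 2.2, 1.8],
    [1.9, 2.1, 2.0],
    [-1.0, -1.2, -0.8],
    [-0.9, -1.1, -1.0],
])
labels, centers = kmeans(X, init_centers=X[[0, 2]])
print(labels)  # [0 0 1 1]
```

Unlike hierarchical clustering, the result is a flat partition into exactly k groups, with no tree relating them.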

Hierarchical clustering
- Every gene (or array) is placed at a specific node in a hierarchy (tree-like structure), so that the distance between any two points can be read from the tree: a dendrogram, or hierarchical tree.
- The number of clusters is determined by a distance cutoff. By contrast, k-means or SOM partitions the data into a predefined number of nodes, with no hierarchy between data points.
- The hierarchy can be constructed either top-down (divisive) or bottom-up (agglomerative).

Hierarchical approach
- Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters.
- Divisive: start with one all-inclusive cluster; at each step, split a cluster, until only singleton clusters remain.

[Figure: dendrogram, nested cluster diagram, and proximity matrix for points p1 to p6]

Measuring similarity and distance
A gene expression pattern $\mathbf{a}$ is represented by a vector of measurements $\mathbf{a} = [a_1, a_2, \ldots, a_N]$.
- Dissimilarity between two individual patterns (gene vectors): Euclidean distance $d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}$, scalar product, correlation coefficient
- Distance between two clusters: MIN (single linkage), MAX (complete linkage), group average (average linkage), distance between centroids
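The three pattern dissimilarities listed above can be sketched as below. The lecture does not give formulas for the scalar-product and correlation measures, so two common conventions are assumed here: 1 minus the normalized scalar product (cosine) and 1 minus the Pearson correlation coefficient. Function names are illustrative.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def cosine_dissimilarity(a, b):
    """1 - a.b / (|a| |b|): a normalized scalar-product dissimilarity (assumed form)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def correlation_dissimilarity(a, b):
    """1 - Pearson r: 0 for identical shapes, 2 for mirror-image profiles."""
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 - float(r)


g1 = [1, 2, 3, 4]
g2 = [11, 12, 13, 14]  # same shape as g1, shifted up by 10
g3 = [4, 3, 2, 1]      # opposite shape to g1

print(euclidean(g1, g2))  # 20.0
```

The choice matters: g1 and g2 are far apart in Euclidean distance but have correlation dissimilarity 0, which is why correlation is often preferred when "similar expression pattern" means similar shape rather than similar magnitude.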

Single, complete, and average linkage algorithms: distance between clusters
- method="single": the distance between two clusters is the smallest dissimilarity between a point in the first cluster and a point in the second cluster (nearest-neighbor method).
- method="complete": the distance is the largest dissimilarity between a point in the first cluster and a point in the second cluster (furthest-neighbor method).
- method="average": the distance is the average of the dissimilarities between the points in one cluster and the points in the other cluster.

Sample data: a set of 6 two-dimensional points
[Figure: scatter plot of the 6 points]

xy coordinates of the 6 points:

Point   x      y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30

Euclidean distance matrix for the 6 points:

        P1     P2     P3     P4     P5     P6
P1      0.00   0.24   0.22   0.37   0.34   0.23
P2      0.24   0.00   0.15   0.20   0.14   0.25
P3      0.22   0.15   0.00   0.15   0.28   0.11
P4      0.37   0.20   0.15   0.00   0.29   0.22
P5      0.34   0.14   0.28   0.29   0.00   0.39
P6      0.23   0.25   0.11   0.22   0.39   0.00

MIN (single linkage)
[Figure: single-linkage dendrogram and nested clusters for the 6 points]

Dist({3,6}, {2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                   = min(0.15, 0.25, 0.28, 0.39) = 0.15
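The linkage definitions and the MIN example above can be checked with a small sketch that works directly on the 6-point distance matrix. It is a naive O(n³) agglomerative loop written for clarity, not the lecture's implementation; the function names and the `method` keywords (borrowed from the slide's wording) are illustrative. Points are 0-based, so {P3, P6} is {2, 5} and {P2, P5} is {1, 4}. When two candidate merges tie (here two merges at 0.15), the first pair found wins, so the merge order among ties may differ from the slide's dendrogram while the merge heights stay the same.

```python
import numpy as np

# Euclidean distance matrix for P1..P6 (0-based indices 0..5), from the table above
D = np.array([
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
])

LINKAGE = {
    "single": min,                              # nearest neighbor
    "complete": max,                            # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),    # group average
}


def cluster_distance(D, c1, c2, method="single"):
    """Distance between two clusters given as sets of point indices."""
    pair_dists = [D[i, j] for i in c1 for j in c2]
    return float(LINKAGE[method](pair_dists))


def agglomerate(D, method="single"):
    """Start with singleton clusters and repeatedly merge the closest pair.
    Returns a list of (cluster_a, cluster_b, merge_height)."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(D, clusters[x], clusters[y], method)
                if best is None or d < best[0]:  # strict '<': first pair wins ties
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


p36, p25 = frozenset({2, 5}), frozenset({1, 4})  # {P3, P6} and {P2, P5}
for m in ("single", "complete", "average"):
    print(m, cluster_distance(D, p36, p25, m))

merges = agglomerate(D, "single")
print([round(h, 2) for _, _, h in merges])  # [0.11, 0.14, 0.15, 0.15, 0.22]
```

Cutting the resulting tree at a distance threshold (for example, keeping only merges below 0.15) gives the flat clusters, which is how the distance cutoff mentioned earlier determines the number of clusters.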