Spring Integrated Cardiovascular Conference
Assessment of Intra- and Inter-Observer Variability
Jun-Bean Park, Seoul National University Hospital
Key elements of epidemiologic research
Source population; study design; extraneous variables; study group; sampling; selection bias; confounding bias; exposure variables; association; outcome
Results and data analysis; information bias; analysis
Continuous // dichotomous // nominal // count // time-to-event
Individual; group (e.g., household); region
Causal inferences
Dohoo I, Martin W, Stryhn H. (2012) Methods in Epidemiologic Research, VER Inc.
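Definitions: validity and reliability of a test
Validity
  The ability of a test to correctly determine the presence or absence of the disease it is meant to diagnose.
  Also called accuracy.
Reliability
  The ability of a test to give consistent results across measurement conditions.
  A prerequisite for validity; inter-observer variation is important.
  Also called precision, reproducibility, repeatability.
  Concordance = test-retest reliability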
Validity vs. reliability
Validity criteria for a test

                      Disease status
  Test result     Present    Absent     Total
  Positive        a          b          a + b
  Negative        c          d          c + d
  Total           a + c      b + d      a + b + c + d

Sensitivity = a / (a + c)
Specificity = d / (b + d)
False-positive rate = b / (b + d)
False-negative rate = c / (a + c)
Prevalence = (a + c) / (a + b + c + d)
Positive predictive value = a / (a + b)
Negative predictive value = d / (c + d)
Likelihood ratio positive (LR+) = [a / (a + c)] / [b / (b + d)]
Likelihood ratio negative (LR-) = [c / (a + c)] / [d / (b + d)]
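The indices above can be computed directly from the four cell counts. The following is a minimal sketch; the function name `diagnostic_indices` and the example counts (a = 90, b = 20, c = 10, d = 180) are hypothetical and do not come from the slides.

```python
def diagnostic_indices(a, b, c, d):
    """Validity indices from a 2x2 table.

    a = test positive, disease present (true positive)
    b = test positive, disease absent  (false positive)
    c = test negative, disease present (false negative)
    d = test negative, disease absent  (true negative)
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    false_positive_rate = b / (b + d)
    false_negative_rate = c / (a + c)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "prevalence": (a + c) / (a + b + c + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / false_positive_rate,
        "lr_negative": false_negative_rate / specificity,
    }


# Hypothetical counts, for illustration only.
indices = diagnostic_indices(a=90, b=20, c=10, d=180)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty rows or columns (for example b + d = 0), which would raise a division error.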
Reliability (precision, reproducibility, repeatability)
How well do results agree when one examiner repeats a test, or when several examiners perform the same test?
Intra-observer variation and inter-observer variation
High random error means low reliability.
With systematic error alone, reliability can still be high.
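Variation that affects reliability
1. Intra-observer variation
   Differences in measurement and interpretation that arise when the same clinician repeatedly measures the blood pressure or height of the same person, or reads the same X-ray image several times.
2. Inter-observer variation
   Differences that arise when two different clinicians measure the blood pressure of the same person, or each read the same X-ray image.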
"My findings are a bit different now. At first I thought the patient had a different disease, but"
Methods for measuring reliability
For ordinal scales
  Percent agreement
  Kappa statistic (kappa value)
  Spearman's rank correlation coefficient
For continuous scales
  Intra-class correlation coefficient (ICC)
  Bland-Altman plot
* Pearson's correlation coefficient is not recommended for measuring agreement.
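Observer 1 (columns) vs. Observer 2 (rows)

                  Positive    Negative    Total
  Positive        30 (a)      7 (b)       37
  Negative        3 (c)       60 (d)      63
  Total           33          67          100

1. Observed agreement = 30 + 60 = 90
2. Maximum possible agreement = 30 + 7 + 3 + 60 = 100
3. Percent agreement = (30 + 60) / (30 + 7 + 3 + 60) = 90 / 100 = 90%
4. Expected agreement for cell a = [(30 + 7)(30 + 3)] / 100 = [(37)(33)] / 100 = 12.2
5. Expected agreement for cell d = [(3 + 60)(7 + 60)] / 100 = [(63)(67)] / 100 = 42.2
6. Total expected agreement = 12.2 + 42.2 = 54.4
7. Kappa statistic = [(observed percent agreement) - (percent agreement expected by chance)] / [100% - (percent agreement expected by chance)]
   = (90 - 54.4) / (100 - 54.4) = 0.78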
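The seven steps above can be reproduced in a few lines. This is a minimal sketch using the counts from the table (30, 7, 3, 60); the function name `cohen_kappa` is chosen for illustration.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table given as a list of rows."""
    size = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total**2
    return (observed - expected) / (1 - expected)


# Counts from the slide: a = 30, b = 7, c = 3, d = 60.
kappa = cohen_kappa([[30, 7], [3, 60]])
print(round(kappa, 2))  # 0.78
```

Because the code keeps the expected agreement unrounded (54.42% rather than 54.4%), the third decimal differs slightly from a hand calculation, but the result still rounds to 0.78.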
Kappa statistics

  Value of κ      Strength of agreement
  <0.20           Poor
  0.21-0.40       Fair
  0.41-0.60       Moderate
  0.61-0.80       Good
  0.81-1.00       Very good

Altman DG. (1991) Practical Statistics for Medical Research, p 404.
Intra-class correlation coefficient (ICC)
$ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$
$\sigma_u^2$ = Between-subject variance
$\sigma_e^2$ = Within-subject (measurement error) variance
Results are interpreted in the same way as the kappa value.
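As an illustration of the ratio of between-subject variance to total variance, the sketch below estimates both variance components with a one-way random-effects ANOVA. The slides do not say which ICC form or estimator is intended, so the estimator, the function name `icc_oneway`, and the data are all assumptions.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC.

    `measurements` is a list of subjects, each a list of k repeated values
    (the same k for every subject).
    """
    n = len(measurements)
    k = len(measurements[0])
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Mean squares between and within subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    var_between = (ms_between - ms_within) / k  # estimate of sigma_u^2
    var_within = ms_within                      # estimate of sigma_e^2
    return var_between / (var_between + var_within)


# Hypothetical repeated readings (two per subject), for illustration only.
readings = [[45.0, 47.0], [52.0, 51.0], [38.0, 40.0], [60.0, 58.0]]
print(round(icc_oneway(readings), 3))
```

With this estimator the between-subject variance can come out negative when within-subject scatter dominates, in which case the ICC is usually reported as zero.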
Between vs. within subject variance
Bland-Altman plot
Bland-Altman plot
Intra-observer and inter-observer variability assessed by the Bland-Altman method
Badagliacca R, et al. (2015) JACC: CV Imaging
Bland-Altman plot
Examine it in two steps: bias and variation.
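Bias & variation
Bias
  Do the methods agree on average? Or does one method tend to read higher or lower than the other?
  Uses the mean of the differences between the two measurements.
Variation
  Do the methods agree for an individual?
  Uses the standard deviation of the differences between the two measurements.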
Example
Comparing two different methods of measuring peak expiratory flow rate (PEFR, l/min).
Both methods measure the same continuous variable.
Step 1: Draw a scatter plot of the two measurements with the line of equality
[Figure: PEFR by mini Wright peak flow meter (y axis) vs. PEFR by Wright peak flow meter (x axis); r = 0.88, p = 0.01]
Step 2: Draw a scatter plot with the mean of the two measurements on the x axis and their difference on the y axis
[Figure: Difference in PEFR (large - mini) (l/min) vs. average PEFR by two meters (l/min)]
Step 3: Calculate the mean difference (bias) and the limits of agreement (variation)
[Figure: Difference in PEFR (large - mini) (l/min) vs. average PEFR by two meters (l/min), with horizontal lines at Mean, Mean + 2SD, and Mean - 2SD]
Step 4: Interpretation
Bias = mean difference = -2.1 l/min
Limits of agreement: mean difference ± 2 standard deviations
  = -2.1 - (2 x 38.8), -2.1 + (2 x 38.8) = -79.7, 75.5 l/min
The mini meter may be 80 l/min below or 76 l/min above the large meter. This is not acceptable for clinical purposes, but it is not immediately apparent from the scatter plot (nor from the correlation coefficient).
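The bias and limits of agreement in steps 3 and 4 can be sketched as follows. The summary values (-2.1 and 38.8 l/min) are taken from the slide; the function names and the small paired data set are hypothetical, since the raw PEFR readings are not reproduced here.

```python
from statistics import mean, stdev


def limits_of_agreement(bias, sd, k=2.0):
    """Lower and upper limits: bias - k*SD and bias + k*SD."""
    return bias - k * sd, bias + k * sd


def bland_altman(first, second, k=2.0):
    """Bias, SD of differences, and limits of agreement for paired readings."""
    diffs = [x - y for x, y in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower, upper = limits_of_agreement(bias, sd, k)
    return bias, sd, lower, upper


# Summary values from the slide: bias = -2.1 l/min, SD = 38.8 l/min.
lower, upper = limits_of_agreement(-2.1, 38.8)
print(round(lower, 1), round(upper, 1))  # -79.7 75.5

# Hypothetical paired readings, for illustration only.
large_meter = [490, 400, 510, 430, 420]
mini_meter = [520, 410, 480, 420, 440]
print(bland_altman(large_meter, mini_meter))
```

The multiplier `k` defaults to 2, as on the slide; 1.96 is also commonly used.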
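Interpretation of Bland-Altman plot
A Bland-Altman plot is generally interpreted informally. Check the following three questions.
1. How large is the average discrepancy (bias) between the two methods?
   Interpret it clinically. Is the discrepancy large enough to matter? This is a clinical question, not a statistical one.
2. Is there a trend?
   Does the difference between the methods tend to grow (or shrink) as the average increases?
3. Is the variability constant across the whole graph?
   Does the scatter around the bias line widen as the average increases?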
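Questions 2 and 3 are normally answered by eye, but a rough numerical check is possible. The sketch below is one way to do this, not a method given in the slides: it correlates the differences with the pair averages (trend) and the absolute deviations from the bias with the pair averages (changing spread). All names and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def trend_and_spread(first, second):
    """Correlation of the pair averages with (a) differences, (b) |difference - bias|."""
    diffs = [x - y for x, y in zip(first, second)]
    averages = [(x + y) / 2 for x, y in zip(first, second)]
    bias = sum(diffs) / len(diffs)
    abs_dev = [abs(d - bias) for d in diffs]
    return pearson_r(averages, diffs), pearson_r(averages, abs_dev)


# Hypothetical data with a purely proportional error (method B reads 10% higher).
method_a = [100, 200, 300, 400, 500]
method_b = [110, 220, 330, 440, 550]
trend_r, spread_r = trend_and_spread(method_a, method_b)
print(trend_r, spread_r)
```

A trend correlation far from zero suggests the difference depends on the magnitude of the measurement. Whether that matters remains a clinical judgement.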
Example 1: Case of a proportional error. Example 2: Case where the variation of at least one method depends strongly on the magnitude of measurements. Example 3: Case of an absolute systematic error
Which approach to be used?
Summary of Indices or Graphic Approaches Most Frequently Used for the Assessment of Validity and Reliability

                                                                  Mostly Used to Assess
  Type of Variable   Index or Technique                           Validity   Reliability
  Categorical        Sensitivity / Specificity                    ++
                     Youden's J statistic                         ++         +
                     Percent agreement                            +          ++
                     Percent positive agreement                   +          ++
                     Ordinal correlation coefficient (Spearman)   +          +
                     Kappa statistic                              +          ++
  Continuous         Scatter plot (correlation plot)              +          ++
                     Linear correlation coefficient (Pearson)     +          +
                     Intra-class correlation coefficient          +          ++
                     Mean within-pair difference                  +          ++
                     Coefficient of variation                     +          ++
                     Bland-Altman plot                            ++         ++

Modified from Szklo M, Nieto FJ. Epidemiology Beyond the Basics, 2nd Ed. (2007)
"In your paper, the new test's normal/abnormal classification was analyzed using a cutoff value of XX. Couldn't the intraobserver or interobserver variability be larger than this cutoff value?"
Measures of reliability
Popular measures of relative reliability
  Intra-class correlation coefficient (ICC)
  Pearson's r correlation coefficient
Popular measures of absolute reliability
  Limits of Agreement (LoA): Bland-Altman plot (2 measurements)
  Root mean square error (RMSE) (2 or more measurements)
  Coefficient of Variation (CV)
RMSE (root mean square error)
$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (M_i - O_i)^2}$
$M_i$: simulated value
$O_i$: observed value
$N$: number of samples
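A minimal sketch of this quantity, assuming the usual definition of RMSE as the square root of the mean squared difference between the simulated values M_i and the observed values O_i. The function name `rmse` and the numbers are hypothetical.

```python
from math import sqrt


def rmse(simulated, observed):
    """Root mean square error between paired simulated and observed values."""
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    squared_errors = [(m - o) ** 2 for m, o in zip(simulated, observed)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only.
print(rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0]))
```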
RMSE (root mean square error)
RMSE is the error between simulation results and field data. Here it is calculated using repeated-measures ANOVA. In that setting it is also known as the within-subject standard deviation: the within-subject variation from test to test, averaged over all subjects, which reflects absolute reliability.
Interpretation of RMSE
The difference between a subject's measurement and the true value would be expected to be less than 1.96 x RMSE for 95% of observations.
Another useful way of presenting measurement error is sometimes called the repeatability, which is √2 x 1.96 x RMSE (about 2.77 x RMSE).
The difference between two measurements for the same subject is expected to be less than √2 x 1.96 x RMSE for 95% of observations.
[Bland & Altman, 1996]
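Example
Evaluating intraobserver and inter-observer variability when RV-EF is measured with real-time three-dimensional echocardiography.
Variables:
1) id: unique patient number
2) analyzer: observer identifier (e.g., observer 1, observer 2)
3) time: order of analysis within an observer (e.g., first analysis, second analysis)
4) EFp: RV-EF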
Example

    . anova EFp analyzer / id|analyzer time analyzer#time, repeated(time)

    Number of obs = 60         R-squared     = 0.9718
    Root MSE      = 2.23959    Adj R-squared = 0.9123

    Source           Partial SS    df    MS            F        Prob > F
    Model            3279.67075    40    81.9917688    16.35    0.0000
    analyzer         10.8993571     1    10.8993571     0.13    0.7238
    id|analyzer      3267.02896    38    85.9744462
    time             .257599598     1    .257599598     0.05    0.8231
    analyzer#time             0     0
    Residual         95.2993423    19    5.01575486
    Total            3374.97009    59    57.2028829
Example
RMSE = 2.24%
1.96 x RMSE = 4.39%
Repeatability = √2 x 1.96 x RMSE = 6.21%
"In this study a 5% improvement in RV-EF was defined as a treatment effect, yet two measurements of the same subject can differ by about 6%."
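These three figures can be checked from the Residual mean square in the ANOVA output (5.01575486). In the sketch below the variable names are illustrative. The multiplier √2 x 1.96, as in Bland & Altman (1996), is the one that reproduces the 6.21% on the slide.

```python
from math import sqrt

# Residual mean square from the repeated-measures ANOVA output.
residual_ms = 5.01575486

rmse = sqrt(residual_ms)                  # within-subject SD (Root MSE)
measurement_error = 1.96 * rmse           # 95% bound on |measurement - true value|
repeatability = sqrt(2) * 1.96 * rmse     # 95% bound on the difference of two measurements

print(round(rmse, 2))               # 2.24
print(round(measurement_error, 2))  # 4.39
print(round(repeatability, 2))      # 6.21

# A 5% change in RV-EF, the study's definition of treatment effect,
# lies inside the repeatability of the measurement itself.
threshold = 5.0
print(threshold < repeatability)    # True
```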
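Take home message
1. Studies of diagnostic tests: validity / reliability
2. Variation that affects reliability: intra-observer / inter-observer variation
3. Methods for measuring reliability (1): ordinal / continuous scales
4. Methods for measuring reliability (2): relative / absolute analysis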
Thank you for your attention!