Journal of the Korean Society of Health Information and Health Statistics
Volume 34, Number 2, 2009, pp. 139-152

A study on the updating of prediction model for the development of hepatoma

이혜선 1), 명성민 2), 김도영 3), 한광협 3), 송기준 1)

1) Department of Biostatistics, Yonsei University College of Medicine
2) Department of Medical Informatics, Jungwon University
3) Department of Internal Medicine, Yonsei University College of Medicine

Abstract

Objectives: Statistical prediction models are useful for establishing diagnostic and treatment rules in clinical practice, so there is increasing interest in building precise models that predict the probability of disease for an individual patient. To improve predictive power, such a model should reflect the patient's changeable characteristics. In this paper, we studied methods for updating a prediction model by adding the information of new patients to the existing model.

Methods: To update the prediction model, we used an established model including 7 risk factors (diagnostic type, hepatitis virus type, age, sex, α-FP, ALT, and drinking history) and applied re-calibration and shrinkage to the intercept and slope of the existing model.

Results: We considered 4 updating methods. The first uses the existing model as it is. The second re-calibrates the overall intercept. The third re-calibrates the overall intercept and slope. The last re-calibrates and shrinks the overall intercept and the individual slopes.

Conclusions: Updated models contain both old and new information, and updating a model with more data can improve its predictive power. In particular, the last updating method was found to be the most accurate and useful one.

Key Words: prediction model, update method, re-calibration, shrinkage, intercept, slope

* (A050021). E-mail: biostat@yuhs.ac
1. Introduction

1.1 Background and Purpose

Table 1. Cancer mortality

Cancer site      Measure          2003     2004     2005     2006     2007     2008
Stomach cancer   No. of deaths    11,701   11,190   10,935   10,716   10,563   10,312
                 Mortality rate   24.2     23.1     22.5     21.9     21.5     20.9
Lung cancer      No. of deaths    12,673   13,246   13,733   14,027   14,278   14,791
                 Mortality rate   26.2     27.3     28.2     28.7     29.1     29.9
Liver cancer     No. of deaths    10,916   10,861   10,877   10,884   11,144   11,292
                 Mortality rate   22.6     22.4     22.3     22.3     22.7     22.9
Colon cancer     No. of deaths    5,484    5,859    6,043    6,244    6,650    6,855
                 Mortality rate   11.4     12.1     12.4     12.8     13.5     13.9
Breast cancer    No. of deaths    1,404    1,484    1,573    1,598    1,670    1,731
                 Mortality rate   2.9      3.1      3.3      3.3      3.4      3.5
Uterine cancer   No. of deaths    1,397    1,325    1,345    1,240    1,241    1,261
                 Mortality rate   2.9      2.7      2.8      2.5      2.5      2.5
All others       No. of deaths    19,757   20,334   20,595   20,793   22,007   22,670
                 Mortality rate   40.9     41.9     42.3     42.5     44.8     45.9

1.2 Study Content and Methods
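2. Theoretical Background

2.1 Logistic Regression Analysis

[Equations of the logistic regression model; only the operators ln and exp survive in the source.]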
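Because the original notation did not survive, the block below restates the standard logistic regression model as a reference point. The symbols $p$, $x_j$, $\beta_j$ and $k$ are assumptions chosen for illustration and are not necessarily the paper's own.

```latex
% Standard logistic regression model (notation assumed, not taken from the source)
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
p = \frac{\exp\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}
         {1 + \exp\!\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)} .
\]
```

Here $p$ is the probability of the event (development of hepatoma) and $x_1, \dots, x_k$ are the risk-factor indicators.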
2.2 Re-calibration of the Logistic Regression Model
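As a hedged sketch of what re-calibration does, the block below writes the updated model in a form that agrees with how Table 8 combines the old coefficients with the corrections reported in Table 7. The symbols $\alpha$ (intercept correction), $\beta_{\text{overall}}$ (calibration slope) and $\hat\beta_j$ (old-model coefficients) are assumed notation, not the paper's.

```latex
% Re-calibration of an existing logistic model (notation assumed)
\[
\operatorname{logit}(p_{\text{updated}})
  = \bigl(\hat\beta_0 + \alpha\bigr)
  + \beta_{\text{overall}} \sum_{j=1}^{k} \hat\beta_j x_j
\]
% No adjustment:                 \alpha = 0,          \beta_{overall} = 1
% Intercept re-calibration:      \alpha estimated,    \beta_{overall} = 1
% Intercept + calibration slope: \alpha estimated,    \beta_{overall} estimated
```

Only $\alpha$ and $\beta_{\text{overall}}$ are estimated from the new data; the $\hat\beta_j$ stay fixed at their old-data values.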
2.3 Shrinkage of the Logistic Regression Model

2.4 Compromise Model for the Logistic Regression Model

3. Methods

3.1 Methods for Updating the Prediction Model
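Table 2. Updating methods

No.   Label                                              Predictors considered   Parameters estimated
1     No adjustment                                      10                      0
2     Intercept                                          10                      1
3     Intercept + calibration slope                      10                      2
4     Intercept + calibration slope + individual slopes  10                      2-12

(1) Method 1: the existing model is used as it is.
(2) Method 2: the overall intercept is re-calibrated.
(3) Method 3: the overall intercept and slope are re-calibrated.
(4) Method 4: the overall intercept and the individual slopes are re-calibrated and shrunk.

3.2 Model Comparison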
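Two of the measures used later to compare the updated models (Table 9) are the Brier score and the c statistic. The following is a minimal sketch of how they can be computed, assuming 0/1 outcomes and predicted probabilities. The function names and the toy data are hypothetical and are not taken from the study.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    n = len(y_true)
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / n


def c_statistic(y_true, p_pred):
    """Share of (event, non-event) pairs where the event has the higher
    predicted probability; tied pairs count one half."""
    events = [p for y, p in zip(y_true, p_pred) if y == 1]
    non_events = [p for y, p in zip(y_true, p_pred) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data for illustration only (not study data).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
brier = brier_score(y, p)
c_stat = c_statistic(y, p)
print(f"Brier score = {brier:.4f}, c statistic = {c_stat:.2f}")
```

A lower Brier score indicates more accurate probabilities. A c statistic closer to 1 indicates better discrimination between patients with and without the event.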
4. Results

4.1 Overview of the Data

Table 3. Summary of data

                                Old data             New data
Entering period                 1990.1 - 1998.12     1999.1 - 2000.12
No. of total patients           994                  833
No. of liver cancer patients    90 (9.05%)           44 (5.28%)
Table 4. Description of risk factor

Variable          Description                  Positive
LC                Liver cirrhosis              -
CH                Chronic hepatitis            -
HCV               Hepatitis C virus            -
HBV               Hepatitis B virus            -
AGE               -                            ≥40
SEX               -                            Male
α-FP              Alpha-fetoprotein            ≥20 (IU/)
ALT               Alanine aminotransferase     ≥40 (IU/)
Heavy alcohol     -                            5 80g
Unknown alcohol   -                            -

Table 5. Frequency table of risk factor

Risk factor       Category             Old data (n=994)   New data (n=833)
Ultrasonography   LC                   335 (33.70%)       282 (33.85%)
                  CH                   540 (54.33%)       460 (55.22%)
                  Carrier, Other       119 (11.97%)       91 (10.92%)
Hepatitis         HCV                  121 (12.17%)       133 (15.97%)
                  HBV                  781 (78.57%)       613 (73.59%)
                  NonBNonC             92 (9.26%)         87 (10.44%)
AGE               ≥40                  798 (80.28%)       635 (76.23%)
                  <40                  196 (19.72%)       198 (23.77%)
SEX               Male                 683 (68.71%)       568 (68.19%)
                  Female               311 (31.29%)       265 (31.81%)
α-FP              ≥20 (IU/)            191 (19.22%)       120 (14.41%)
                  <20 (IU/)            803 (80.78%)       713 (85.59%)
ALT               ≥40 (IU/)            552 (55.53%)       521 (62.55%)
                  <40 (IU/)            442 (44.47%)       312 (37.45%)
Drinking          Heavy alcohol        149 (14.99%)       110 (13.21%)
                  Non/Social alcohol   543 (54.63%)       628 (75.39%)
                  Unknown alcohol      302 (30.38%)       95 (11.40%)
4.2 Prediction Models by Updating Method

Table 6. Logistic regression coefficient (standard error) of old data and new data

Variable           Old data (n=994)   New data (n=833)
Intercept          -6.254 (1.053)     -7.697 (1.383)
LC                 1.722 (0.409)      2.324 (1.038)
CH                 0.734 (0.265)      0.454 (1.082)
HCV                1.263 (0.499)      1.005 (0.799)
HBV                0.775 (0.396)      1.433 (0.621)
AGE (≥40)          1.315 (0.003)      0.816 (0.564)
SEX (male)         0.300 (0.328)      1.617 (0.574)
α-FP (≥20 IU/)     0.826 (0.464)      0.928 (0.396)
ALT (≥40 IU/)      0.283 (0.150)      -0.800 (0.359)
Heavy alcohol      0.584 (0.393)      0.175 (0.432)
Unknown alcohol    0.222 (0.400)      1.327 (0.448)
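To make the old-data column of Table 6 concrete, the sketch below turns those coefficients into a predicted probability for one patient. The patient profile, the dictionary keys and the function name are hypothetical. The reference categories (Carrier/Other, NonBNonC, age under 40, female, non/social drinking) are an assumption inferred from Tables 5 and 6.

```python
import math

# Old-data coefficients from Table 6 (keys are illustrative names).
old_model = {
    "Intercept": -6.254,
    "LC": 1.722,
    "CH": 0.734,
    "HCV": 1.263,
    "HBV": 0.775,
    "AGE": 1.315,             # age 40 or older
    "SEX": 0.300,             # male
    "AFP": 0.826,             # alpha-FP at or above the cutoff
    "ALT": 0.283,             # ALT at or above the cutoff
    "HeavyAlcohol": 0.584,
    "UnknownAlcohol": 0.222,
}


def predicted_probability(coefs, patient):
    """Return (linear predictor, probability) for a dict of 0/1 indicators."""
    lp = coefs["Intercept"] + sum(coefs[name] * value for name, value in patient.items())
    prob = 1.0 / (1.0 + math.exp(-lp))
    return lp, prob


# Hypothetical patient: liver cirrhosis, HBV positive, aged 40 or older, male.
patient = {"LC": 1, "HBV": 1, "AGE": 1, "SEX": 1}
lp, prob = predicted_probability(old_model, patient)
print(f"linear predictor = {lp:.3f}, predicted probability = {prob:.3f}")
```

Indicators left out of the `patient` dictionary are treated as 0, so only the risk factors that are present contribute to the linear predictor.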
Table 7. Apparent parameter of updated versions

Method   Parameters estimated   Regression coefficient
2        intercept              -1.477±0.159
3        intercept              -1.549±0.241
         calibration slope      0.931±0.173
4        intercept              -1.962±0.277
         calibration slope      1.434±0.352
         LC                     1.613±0.417
         CH                     -1.438±0.414
         HCV                    -0.134±0.588
         HBV                    0.719±0.566
         AGE (≥40)              -0.230±0.465
         SEX (male)             0.069±0.562
         α-FP (≥20 IU/)         -1.818±0.385
         ALT (≥40 IU/)          0.656±0.360
         Heavy alcohol          0.896±0.346
         Unknown alcohol        1.434±0.352

Table 8. Regression coefficient of updated versions

Variable           Method 1   Method 2   Method 3   Method 4
Intercept          -6.254     -7.731     -7.803     -8.216
LC                 1.722      1.722      1.603      1.351
CH                 0.734      0.734      0.683      0.576
HCV                1.263      1.263      1.176      0.991
HBV                0.775      0.775      0.722      0.609
AGE (≥40)          1.315      1.315      1.224      1.032
SEX (male)         0.300      0.300      0.279      0.235
α-FP (≥20 IU/)     0.826      0.826      0.769      0.648
ALT (≥40 IU/)      0.283      0.283      0.263      0.221
Heavy alcohol      0.584      0.584      0.544      0.459
Unknown alcohol    0.222      0.222      0.207      0.175
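The method 2 and method 3 columns of Table 8 can be reproduced from the old coefficients (Table 6) and the corrections in Table 7: the intercept correction is added to the old intercept, and each old slope is multiplied by the calibration slope. The sketch below checks that arithmetic. The function and variable names are hypothetical, and method 4 is deliberately left out.

```python
# Old-data coefficients (Table 6, also the Method 1 column of Table 8).
old = {
    "Intercept": -6.254, "LC": 1.722, "CH": 0.734, "HCV": 1.263,
    "HBV": 0.775, "AGE": 1.315, "SEX": 0.300, "AFP": 0.826,
    "ALT": 0.283, "HeavyAlcohol": 0.584, "UnknownAlcohol": 0.222,
}


def recalibrate(coefs, intercept_correction, calibration_slope=1.0):
    """Add the correction to the intercept and scale every slope by one common factor."""
    updated = {"Intercept": round(coefs["Intercept"] + intercept_correction, 3)}
    for name, beta in coefs.items():
        if name != "Intercept":
            updated[name] = round(calibration_slope * beta, 3)
    return updated


# Corrections reported in Table 7.
method2 = recalibrate(old, intercept_correction=-1.477)
method3 = recalibrate(old, intercept_correction=-1.549, calibration_slope=0.931)

for name in old:
    print(f"{name:15s} {old[name]:8.3f} {method2[name]:8.3f} {method3[name]:8.3f}")
```

Under method 2 only the intercept moves, so the relative weights of the risk factors are unchanged. Under method 3 all slopes shrink by the same factor of 0.931.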
Figure 1. Calibration plot of updated versions
Figure 2. ROC curve of updated versions
Table 9. Apparent performance of updated versions

Measure                       Method 1   Method 2   Method 3   Method 4
Parameters estimated          0          1          2          12
U statistic                   0.150      0.000      0.000      0.000
c statistic                   0.741      0.741      0.741      0.760
Brier score                   0.071      0.048      0.048      0.045
(label missing in source)     0.930      0.930      0.999      1.000

5. Discussion
References

[1] Okuda K, Ohtsuki T, Obata H, et al. Natural history of hepatocellular carcinoma and prognosis in relation to treatment. Study of 850 patients. Cancer 1985; 56: 918-928.
[2] Steyerberg EW, Borsboom GJJM, Houwelingen HCV, Eijkemans MJC, Habbema JDF. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Statistics in Medicine 2004; 23: 2567-2586.
[3] Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Annals of Internal Medicine 1999; 130: 515-524.
[4]., ; 2001.
[5],.. ; 2003.
[6] Harrell Jr FE. Regression modeling strategies. Springer 2001.
[7] Ennis M, Hinton G, Naylor D, Revow M, Tibshirani R. A comparison of statistical learning methods on the GUSTO database. Statistics in Medicine 1998; 17: 2501-2508.
[8] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 1996; 58: 267-288.
[9] Steyerberg EW, Eijkemans MJC, Habbema JDF. Application of shrinkage techniques in logistic regression analysis: a case study. Statistica Neerlandica 2001; 55: 76-88.
[10] Steyerberg EW, Eijkemans MJC, Harrell Jr FE, Habbema JDF. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Statistics in Medicine 2000; 19: 1059-1079.
[11] Steyerberg EW, Eijkemans MJC, Houwelingen JCV, Lee KL, Habbema JDF. Prognostic models based on literature and individual patient data in logistic regression analysis. Statistics in Medicine 2000; 19: 141-160.
[12] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. Springer 2001.
[13] Cheong JY, Han KH, Kim DK, et al. Establishment of Individual Prediction Model According to Risk Factors for Development of Hepatocellular Carcinoma in Korea: Establishment of Individual Prediction Model for Hepatocellular Carcinoma. The Korean Journal of Hepatology 2001; 4: 449-458 (Korean).
[14] Choi JW, Ahn SH, Moon CM, et al. Efficacy of Individual Prediction Model for the Early Diagnosis of Hepatocellular Carcinoma. Korean Journal of Medicine 2004; 67: 7-14 (Korean).
[15] Harrell Jr FE, Lee KL, Mark DB. Tutorial in biostatistics. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 1996; 15: 361-387.
[16] Houwelingen HCV. Validation, calibration, revision and combination of prognostic survival models. Statistics in Medicine 2000; 19: 3401-3415.
[17] Steyerberg EW, Vergouwe Y, Keizer HJ, Habbema JDF. Residual mass histology in testicular cancer: development and validation of a clinical prediction rule. Statistics in Medicine 2001; 20: 3847-3859.