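Healthcare Quality Management (4): Risk-adjusted Outcome
April 11, 2008. Yoon Kim, Department of Health Policy and Management, Seoul National University College of Medicine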
Chap 1. Reasons for Risk-adjustment
Why risk adjustment?
Background: safety, quality, cost, equity
Compelling arguments:
- Patients must decide about the value of specific clinical interventions and about which providers offer high-quality care
- Purchasers, payers, and policy-makers face similar decisions
The goal is to answer the following questions:
- Which treatments are most effective?
- Which providers give the best care?
- Which health plans are most efficient?
- Which delivery systems provide the most patient-centered care?
- Who produces the best outcomes?
Purpose of risk adjustment
Meaningful comparisons require risk adjustment:
- Sicker patients: larger costs / poorer outcomes
- Healthier counterparts: smaller costs / better outcomes
- Apples are compared to apples, not to oranges
Algebra of effectiveness (Fig 1.1):
- RCT: randomization balances risk; the gold standard for treatment efficacy
- Outcome = f(intrinsic patient risk factors, treatment effectiveness, quality of care, random chance)
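The "apples to apples" point can be illustrated with direct standardization, the simplest form of risk adjustment. In this sketch the two hospitals, their case mix, and their stratum-specific mortality rates are all hypothetical: both hospitals have identical mortality within each risk stratum (the same quality of care), so any difference in their crude rates comes from case mix alone.

```python
# Hypothetical hospitals: identical stratum-specific mortality (same quality of care),
# but different case mix (share of low- vs high-risk patients).
HOSPITALS = {
    "Hospital A": {"mix": {"low": 0.8, "high": 0.2}, "rates": {"low": 0.02, "high": 0.20}},
    "Hospital B": {"mix": {"low": 0.2, "high": 0.8}, "rates": {"low": 0.02, "high": 0.20}},
}
# Common reference case mix used for direct standardization (assumed 50/50).
STANDARD_MIX = {"low": 0.5, "high": 0.5}


def weighted_rate(mix: dict[str, float], rates: dict[str, float]) -> float:
    """Mortality rate as stratum-specific rates weighted by a case-mix distribution."""
    return sum(mix[stratum] * rates[stratum] for stratum in rates)


for name, h in HOSPITALS.items():
    crude = weighted_rate(h["mix"], h["rates"])         # weighted by the hospital's own mix
    adjusted = weighted_rate(STANDARD_MIX, h["rates"])  # weighted by the common mix
    print(f"{name}: crude={crude:.3f}, standardized={adjusted:.3f}")
```

Hospital A's crude rate comes out at 0.056 and Hospital B's at 0.164, yet both standardize to 0.110. Model-based risk adjustment generalizes the same idea to many risk factors at once.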
Credible risk-adjustment methods exist for certain outcomes:
- Predicting in-hospital mortality for children and adults in ICUs
- Predicting in-hospital mortality and postoperative complications for CABG and other major surgeries
- Examining patient-reported satisfaction
- Setting prospective payment levels for specific episodes of care, such as DRGs and RUGs
- Setting capitation payment levels for managed care organizations
The devil of risk adjustment is in the details
HCFA Medicare hospital mortality figures (1986), for a facility that was a hospice:
- Observed mortality: 87.6%
- Predicted mortality: 22.5%
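Comparisons like the HCFA one are usually summarized as an observed-to-expected (O/E) ratio, where the expected count is the sum of each patient's model-predicted probability of death. The sketch below is a generic illustration of that calculation, not HCFA's actual method; the function names and toy data are hypothetical.

```python
def observed_expected_ratio(died: list[int], predicted: list[float]) -> float:
    """O/E ratio: observed deaths divided by the sum of predicted death probabilities."""
    if len(died) != len(predicted):
        raise ValueError("died and predicted must have the same length")
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected


def indirectly_standardized_rate(oe_ratio: float, reference_rate: float) -> float:
    """Risk-adjusted rate by indirect standardization: O/E times a reference rate."""
    return oe_ratio * reference_rate


# Toy patient-level data (hypothetical): 2 observed deaths vs. 1.5 expected.
print(observed_expected_ratio([1, 0, 0, 1], [0.5, 0.5, 0.25, 0.25]))
# The hospice figures on the slide (87.6 observed vs. 22.5 predicted) as a ratio:
print(round(87.6 / 22.5, 2))  # 3.89
```

An O/E ratio far above 1 flags a provider, but, as the hospice case shows, it can also mean the model is missing a risk factor (here, terminal illness) rather than that care was poor.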
Chap 2. Getting started and defining terms
First step in considering risk adjustment: answering four questions
- Risk of what outcome?
- Over what time period?
- For what population?
- For what purpose?
The goal: comparing risk-adjusted outcomes
Risk of what?
Risk adjustment is meaningless without answering the question "Risk of what?"
Three types of outcomes:
- Clinical outcomes
- Resource use
- Patient-centered outcomes
Selected risk-adjustment methods: Tab 2.1 & 2.2
Risk of what?
Best strategy: select a risk-adjustment method designed specifically for the target outcome
One size does not fit all:
- Cost prediction methods are not good for mortality
- Disease-specific methods: CABG vs. obstetric complications
- DRG vs. RUG
Over what time frame?
CHQC example
Time frame vs. data source:
- Discharge abstracts cannot capture risk factors that predate care
Attributional validity: the likelihood that a poor risk-adjusted outcome reflects poor care rather than high patient risk
Tab 2.3
For what population?
Diverse populations: age, sex, skin color, language, immediate world generation, gender, race, ethnicity, culture
Populations have different risk factors for various health-related outcomes
The population of interest helps determine the range of risk factors (RFs):
- APACHE vs. PRISM
- CSI: separate components for different population groups
Special populations:
- Mental health
- Persons with disabilities
- Long-term care and home-based settings
For what purpose?
The purpose of using risk adjustment drives the choice of risk of what, time frame, and population
Purpose = comparison:
- Setting payment levels for individual patients or health plan enrollees
- Encouraging providers or health plans to treat or accept high-cost or potentially high-risk populations
- Comparing efficiency and cost of care across providers or health plans
- Producing public report cards about the performance of individual providers
- Internal comparison of patient outcomes across physicians within an individual practice setting to motivate quality improvement
Chap 8. Conceptual and Practical Issues in Developing Risk-adjustment Methods
Developing a risk-adjustment method
Developing a credible risk-adjustment method is challenging, costly, and time-consuming
Recommendation: take an existing method off the shelf
- Even if existing measures do not perfectly match a project's context and goals, the trade-off may be worth it
However:
- When the target audience is clinicians, clinical credibility is vitally important
- Most widely used risk-adjustment methods aim to meet broad policy objectives, such as setting payment levels
Objectives of the chapter
Important considerations in developing risk-adjustment methods
Intended audience:
- Not only persons developing risk-adjustment methods
- But also those seeking to understand the development process and the consequent strengths and limitations of a risk-adjustment method
The outcome must be clearly and reliably defined
- Reliability is the most important prerequisite for using outcome data in comparative assessment
- From a clinical perspective, each NSQIP morbidity outcome could be expected to have its own set of risk factors (e.g., postoperative respiratory failure vs. postoperative pneumonia)
The outcome must be frequent enough for statistical modeling
Outcomes and risk factors
Optimal assessment of the effectiveness of care requires investigating various outcomes:
- Positive outcomes: symptoms, functional status, prevention of death
- Negative outcomes: mortality, complications, lack of improvement, dissatisfaction
- Trade-offs among outcomes: clinical benefits vs. resource use
A core set of RFs can be used to adjust for the risks of different outcomes, such as mortality, complications, and cost:
- Similar RFs across different outcomes
- Varying weights of RFs across different outcomes
Tab 8.1
What groups/units are compared? (unit of analysis)
Depends on one's purpose and context:
- Individual patients or groups of patients
- Individual physicians/hospitals or groups of them (nested hierarchical modeling)
Depends on the time window:
- An individual hospitalization
- If longitudinal information is available: a patient during an episode of illness or over a set time period
Different risk-adjustment strategies for different units of observation:
- Disease-specific severity score
- Overall severity score for patients
Identifying risk factors
Using administrative data:
- Limited choice of RFs; see existing administrative-data-based risk-adjustment methods (Ch 2)
It is essential to develop a priori clinical hypotheses about the relationship between potential RFs and the outcomes of interest
Otherwise, data dredging (overfitting) produces models that:
- Show strong performance metrics (R², c-statistic) in the development data set
- May not validate well in other data sets
- Do not make clinical sense
Identifying risk factors
Strategies to identify candidate risk factors and develop hypotheses:
- Published reports
- Asking clinical experts or panels of practicing clinicians ("herding cats" metaphor): valuable advice, but the discussion must remain focused on the specific outcome and time frame of the intended analysis
Identifying risk factors
Translating clinically important concepts into variables that can be measured reliably and validly using available data sources is a significant challenge!
- Researchers may not fully appreciate the challenges of gathering risk factors until they try it
VA Surgical Risk Study (1991):
- Retrospective review of discharge abstracts: inconsistent coding, poor inter-rater reliability, important RFs missing from the charts
- Concurrent review by trained nurse reviewers
- Implementation of an EMR (VISTA)
- Cost of data collection, analysis, and reporting: $38 per surgery
Identifying risk factors
Pilot study:
- Retrospective: examine 5–10 charts to determine their level of detail and completeness
- Prospective: pretest the data-collection procedure
Problems in data collection:
- Lack of standardization among common diagnostic procedures: in the MMPS pneumonia model, a major RF is the number of lobes involved, but depending on the practice of taking chest X-rays, there may be no data
- High rates of missing values when only a small fraction of patients have these tests
- Inconsistent interpretation
- Unacceptable data-collection burden
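Question: Which of the following is NOT an appropriate risk factor?
Overview
Case: HIRA (Health Insurance Review & Assessment Service) AMI quality (appropriateness) assessment
- Target condition: acute myocardial infarction (AMI)
- Outcome variable: in-hospital mortality
- Risk factors:
  - Demographics: age, sex
  - Comorbidities: hypertension, diabetes
  - Disease-related risk factors: ejection fraction, systolic BP, cardiogenic shock, congestive heart failure at admission; acute neurological change within 48 hrs after admission; EKG ischemia, arrhythmia; left anterior descending artery occlusion; verbal response within 48 hours after admission; intra-aortic balloon pump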
Risk factor vs. quality of care
A risk factor is a patient attribute, NOT a component of the process of care
Designing operational risk-adjustment methods involves complex trade-offs: balancing purity against results
VA NSQIP case:
- Operation complexity score
- Emergency operation
Severity of illness vs. hospital teaching status and size
VA teaching hospitals >> nonteaching hospitals in:
- Operation complexity score
- Serious risk factors
Risk-adjusted 30-day mortality:
- Teaching hospitals > nonteaching hospitals in general surgery, orthopedics, urology, and vascular surgery
- No difference in 8 common surgeries
Question: What's wrong with these risk factors?
VA NSQIP risk-adjusted postoperative LOS
Risk factors considered: older age, non-white race, ASA class ≥ 3, partially dependent functional status, intraoperative blood transfusion, operation time ≥ 3 hr, postoperative UTI, ileus, pneumonia, number of complications ≥ 2, return to OR
Researchers' conclusion: "Although preoperative factors were independently associated with a prolonged LOS, the factors generating the highest risks for a prolonged LOS were the intraoperative processes of care and postoperative adverse events"
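One way to operationalize the "risk factor vs. process of care" distinction is to tag each candidate variable with when it is determined relative to the index operation and keep only preoperative patient attributes. The tags below are a hypothetical classification of the variables listed on this slide, not NSQIP's own variable metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    timing: str  # "preop", "intraop", or "postop" relative to the index operation


# Hypothetical tagging of the variables considered in the LOS model.
CANDIDATES = [
    Candidate("older age", "preop"),
    Candidate("non-white race", "preop"),
    Candidate("ASA class", "preop"),
    Candidate("partially dependent functional status", "preop"),
    Candidate("intraoperative blood transfusion", "intraop"),
    Candidate("operation time", "intraop"),
    Candidate("postoperative UTI", "postop"),
    Candidate("postoperative ileus", "postop"),
    Candidate("postoperative pneumonia", "postop"),
    Candidate("number of complications", "postop"),
    Candidate("return to OR", "postop"),
]


def admissible_risk_factors(candidates: list[Candidate]) -> list[str]:
    """Keep variables fixed before care begins; later ones may reflect quality of care."""
    return [c.name for c in candidates if c.timing == "preop"]


print(admissible_risk_factors(CANDIDATES))
```

Timing is a necessary but not sufficient screen: a variable fixed before the operation can still be a process-of-care decision, as the emergency-operation item in the VA NSQIP case illustrates.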
Timing of risk factors
APACHE aims to predict in-hospital death
- Most physiological parameters are measured repeatedly in ICUs
- Which values should be used for risk adjustment? The first? The worst over some time period? What time period?
Timing of risk factors
Worst value: does it reflect patient risk/severity, or the result of therapeutic or diagnostic mishaps?
- The answer depends on the context
- APACHE developers' observation: the worst value was the first value on admission in 88% of measurements
- Reviewer training: less-trained reviewers find the first value more easily
Time period of data collection (according to extensive analysis by APACHE's developers):
- No significant difference in the risk models
- Longer time period = greater possibility of confounding quality of care with patient risk before treatment
- Shorter time period = higher level of missing information
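The first-value and worst-value rules can be written as two selection functions over time-stamped measurements. The 24-hour window and the definition of "worst" as the value farthest from a normal reference are illustrative assumptions, not APACHE's actual scoring rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    hours_since_admission: float
    value: float


def first_value(measurements: list[Measurement]) -> Optional[float]:
    """Earliest recorded value: closest to the patient's risk before treatment."""
    if not measurements:
        return None
    return min(measurements, key=lambda m: m.hours_since_admission).value


def worst_value(measurements: list[Measurement], normal: float,
                window_hours: float = 24.0) -> Optional[float]:
    """Value farthest from `normal` within the window; later values may reflect care."""
    in_window = [m for m in measurements if 0 <= m.hours_since_admission <= window_hours]
    if not in_window:
        return None
    return max(in_window, key=lambda m: abs(m.value - normal)).value


# Hypothetical heart-rate series (beats/min) for one ICU patient.
hr = [Measurement(0.5, 118), Measurement(6, 132), Measurement(30, 150)]
print(first_value(hr), worst_value(hr, normal=80))  # 118 132
```

Widening `window_hours` to 48 would make the 30-hour reading of 150 the worst value, which is the trade-off described above: a longer window risks mixing the effects of care into the patient's baseline risk.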
Building the risk-adjustment model
Combining clinical judgment and empirical modeling is better than either approach alone:
- Normative judgment of clinical experts
- Statistically rigorous modeling that avoids data dredging
Clinicians alone cannot quantify the effects of RFs; statistical methods alone can yield clinically suspect results
Steps: identify candidate risk factors; hypothesize about their relationship to the outcome
Assemble an analytic data set
Data cleaning:
- Range checks: e.g., temperature
- Identifying impossible occurrences: e.g., prostate cancer in a female patient
- Finding invalid data elements: e.g., ICD-9-CM codes with no meaning in the current version
- Describing the frequency of missing or poorly specified data elements
- Multivariate checks: e.g., SBP must exceed DBP
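The cleaning checks above can be sketched as a record-level validator. The plausibility limits and the tiny code list are hypothetical placeholders; a real project would use clinically agreed ranges and the full ICD code set valid for the data year. ICD-9-CM 185 is the code for malignant neoplasm of the prostate.

```python
from typing import Any

# Hypothetical plausibility limits and an illustrative subset of valid codes.
RANGE_LIMITS = {"temp_c": (25.0, 45.0), "sbp": (0, 300), "dbp": (0, 200)}
VALID_CODES = {"410.11", "428.0", "185"}


def check_record(rec: dict[str, Any]) -> list[str]:
    """Return a list of data-quality problems found in one patient record."""
    problems = []
    # Range checks: flag missing or physiologically implausible values.
    for field, (lo, hi) in RANGE_LIMITS.items():
        value = rec.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not lo <= value <= hi:
            problems.append(f"{field}: out of range ({value})")
    codes = rec.get("dx_codes", [])
    # Impossible occurrence: prostate cancer (ICD-9-CM 185) coded for a female patient.
    if rec.get("sex") == "F" and "185" in codes:
        problems.append("prostate cancer coded for a female patient")
    # Invalid data elements: codes not in the current code set.
    for code in codes:
        if code not in VALID_CODES:
            problems.append(f"invalid code {code}")
    # Multivariate check: systolic pressure must exceed diastolic pressure.
    sbp, dbp = rec.get("sbp"), rec.get("dbp")
    if sbp is not None and dbp is not None and sbp <= dbp:
        problems.append("SBP not greater than DBP")
    return problems


print(check_record({"temp_c": 52.0, "sbp": 80, "dbp": 90, "sex": "F",
                    "dx_codes": ["185", "999.99"]}))
```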
Treatment of missing values
A priori:
- In most data sets, values of some variables are missing
- Infrequently measured variables are eliminated when creating a risk-adjustment method (RAM)
How should one interpret missing data? This must be answered in both the research and the clinical context:
- APACHE: unmeasured parameters are likely to be normal
- MMPS: substitutes normal values for missing values, which showed maximum accuracy and reliability
- In terminal cancer patients, the absence of information about serum electrolytes reflects patient preference, NOT physiological status; patients with missing values were more likely to die within 30 days
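The APACHE and MMPS rule of treating an unmeasured parameter as normal can be sketched as follows. The reference values are placeholders, not those of either system. The sketch also returns a missing-value indicator, because, as the terminal-cancer example shows, missingness itself can carry prognostic information.

```python
from typing import Optional

# Placeholder "normal" values; not taken from APACHE or MMPS.
NORMAL_VALUES = {"sodium": 140.0, "potassium": 4.0, "creatinine": 1.0}


def impute_normal(labs: dict[str, Optional[float]]) -> tuple[dict[str, float], dict[str, bool]]:
    """Replace missing labs with normal values and return a missing-value indicator."""
    filled: dict[str, float] = {}
    missing: dict[str, bool] = {}
    for name, normal in NORMAL_VALUES.items():
        value = labs.get(name)
        missing[name] = value is None
        filled[name] = normal if value is None else value
    return filled, missing


filled, missing = impute_normal({"sodium": 128.0, "potassium": None})
print(filled)   # {'sodium': 128.0, 'potassium': 4.0, 'creatinine': 1.0}
print(missing)  # {'sodium': False, 'potassium': True, 'creatinine': True}
```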
Treatment of missing values: other considerations
Practice patterns:
- CVA: share of patients with missing values for one or more blood-test variables: teaching tertiary 2%, teaching non-tertiary 10%, non-teaching 28%
Specific data-collection protocols:
- MedisGroups: values of clinical findings were not recorded if they were normal
- No ECG record: normal, missing, or not done?
- A broad normal range produces many missing values
The way missing values are handled may dictate how many cases are available for analysis:
- With numerous variables in a RAM, even a small proportion of missing values per variable eliminates many cases
- The remaining study population may be skewed and unlikely to be a random sample
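The last point is easy to quantify. If each of k variables is missing independently with probability p (a simplifying assumption; in practice missingness clusters, as the CVA example shows), the expected share of complete cases is (1 − p)^k.

```python
def complete_case_fraction(p_missing: float, n_variables: int) -> float:
    """Expected share of records with no missing values, assuming independent missingness."""
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must be a probability")
    return (1.0 - p_missing) ** n_variables


# 20 variables, each missing in only 3% of records: about 46% of cases drop out.
print(f"{complete_case_fraction(0.03, 20):.3f}")  # 0.544
```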
Structure of continuous independent variables
Most clinical variables have a nonlinear relationship with the outcome:
- Asymmetric U shape: blood pressure and temperature carry risk both when low and when high
Continuous vs. categorical:
- Age: PRISM weights physiological values differently for two age levels; APACHE assigns points to patients in different age categories
Smoothing technique: LOWESS
- Clinical judgment combined with statistical analysis
- Used to determine ranges for continuous variables and the relationships of these ranges with the outcome
- Cutpoints are placed where a large change in outcome occurs for small changes in the independent variable
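Once cutpoints have been chosen (for example, from LOWESS plots reviewed by clinicians), converting a value to points is a lookup. The age bands below follow the APACHE II age points (≤44: 0, 45–54: 2, 55–64: 3, 65–74: 5, ≥75: 6); the U-shaped temperature bands are hypothetical and only show how both low and high values can earn points.

```python
from bisect import bisect_right


def points_for(value: float, cutpoints: list[float], points: list[int]) -> int:
    """Map a continuous value to points; points[i] applies to the i-th band."""
    if len(points) != len(cutpoints) + 1:
        raise ValueError("need exactly one more point value than cutpoints")
    return points[bisect_right(cutpoints, value)]


AGE_CUTS, AGE_POINTS = [45, 55, 65, 75], [0, 2, 3, 5, 6]
# Hypothetical U-shaped bands (degrees C): both hypo- and hyperthermia score points.
TEMP_CUTS, TEMP_POINTS = [34.0, 36.0, 38.5, 39.0], [3, 1, 0, 1, 3]

print(points_for(44, AGE_CUTS, AGE_POINTS), points_for(70, AGE_CUTS, AGE_POINTS))  # 0 5
print(points_for(33.0, TEMP_CUTS, TEMP_POINTS), points_for(37.0, TEMP_CUTS, TEMP_POINTS))  # 3 0
```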
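Need for data reduction
The quantity of information can be overwhelming
Potential peril of incorporating too many predictors (MedisGroups: Table 8.2):
- Model 1: 5-level admission score
- Model 2: 10 KCFs
- Model 3: 40–65 KCFs; appeared to be overspecified (cross-validation)
How to reduce data:
- Drop infrequently occurring data (< 0.5%)
- Drop variables of suspicious quality or poor reliability
- Examine univariate associations between individual predictors and the outcomes
- Composite variables: combine several variables that measure physiological abnormality and reduce them through principal component analysis or factor analysis
Concerns when using rare RFs:
- Frequency < 0.5%, fewer than 15 occurrences
- Cost of data collection in medical record review
In medical record review, even low-frequency variables must be included when:
- They are strong predictors of the outcome
- They explain the outcome independently of other variables
- Clinicians would question the model's credibility if they were excluded
If a variable is correlated with the outcome in the direction opposite to the original hypothesis, examine the data carefully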
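The frequency screen for rare risk factors can be expressed as a filter. The 0.5% and 15-case thresholds are taken from this slide; the function, the counts, and the `force_keep` option (for variables that must stay in for the clinical reasons listed above) are hypothetical.

```python
def screen_rare_factors(counts: dict[str, int], n_patients: int,
                        min_prevalence: float = 0.005, min_cases: int = 15,
                        force_keep: frozenset[str] = frozenset()) -> list[str]:
    """Keep variables that are common enough to model, plus any clinically forced ones."""
    kept = []
    for name, count in counts.items():
        common = count / n_patients >= min_prevalence and count >= min_cases
        if common or name in force_keep:
            kept.append(name)
    return kept


# Hypothetical counts of patients with each finding among 3,000 patients.
counts = {"diabetes": 420, "cardiogenic shock": 12, "dialysis": 60}
print(screen_rare_factors(counts, n_patients=3000))
print(screen_rare_factors(counts, 3000, force_keep=frozenset({"cardiogenic shock"})))
```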
Multivariate modeling technique
Type of outcome variable vs. modeling technique:
- Continuous: multiple regression
- Dichotomous: logistic regression
- Time to an event: proportional hazards model
Procedures: stepwise / backward / forward selection
Sample size vs. procedure:
- Fewer than 1,000 observations: using a stepwise procedure deteriorates the c-statistic
- More than 2,000 observations: stepwise selection
Multivariate modeling technique: interactions
There are too many possible interactions to use unguided statistical exploration to detect the important ones
- 10 RFs: 45 possible two-way and 120 possible three-way interactions
Clinically guided interaction detection (APACHE):
- pCO2 50 mmHg consistently had little or no significant relationship to the risk of death
- Hypothesis: the weighting for pCO2 also depends on the associated serum pH
- Developed a combined variable that includes pCO2 and serum pH
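The interaction counts are binomial coefficients, C(n, k) for n risk factors and k-way interactions, which is why unguided searches quickly become impractical.

```python
from math import comb


def n_interactions(n_factors: int, order: int) -> int:
    """Number of distinct interaction terms of a given order among n risk factors."""
    return comb(n_factors, order)


print(n_interactions(10, 2), n_interactions(10, 3))  # 45 120
# With 20 risk factors the two- and three-way search space grows to:
print(sum(n_interactions(20, k) for k in (2, 3)))  # 1330
```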
Multivariate modeling technique
Avoiding overfitting: two tactics
Continuous outcome variable:
- Always: number of variables < (number of observations) / 10, i.e., at least 10 observations per predictor
- Preferable: 30 cases for each predictor variable
Dichotomous outcome variable:
- Always: one predictor per every 10 positive cases
- Safe: one predictor per every 20 positive cases
Other issues:
- Transformation: log
- Conversion to a scale or score
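These rules of thumb translate directly into a maximum number of predictors. The sketch takes the observations- or events-per-predictor ratio as a parameter (e.g., 10 observations per predictor for a continuous outcome, 20 positive cases per predictor for a dichotomous one), so the threshold can be adjusted; the function name is hypothetical.

```python
def max_predictors(sample_size: int, per_predictor: int) -> int:
    """Largest number of predictors allowed by an n-per-predictor rule of thumb.

    For a continuous outcome, pass the number of observations; for a dichotomous
    outcome, pass the number of positive cases (events), not the total sample.
    """
    if per_predictor <= 0:
        raise ValueError("per_predictor must be positive")
    return sample_size // per_predictor


# Continuous outcome, 600 observations: 10 vs. 30 observations per predictor.
print(max_predictors(600, 10), max_predictors(600, 30))  # 60 20
# Dichotomous outcome, 5,000 patients but only 150 deaths, 20 deaths per predictor.
print(max_predictors(150, 20))  # 7
```

The dichotomous case shows why rare outcomes are restrictive: a large cohort with few events still supports only a handful of predictors.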
Trauma Scoring: Severity Assessment Tools for Trauma Patients
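Uses of severity assessment tools
- Triage criteria in the prehospital and hospital phases
- Tools for quality assessment and quality improvement activities:
  - In-hospital quality assessment and improvement
  - Designation of trauma centers based on quality assessment results
  - Contributing to building the hospital-phase emergency medical system
- Comparison of treatment methods
- Collection of trauma-related epidemiological data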
Severity assessment tools
Physiologic measures:
- RTS (Revised Trauma Score)
- CRAMS (Circulation, Respiration, Abdominal/thoracic, Motor, Speech scale)
- APACHE (Acute Physiology and Chronic Health Evaluation)
Anatomic measures:
- AIS (Abbreviated Injury Scale)
- ISS (Injury Severity Score)
Survival probability tools:
- TRISS (Trauma and Injury Severity Score)
- ASCOT (A Severity Characterization of Trauma)
- ICISS (ICD-based Injury Severity Score)
Physiologic severity indices
RTS = 0.9368 × GCS + 0.7326 × SBP + 0.2908 × RR (each input is a coded value from 0 to 4)
CRAMS (respiration component): normal (2), labored/shallow (1), absent (0)
APACHE (0–299):
- Age
- Acute physiology: temperature, mean arterial pressure, heart rate, respiratory rate, GCS, oxygenation, arterial pH, serum sodium/potassium/creatinine, Hct, WBC
- Chronic health: AIDS, hepatic failure, lymphoma, metastatic cancer, leukemia, multiple myeloma, immunosuppression, cirrhosis
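In the RTS formula, GCS, SBP, and RR enter as coded values from 0 to 4 rather than raw measurements. The sketch uses the coding bands as commonly published for the RTS; verify them against the original definition before reuse.

```python
def code_gcs(gcs: int) -> int:
    """Coded GCS: 13-15 -> 4, 9-12 -> 3, 6-8 -> 2, 4-5 -> 1, 3 -> 0."""
    for threshold, code in ((13, 4), (9, 3), (6, 2), (4, 1)):
        if gcs >= threshold:
            return code
    return 0


def code_sbp(sbp: float) -> int:
    """Coded systolic BP (mmHg): >89 -> 4, 76-89 -> 3, 50-75 -> 2, 1-49 -> 1, 0 -> 0."""
    if sbp > 89:
        return 4
    if sbp >= 76:
        return 3
    if sbp >= 50:
        return 2
    if sbp >= 1:
        return 1
    return 0


def code_rr(rr: float) -> int:
    """Coded respiratory rate (/min): 10-29 -> 4, >29 -> 3, 6-9 -> 2, 1-5 -> 1, 0 -> 0."""
    if rr > 29:
        return 3
    if rr >= 10:
        return 4
    if rr >= 6:
        return 2
    if rr >= 1:
        return 1
    return 0


def revised_trauma_score(gcs: int, sbp: float, rr: float) -> float:
    """RTS as the weighted sum of the three coded values."""
    return 0.9368 * code_gcs(gcs) + 0.7326 * code_sbp(sbp) + 0.2908 * code_rr(rr)


print(round(revised_trauma_score(15, 120, 16), 4))  # 7.8408
print(round(revised_trauma_score(7, 70, 35), 4))  # 4.2112
```

The maximum possible score is 7.8408, reached when all three inputs are coded 4.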
AIS (Abbreviated Injury Scale)
Six body regions: head/neck, face, thorax, abdomen, extremities, external
AIS scores, HEAD & NECK:
- AIS 1 (Minor): headache/dizziness secondary to head trauma; cervical spine strain with no fracture or dislocation
- AIS 2 (Moderate): amnesia from accident; lethargic/stuporous/obtunded, can be aroused by verbal stimuli; unconsciousness < 1 hr; simple vault fracture; thyroid contusion; brachial plexus injury; dislocation or fracture of spinous or transverse process of the cervical spine; minor compression fracture (20%) of the cervical spine
- AIS 3 (Severe, not life-threatening): unconsciousness 1–6 hrs; unconsciousness < 1 hr with neurological deficit; fracture of the skull base; comminuted, compound, or depressed vault fracture; cerebral contusion/subarachnoid hemorrhage; intimal tear/thrombosis of the carotid artery; contusion of larynx, pharynx; cervical cord contusion; dislocation or fracture of lamina, body, pedicle or ? of the cervical spine; compression fracture > 1
- AIS 4 (Severe, life-threatening): unconsciousness 1–6 hrs with neurological deficit; unconsciousness 6–24 hrs; appropriate response only to painful stimuli; fractured skull with depression > 2 cm, lacerated dura, or tissue loss; intracranial hematoma ≤ 100 cc; incomplete cervical cord lesion; laryngeal crush; intimal tear/thrombosis of the carotid artery with neurological deficit
- AIS 5 (Critical, survival uncertain): unconsciousness with inappropriate movement; unconsciousness > 24 hrs; brain stem injury; intracranial hematoma > 100 cc; complete cervical cord lesion at C4 or below
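ISS (Injury Severity Score)
$\mathrm{ISS} = \mathrm{AIS}_{(1)}^2 + \mathrm{AIS}_{(2)}^2 + \mathrm{AIS}_{(3)}^2$
- Sum of the squares of the highest AIS scores in the three most severely injured body regions; range 1–75 points
Limitations:
- Mortality rates for subsets of the ISS = 16 cohort: head/neck 17.2%, face 0.0%, thorax 6.1%, abdomen 10.5%
- Difficult to account for the severity of multiple injuries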
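A sketch of the ISS calculation from AIS scores grouped by body region. It also applies the usual convention that any AIS 6 (unsurvivable) injury sets the ISS to 75. The body-region keys and patient data are hypothetical.

```python
def injury_severity_score(ais_by_region: dict[str, list[int]]) -> int:
    """ISS: sum of squares of the highest AIS in the three most severely injured regions.

    By convention, any AIS 6 (unsurvivable) injury sets the ISS to 75.
    """
    region_max = sorted((max(scores) for scores in ais_by_region.values() if scores),
                        reverse=True)
    if region_max and region_max[0] == 6:
        return 75
    return sum(score ** 2 for score in region_max[:3])


patient = {"head_neck": [4, 2], "thorax": [3], "abdomen": [2], "extremities": [1]}
print(injury_severity_score(patient))  # 29  (4^2 + 3^2 + 2^2)
print(injury_severity_score({"head_neck": [4, 4, 4]}))  # 16
```

The second call shows the limitation noted above: three AIS 4 head injuries score the same as one, because only the highest AIS per region counts.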
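ICISS (ICD-based Injury Severity Score)
SRR (Survival Risk Ratio) for injury code $\mathrm{ICD}_j$:
$\mathrm{SRR}_{\mathrm{ICD}_j} = \frac{\text{number of surviving patients with code } \mathrm{ICD}_j}{\text{total number of patients with code } \mathrm{ICD}_j}$
ICISS (expected survival probability): $\mathrm{ICISS} = \mathrm{SRR}_{\mathrm{inj}(1)} \times \mathrm{SRR}_{\mathrm{inj}(2)} \times \cdots \times \mathrm{SRR}_{\mathrm{inj}(10)}$
Characteristics:
- Can account for the severity of multiple injuries
- Requires no separate data-collection system
- But ICD is not a classification system designed for severity assessment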
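SRRs are estimated from a reference data set and then multiplied for each patient. The ICD-10 codes and outcomes below are toy data that only show the arithmetic; a real SRR table would typically come from a large injury registry.

```python
from collections import defaultdict
from math import prod


def survival_risk_ratios(patients: list[tuple[list[str], bool]]) -> dict[str, float]:
    """SRR per ICD code: survivors with the code / all patients with the code."""
    total: dict[str, int] = defaultdict(int)
    survived: dict[str, int] = defaultdict(int)
    for codes, alive in patients:
        for code in set(codes):  # count each patient once per code
            total[code] += 1
            survived[code] += int(alive)
    return {code: survived[code] / total[code] for code in total}


def iciss(codes: list[str], srr: dict[str, float], max_injuries: int = 10) -> float:
    """ICISS: product of the SRRs of a patient's injury codes (at most 10)."""
    return prod(srr[code] for code in codes[:max_injuries])


# Toy reference data: (injury codes, survived?) per patient.
reference = [
    (["S06.5"], True), (["S06.5", "S27.0"], False), (["S27.0"], True),
    (["S27.0"], True), (["S06.5"], True), (["S06.5"], False),
]
srr = survival_risk_ratios(reference)
print(srr)                           # S06.5 -> 0.5, S27.0 -> about 0.667
print(iciss(["S06.5", "S27.0"], srr))  # about 0.333
```

A code absent from the reference table raises a `KeyError` here; how to handle unseen codes is a design decision the sketch leaves open.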
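Survival probability tools (1): models
TRISS (Trauma and Injury Severity Score): $b = b_0 + b_1(\mathrm{RTS}) + b_2(\mathrm{ISS}) + b_3(\mathrm{Age})$
Extended ICISS model: $b = b_0 + b_1(\mathrm{RTS}) + b_2(\mathrm{ICISS}) + b_3(\mathrm{Age})$
Expected survival probability: $P_s = \frac{1}{1 + e^{-b}}$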
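Both models feed a linear predictor b into the logistic function. In TRISS the age term is usually an age index (0 below 55 years, 1 at 55 or older), and the coefficients differ for blunt and penetrating injury. The coefficients below are hypothetical placeholders, not published values.

```python
from math import exp


def age_index(age_years: float) -> int:
    """TRISS-style age index: 0 if younger than 55, else 1."""
    return 0 if age_years < 55 else 1


def survival_probability(b: float) -> float:
    """Logistic transform: Ps = 1 / (1 + e^(-b))."""
    return 1.0 / (1.0 + exp(-b))


def triss_ps(rts: float, iss: int, age_years: float,
             coef: tuple[float, float, float, float]) -> float:
    """Ps from b = b0 + b1*RTS + b2*ISS + b3*AgeIndex, with caller-supplied coefficients."""
    b0, b1, b2, b3 = coef
    b = b0 + b1 * rts + b2 * iss + b3 * age_index(age_years)
    return survival_probability(b)


PLACEHOLDER_COEF = (-1.0, 1.0, -0.1, -2.0)  # hypothetical, for illustration only
print(round(triss_ps(7.8408, 9, 30, PLACEHOLDER_COEF), 3))
```

The extended ICISS model uses the same transform, with ICISS taking the place of ISS in the linear predictor.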
Survival probability tools (2): validity measures
Discrimination measures:
- Disparity
- Sensitivity (%), specificity (%)
- Misclassification rate (%)
- Area under the ROC (receiver operating characteristic) curve (AUC)
Goodness-of-fit statistics for the logistic regression model:
- Model χ²
- Hosmer-Lemeshow statistic
Area Under the ROC Curve
[Figure: ROC curve; area under the ROC curve = 0.775]
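The area under the ROC curve equals the c-statistic: the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. Below is a minimal pairwise sketch. It runs in quadratic time, which is fine for teaching-sized data, and counts ties as half-concordant.

```python
def c_statistic(predicted: list[float], outcome: list[int]) -> float:
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(predicted, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```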
Methods: Performance Measures
ADE prediction model:
                 ADE              No ADE
Alert (+)        a (true alert)   b (false alert)   a + b
No alert (−)     c                d                 c + d
Accuracy = (a + d) / (a + b + c + d)
Positive predictive value = a / (a + b)
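The same 2 × 2 counts also give the sensitivity, specificity, and misclassification rate used to compare TRISS and ICISS. This sketch assumes the usual layout (a = true alerts, b = false alerts, c = missed events, d = correct non-alerts); the counts are hypothetical.

```python
def classification_metrics(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Performance measures from a 2 x 2 table of model alerts vs. actual events."""
    n = a + b + c + d
    return {
        "accuracy": (a + d) / n,
        "ppv": a / (a + b),
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "misclassification_rate": (b + c) / n,
    }


print(classification_metrics(a=8, b=2, c=4, d=86))
```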
ICISS Full Model vs. TRISS: All Blunt Injury
Measure                       TRISS             ICD-9-CM-based ICISS Full Model   ICD-10-based ICISS Full Model
Disparity                     0.644             0.737                             0.627
Sensitivity (%)               75.6              82.1                              73.1
Specificity (%)               96.9              98.3                              96.2
Misclassification rate (%)    7.6               5.2                               8.7
ROC analysis (1)              0.958             0.976                             0.956
H-L statistic (2)             3.406 (p=0.906)   7.738 (p=0.460)                   7.294 (p=0.505)
(1) ROC analysis: Receiver Operating Characteristic analysis
(2) H-L statistic: Hosmer-Lemeshow statistic
Debate Rules
- 30-minute debate on the reading assignment
- 1st round: Team A raises questions; Team B answers them
- Next rounds: the teams alternate roles
Reading assignment: Shukri F. Khuri, Jennifer Daley, et al. (1997). Risk adjustment of the postoperative mortality rate for the comparative assessment of the quality of surgical care: results of the National Veterans Affairs Surgical Risk Study. Pages 315-327
Next class: shall we deal with statistical issues?