National Expert Critiques MHS Initial Analysis
A Critique of Baseline Issues in the Initial Medicare Health Support Report
Thomas Wilson, PhD, DrPH
Trajectory Healthcare, LLC
July 18, 2007
Two important methodological observations follow from reading the initial report on the Medicare Health Support (MHS) project, authored by RTI International (the Research Triangle Institute):
1) The metamorphosis of the MHS project from a randomized controlled trial (RCT) design to an observational study design.
2) The inability to examine MHS project data using standard observational study techniques.
This is a one-two punch, to borrow boxing's "knockout" language. If these two observations are true, the scientific validity of this report – and of subsequent reports on program impact, if they continue to follow this pattern – will be questionable at best, and potentially meaningless from a scientific perspective.
It is important to review exactly what the authors said and the evidence behind their conclusions before arriving at our own.
Analysis of Point #1: Randomization Failure?
Addressing the first observation, the authors wrote: “Our analyses at the time of randomization confirm equivalency … [but] substantive differences between the intervention and comparison populations emerge in the interval between randomization and go-live.” (Page 4, 20-21).
What is the evidence for these "substantive differences," this loss of equivalence?
1) The justification for the equivalence assumption at randomization was based on the results of statistical tests on per-beneficiary-per-month (PBPM) cost differences (see Table 3-4 on page 21), Hierarchical Condition Category (HCC) risk score differences (see Table 3-3 on page 20), and differences in all-cause hospitalizations, heart failure hospitalizations, and diabetes hospitalizations per 100 people (see Table 3-4). In all cases, the differences were reported as "not statistically significant" (as would be expected if randomization were successful).
2) The justification for non-equivalence "prior to the start of the pilot" was based on percent differences in PBPM between the two groups only (see Table 6-1 on page 37). While statistical test results were not in the table, several pages earlier (page 20) the authors wrote: "only the difference in one Medicare Health Support Organization (MHSO) group's PBPM is statistically significant at the 5% level or better at the start of the pilot." Puzzlingly, measures that were presented at the time of randomization were not presented here: HCC differences, all-cause hospitalization differences, heart failure hospitalization differences, and diabetes hospitalization differences.
So it turns out that the argument for "substantive differences" between the time of randomization and the "go-live" (pre-enrollment) date is based only on PBPM, a financial measure. Importantly, if statistical significance is the guide to scientifically acceptable differences, only one of the eight sites showed a statistically significant difference in PBPM.
Other important measures of equivalence, used by the authors at baseline, are ignored at the go-live date. These non-financial measures, e.g., those related to quality, utilization, and HCC risk score, seem to have been relegated to second-class status, for reasons unknown.
Is there a bias here toward financial measures only, and against the standard scientific practice of assessing differences between groups using statistical significance tests?
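For concreteness, the kind of two-sample significance test that would put a PBPM difference at go-live on a sound statistical footing can be sketched in a few lines. The summary figures below are hypothetical, since the report does not publish means, standard deviations, or group sizes in a usable form; the point is only to show how little is needed to run the test.

```python
import math

def welch_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample z-test on group means from summary statistics.

    With groups as large as the MHS arms, the normal approximation
    to Welch's t-test is essentially exact.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / se
    # two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical PBPM summary figures for one site (illustrative only):
# intervention mean $1,050 (SD $2,400, n=20,000) vs.
# comparison  mean $1,010 (SD $2,350, n=10,000).
z, p = welch_z(1050.0, 2400.0, 20000, 1010.0, 2350.0, 10000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Had the report published variance estimates alongside the percent differences in Table 6-1, any informed reader could have reproduced such a test independently.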
Analysis of Point #2: Observational Design?
Addressing the second observation, the authors wrote: “the financial reconciliation protocols as initially agreed upon do not make adjustments for differences in payments at the start of the pilot.” (Page 21).
No adjustments possible? No back-up plan if equivalence is lost?
This is a potentially earthshaking strategy, especially if it also holds for non-financial variables. Have the MHS leaders put all their eggs in the randomization basket? If so, it is an unfortunate misunderstanding of the original intent of the study, which Congress mandated to be a randomized controlled trial – a term that implies to scientists both randomization and control.
Sandra Foote argued in her seminal 2003 Health Affairs article on the potential value of disease management in fee-for-service Medicare: “The results of controlled trials would contribute greatly to the state of knowledge in the field and would be extremely valuable in setting benchmarks for future performance expectations if pilot projects prove to be successful.” Foote’s article was an argument for a controlled trial, not merely a randomized one, and for more than financial outcomes. Back in 2003, several observers (including myself) discussed the potential for non-equivalence in this trial. But it never occurred to me that it would be impossible to deal with this issue if evidence suggested the initial equivalence did not hold!
As it now stands, it appears the MHS can only be observational: if equivalence was not achieved at baseline, so be it; nothing can be done. This is a sorry state. The numerous techniques available to health services researchers to adjust for non-equivalence do not appear to be available here. If this is the case, the value of this study from the point of view of science is highly suspect.
Suggestions for Improvement
Following are a few suggestions to the MHS project sponsors:
1) Modify the agreement between the MHSOs and CMS to allow for statistical adjustments to control for non-equivalence if it, in fact, occurs … and is validated using standard statistical tests. Moreover, ensure that any statistical adjustments made are completely transparent, including equations, variance estimates, and all numerators and denominators for all metrics.
2) Conduct a more complete analysis of baseline equivalence “at the time of the start of the pilot,” including PBPM costs, HCC scores, utilization metrics, and quality metrics, using statistical significance tests that are fully transparent.
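Suggestion #1 does not require exotic methods. A difference-in-differences comparison is one standard, fully transparent adjustment: it nets out any baseline gap by comparing each group's change from its own starting point, rather than comparing raw post-period levels. A minimal sketch, using hypothetical PBPM means:

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences estimate: the intervention group's
    change from baseline minus the comparison group's change.
    Any level difference present at baseline cancels out."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean PBPM costs (dollars): the intervention arm starts
# $40 higher at go-live, so a raw post-period comparison would be biased.
effect = diff_in_diff(treat_pre=1050.0, treat_post=1120.0,
                      ctrl_pre=1010.0, ctrl_post=1105.0)
print(f"Adjusted program effect: {effect:+.2f} PBPM")
```

Here the raw post-period gap is +$15 PBPM, but once each group is measured against its own baseline the estimated effect is -$25 PBPM, illustrating why an adjustment mechanism matters when equivalence is lost.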
To learn from the MHS project, scientifically valid methods must be used to fairly assess the impact of the disease management programs on both financial and non-financial metrics.
Based upon the information in this initial report, not enough scientific evidence was presented to support the authors’ claim of “substantive differences” between the intervention and reference groups at the pre-enrollment period. While differences in PBPM costs were presented, in seven out of eight cases these were not statistically significant. It thus appears that only raw differences, not statistically significant differences, are contractually relevant here. The authors’ apparent inability to adjust for differences in financial metrics between the intervention and reference groups seems to support this point. It thus appears that we may not have a randomized controlled trial anymore. Indeed, if adjustments are not possible in the event of non-equivalence, we may not even have the ability to do a rigorous observational study! Has science been relegated to second-class status? If so, who put it there? More information is needed.
As it stands now, this study, from the point of view of science, has been set up to result in little or no learning regarding the impact of disease management, as intended by Foote, Congress and other stakeholders. But things can change with appropriate adjustments. It is recommended that future evaluations of the MHS use standard statistical tests to assess differences between the intervention and reference groups on all metrics. Adjustments should be made, and these must be fully transparent, when and if statistically significant differences do emerge.
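The same discipline extends directly to non-financial metrics. For a utilization measure such as all-cause hospitalization, a standard two-proportion test would suffice; the counts below are hypothetical, chosen only to illustrate the mechanics:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions, e.g. the share
    of beneficiaries hospitalized in each study arm over a period."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical counts: 6,200 of 20,000 intervention beneficiaries
# hospitalized vs. 3,050 of 10,000 comparison beneficiaries.
z, p = two_proportion_z(6200, 20000, 3050, 10000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Publishing the underlying numerators and denominators, as urged above, is what makes such a test reproducible by any reader.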
Finally, it is recommended that non-financial metrics (e.g., related to quality) should be evaluated with the same rigor and using the same credible methods as used to evaluate financial variables. It is in everyone’s best interest to back up the MHS results with credible, sound, and transparent scientific methods.
Dr. Thomas Wilson
Copyright 2007. Thomas W. Wilson. All Rights Reserved
 McCall N, Cromwell J, Bernard S. Evaluation of Phase I of Medicare Health Support (Formerly Voluntary Chronic Care Improvement) Pilot Program Under Traditional Fee-for-Service Medicare Report to Congress. June 2007. http://www.cms.hhs.gov/reports/downloads/McCall.pdf
 What evidence was not shown?
1) The variance estimates (e.g., the method for standard deviations and the resulting standard errors) that are customarily presented in RCTs were not reported for either the randomization time period or the pre-enrollment time period. Such data would allow an informed reader to independently review the validity of the tests that are designed to assess the level of random error in the data.
2) A data table supporting the authors’ statements of equivalence/non-equivalence on standard confounding variables (e.g., age, gender) was not shown for either the post-randomization or the pre-enrollment period. Some of these were discussed in the text, but quantitative information was not provided.
3) A data table showing the baseline comparison of intermediate outcome metrics and ultimate outcome metrics between intervention and reference groups was not shown at either the post-randomization or the pre-enrollment periods.
a. Intermediate outcomes (testing for HbA1c, lipids, micro-albumin, and retinopathy) were reported six months post-intervention (Figure 5-1, page 32), but these results were not presented for either the post-randomization period or the pre-enrollment period.
 Specified in Section 721 of the Medicare Prescription Drug, Improvement, and Modernization Act of 2003. http://www.cms.hhs.gov/CCIP/downloads/section_721.pdf
 Foote SM. Population-based disease management under fee-for-service Medicare. Health Aff (Millwood). 2003 Jul-Dec;Suppl Web Exclusives:W3-342-56.
 DM News Editor. Biases Cloud DM’s Ability to Succeed in Medicare Pilot: Selection Issues May Present Industry with Framework for Failure. Disease Management News. Vol. 9, Number 11. April 2, 2003, page 1,4-5.
 Equivalence and statistical considerations, based on transparent impact methods, are principles espoused by the non-profit Population Health Impact Institute (www.PHIinstitute.org).