How I Read a Randomized Clinical Trial Before I Believe It
Five steps to reading a clinical trial critically
We are all busy, and reading medical research is time-consuming, so I understand the impulse to read the abstract and skim the conclusion. I do that sometimes. But that is how we get misled: many trials are not well designed, and authors often spin their findings precisely where we focus our attention most. That is why it is important to develop the skills to analyze clinical trials, so you can judge whether they should change your practice. Here's a brief primer on how to get started, and the stepwise approach I use to appraise a clinical trial efficiently but critically.
Step 1. What’s being compared to what?
The first step is to determine what the intervention is and what it is being compared against. The intervention usually is easy to identify and often appears in the title of the article, but the comparator is just as important: Is it placebo or standard of care? These are the first considerations because the most important thing you're going to look at is the difference in outcomes between the treatment arm and the comparator arm (see the next section). If the comparator is a placebo, you're more likely to see a difference, and that is reasonable in some contexts. But usually the best comparator is the standard of care, and not just any standard of care: the best standard of care. A classic trick in trials of new antihypertensive therapies, including renal denervation therapy, is to compare the new intervention against an arm that can pass for standard of care but falls well below best care, such as low doses of short-acting drugs given once a day, or therapies that are not particularly effective, like beta-blockers, thiazide diuretics, or once-daily losartan.
The comparator doesn't have to be a suboptimal drug to be a bad comparator; it can also cause harm and make the intervention look better than it really is. In REDUCE-IT, the placebo was mineral oil, which raised LDL cholesterol, apoB, and inflammatory markers in the control arm. What looked like a home run for icosapent ethyl, a large reduction in cardiovascular events, may have been partly a function of the control arm getting worse rather than the treatment arm improving. The effects may be real, but their magnitude almost certainly is overstated.
Simultaneous Step 1. What’s the outcome and who cares?
Equally important are the outcomes and who cares about them. I immediately ask whether they are objective or subjective, because objective outcomes, like death or adjudicated myocardial infarction or stroke, are much less sensitive to bias. The next tier includes related clinical events such as ischemia-driven revascularization, hospitalization for heart failure, or transient ischemic attack. These have subjective components that get amplified if the study is not blinded, and they can be biased in ways that are not equally distributed between arms, even in a randomized trial. Lower-tier outcomes, such as changes in surrogate endpoints like laboratory values or imaging measures, may be one or more steps removed from the outcomes patients actually care about.
I also ask who cares most about the interventions and outcomes. Sometimes studies are designed to sell products, and they measure outcomes that patients don't really care about. A classic example is using hospitalization for heart failure as an endpoint without looking at all-cause hospitalization. Hospitalization for heart failure is important, but it needs context. Given the multimorbidity of most of our heart failure patients and the subjectivity of who gets hospitalized versus treated as an outpatient, I'd want to know total hospitalizations. If patients have fewer heart failure hospitalizations but more hospitalizations for kidney failure, falls, or infections, the apparent benefit might be partly offset or lost entirely.
I also like to understand the motivations of the study funders and authors. Are they trying to sell a product? Are they trying to improve longevity and quality of life? I can’t read people’s minds, but sometimes you can tell by the study design what they’re trying to do.
Step 2. Study design, blinding, and funding
The considerations above depend heavily on study design. For a new intervention, particularly a permanently implanted device, I want to see evidence of benefit on hard clinical outcomes or safety endpoints that matter to patients. The burden of proof is on the new intervention, not on a well-studied standard of care. For example, CHAMPION-AF, a trial of left atrial appendage closure versus oral anticoagulation, was designed as a non-inferiority study, although I believe it should have been designed as a superiority study, since a non-inferiority design only asks the new device to be not much worse than established therapy. It also used a bleeding endpoint that excluded post-procedural bleeding, even though patients care about bleeding whenever it occurs.
On the subject of blinding, it's vital to know whether the study was double-blinded, single-blinded, or not blinded at all. There are few good excuses for not doing a double-blinded study, and it is concerning that articles examining the effect of a device or procedure on clinical outcomes get published without a blinded placebo or sham control arm. Subjective or symptom-driven outcomes measured without such a control are highly susceptible to bias. Placebo and nocebo effects contribute, and so does the well-known phenomenon of subtraction anxiety, in which a patient who enters a study expecting to receive a beneficial intervention doesn't get it, becomes symptomatic, and drops out, leaving a biased study sample. Recent examples include the TRILUMINATE study of tricuspid valve repair, which showed improvements in symptoms among people who received a device intervention for tricuspid regurgitation, without a blinded sham control. An even better example is renal denervation therapy, which showed a large reduction in blood pressure in the unblinded SYMPLICITY HTN-2 trial, but only a small, non-significant effect when tested against a sham procedure in SYMPLICITY HTN-3.
I also ask whether the follow-up was long enough for the intervention to have its full effect and for clinically meaningful events to accumulate, because trials that are too short can show early large effects, or no effect at all, that do not hold up in the long run.
Finally, I ask who funded the study, because everyone is biased: sometimes the bias is intellectual, sometimes it is financial. That bias shapes how studies are designed, written, and reported. Indeed, the study design often reveals intent more reliably than a disclosure statement.
Step 3. The study population
I then turn to Table 1 and ask who was enrolled, whether the study population makes sense for the intervention and outcomes, and whether the participants resemble the patients for whom I would consider the intervention. Next, I look at the recruitment flow diagram (usually Figure 1), because it shows how many people were screened and how many were excluded before enrollment. That matters because enrollment is sometimes so tightly selected that the sample no longer reflects clinical reality.
The numbers in Table 1 tell a lot, and give a lot away. The recent VESALIUS-CV study enrolled patients with a mean LDL cholesterol of 122 mg/dL despite reportedly intensive lipid-lowering therapy at baseline. Nearly all had diabetes mellitus or non-occlusive atherosclerotic vascular disease and, based on their demographics and countries of enrollment, likely were clinical patients rather than screen-detected cases. Those details matter because they substantially affect the external validity of the study, particularly for anyone using it to justify lower lipid targets and population-based atherosclerosis screening.
Step 4. The absolute event rates
Then I turn my attention to the results and focus mainly on the absolute event rates within and between arms, and I ask myself: Do they make sense? Are the differences clinically important? Over the length of the study, what is the difference in absolute terms? I care much less about relative risks because people live absolute lives and have absolute events, not relative ones.
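To make the absolute-versus-relative point concrete, here is a minimal Python sketch that computes the absolute risk reduction (ARR), relative risk reduction (RRR), and number needed to treat (NNT) from raw event counts in each arm. The trial numbers are hypothetical, chosen only for illustration, and the calculation assumes simple cumulative event proportions over the whole follow-up, with no censoring or time-to-event modeling. The names `Arm` and `compare_arms` are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Event count and number randomized in one trial arm (hypothetical data)."""
    events: int
    randomized: int

    @property
    def risk(self) -> float:
        # Cumulative event proportion over the trial's follow-up.
        return self.events / self.randomized


def compare_arms(treatment: Arm, control: Arm) -> dict[str, float]:
    """Absolute and relative effect measures for treatment vs control."""
    arr = control.risk - treatment.risk  # absolute risk reduction
    # Relative risk reduction is undefined if the control arm had no events.
    rrr = arr / control.risk if control.risk > 0 else float("nan")
    # NNT applies only over the trial's follow-up; no benefit -> infinite NNT.
    nnt = 1 / arr if arr > 0 else float("inf")
    return {
        "control_risk": control.risk,
        "treatment_risk": treatment.risk,
        "ARR": arr,
        "RRR": rrr,
        "NNT": nnt,
    }


if __name__ == "__main__":
    # Two hypothetical trials with the same 25% relative risk reduction.
    low_risk = compare_arms(Arm(15, 1000), Arm(20, 1000))
    high_risk = compare_arms(Arm(150, 1000), Arm(200, 1000))
    for label, r in [("low-risk", low_risk), ("high-risk", high_risk)]:
        print(f"{label}: ARR {r['ARR']:.1%}, RRR {r['RRR']:.0%}, NNT {r['NNT']:.0f}")
```

Both hypothetical trials report the same 25% relative risk reduction, but the absolute risk reduction is 0.5 percentage points (2% versus 1.5%, NNT 200) in the low-risk population and 5 percentage points (20% versus 15%, NNT 20) in the high-risk one. The headline relative number is identical; what it means for a patient is not.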
The effect size is vastly more important than the p-value, which is as much a function of sample size and variability as it is of the point estimates. I look at the confidence intervals, but I always return to the absolute event rates in each arm and ask myself how fragile they are: if, by chance, a few more or fewer people in either arm had had events, would the statistical conclusion have changed?
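One common way to formalize that question is the fragility index: the number of patients in the arm with fewer events whose outcome would have to flip from non-event to event before a statistically significant result stops being significant. The sketch below is a minimal, self-contained Python version built on a two-sided Fisher exact test written with the standard library; the function names and the 0.05 threshold are illustrative assumptions, not taken from any specific trial or package. It also illustrates the sample-size point: the same hypothetical 10% versus 15% event rates are not statistically significant with 100 patients per arm but are with 1,000 per arm.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are (events, non-events). A table counts
    toward the p-value if its hypergeometric probability is no larger than
    that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c  # total events across both arms
    n = row1 + row2

    def weight(x: int) -> int:
        # Numerator of P(X = x); the shared denominator comb(n, col1) cancels,
        # so tables are compared exactly as integers.
        return math.comb(row1, x) * math.comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / math.comb(n, col1)


def fragility_index(events_1: int, n_1: int, events_2: int, n_2: int,
                    alpha: float = 0.05) -> int:
    """Count non-event -> event flips in the arm with fewer events needed
    to push the Fisher p-value to alpha or above.

    Returns 0 if the result is not significant to begin with.
    """
    events, sizes = [events_1, events_2], [n_1, n_2]
    arm = 0 if events_1 <= events_2 else 1  # arm with fewer events

    def p_value() -> float:
        return fisher_exact_two_sided(events[0], sizes[0] - events[0],
                                      events[1], sizes[1] - events[1])

    flips = 0
    # Stop once significance is lost, or if the arm runs out of non-events.
    while p_value() < alpha and events[arm] < sizes[arm]:
        events[arm] += 1  # one more patient in this arm has an event
        flips += 1
    return flips


if __name__ == "__main__":
    # Same hypothetical 10% vs 15% event rates at two sample sizes.
    print(fisher_exact_two_sided(10, 90, 15, 85))      # 100 per arm
    print(fisher_exact_two_sided(100, 900, 150, 850))  # 1,000 per arm
    print(fragility_index(100, 1000, 150, 1000))
```

A small fragility index, particularly one smaller than the number of patients lost to follow-up, is commonly read as a sign that the statistical conclusion rests on very few outcomes. Like the p-value, it says nothing about whether the absolute difference is clinically important.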
Step 5. How I actually read the paper
To get all of this information, you can't just read the abstract, the introduction, and the conclusion. Those sections often are filled with spin, especially the abstract and the conclusions. I read the methods and the tables and figures, and then go back and look at the results. I largely ignore the discussion until I've thought carefully about the data and weighed them against my prior convictions about the intervention and its biological plausibility, in particular whether the event rates and the effect size are plausible. Then I read the conclusions, and I'm very cautious about spin.
If my conclusions differ from the authors', or if anything seems even slightly out of the ordinary, I pull up the supplements. It requires extra clicks and downloads, but a vast amount of information is buried there, and the key to making sense of unexpected data often is found in them. The supplements are also where additional analyses requested by reviewers end up, where you can find details about the countries in which patients were enrolled, and where other findings are tucked away that may run against the message the authors are trying to convey.
Conclusion
Randomized clinical trials that have hard endpoints and large absolute effect sizes don't need spin, because the data tell the story. Learning to read past the abstract and the authors' conclusions, and to focus instead on the methods, the tables, and the supplements, takes practice, but once it becomes a habit it doesn't take long. The payoff is that you stop outsourcing your clinical judgment to other people's interpretations of their own work.



Excellent summary. I always say the most important part of a paper is the methods section, cuz like many things, it’s garbage in/garbage out.
Like you, I care much more about ARR and NNT, and basically don’t care about RRR.
The only thing I would add, when trying to decide whether X deserves being incorporated into practice, is to make a distinction btw statistically significant effect and clinically relevant effect.
James, the comparator is the section I linger over too, because it is where a trial quietly states what it takes ordinary care to be. A weak control arm does more than flatter the intervention. It redraws the moral baseline of the paper, so the patient in the comparator group becomes the study's stand-in for the patient we would have treated anyway, only treated worse.
That is what makes the methods more than housekeeping. Reading them is how you ask whether the trial respected the clinical world it is requesting permission to change. And when a new intervention beats a diminished version of care, the result stops being about the intervention. It becomes a measure of how far the baseline had to be lowered to let it win.