This is going to be a quick discussion of the study on ulcerative colitis (UC) that has been getting some attention recently. I have friends visiting right now and I don’t want to spend much time writing about this study, but it’s justifiably facing some backlash from patients and parents. Again, my disclaimer: I have no real ‘classroom’ training in research. I do not have a master’s or a PhD. What I do have is an interest in research, statistics and GI. So if you agree with some of the things I say, great; if you don’t, that’s okay too. Help me learn more and comment!
The paper in question, published in the journal Inflammatory Bowel Diseases and titled “Childhood maltreatment is associated with ulcerative colitis but not Crohn’s disease: findings from a population-based study”, reported that individuals with UC have significantly higher rates of childhood sexual and physical abuse than those with Crohn’s or without IBD. Furthermore, the authors suggest that a possible explanation for these disproportionately high rates of UC in those who have experienced abuse could be epigenetic changes resulting from the chronic neuroendocrine stress of growing up in these conditions. Basically, they’re saying that a higher percentage of people with UC experienced abuse growing up (they cannot identify the age when it occurred) and that this may have changed their gene expression, possibly causing them to develop UC. Yeah, okay, clearly you can see why people are not too thrilled, so we should look at this a bit more carefully.
- Strong non-response bias: Non-response bias is distinct from both selection bias and response bias, but any of them can be a pitfall of survey-type studies. What is non-response bias? Basically, the people who responded are different from those who did not respond. From the methods section (page 2, under the Sample sub-section), they only analyzed patients with IBD (who had to self-report, from a long list, that they have IBD) who also fully completed the adverse childhood experience questionnaires. I think that right there is why you’re seeing such high rates of abuse among the patients with IBD. I suspect that people with IBD who had no childhood abuse probably just didn’t fill out those questions, which would have excluded them from the analysis. There are a few ways the authors could have explored non-response bias, but they do not. In my opinion, reviewers should have requested further analysis: the n with IBD who did not complete the questionnaires, the n without IBD who completed the questionnaires fully, and a comparison of demographics/diagnoses between quartiles of childhood abuse. None of that was done, and the authors do not even bring this up as a potential study limitation.
- No hypothesis leading to multiple comparisons: This study was generated from a large cross-sectional database. There’s nothing inherently wrong with that, but you need to be careful and rigorous when interpreting the results. When you get this much data it’s really easy to dive in without a specific hypothesis, which will lead you to strange places based on probability alone. This has been viewed as bad science basically forever, although the advent of genome-wide association studies (GWAS), exome sequencing and similar techniques challenges this notion (beyond the scope of this discussion). What do I mean by multiple comparisons? Well, everyone knows that in a research study the researchers are always after the magical significance of “P<0.05”. This means that if there were truly no association, the probability of seeing a difference at least as large as the one observed (people with UC reporting more abuse in their past) by chance alone is less than 5%. From there you can see that if you make 100 comparisons where nothing is really going on, about 5 of them will be “statistically significant” by chance alone. Imagine this on the scale of hundreds of comparisons made just by playing with the data, and you will see it becomes highly improbable not to get significant results by chance! There are statistical ways to correct for multiple comparisons (Bonferroni etc.) that they don’t mention. There’s much more to this conversation, but that’s the gist, and I think this idea is further supported by the number (and different types) of papers that the first author has generated from this same database.
- Correlation ≠ causation: This is a very cliché, low-hanging-fruit critique of any study, but it’s true, and here is a funny link that may illustrate it perfectly for you: Funny Spurious Correlations. There are some very funny correlations on this website (yes, this is what lab humor looks like). I like the number of people who died by getting tangled in their bedsheets correlating with per capita cheese consumption (r=.94), using data from the CDC and USDA. Check out the link; it’s entertaining to see how highly correlated some things are.
- Wonky statistics/analysis: I will start with Table 1. First, I would have compared the rates of early adversities between IBD and non-IBD. That’s an obvious, logical first analysis that was not reported. From there it makes more sense to parse out Crohn’s versus UC (which is what they dive into immediately in Table 1). All of Table 1 uses chi-squared and t tests to compare Crohn’s vs. UC (bottom of Table 1), virtually ignoring the non-IBD controls, which makes the data difficult to interpret. If these groups are not compared to individuals without IBD, how do we know whether these rates of adverse experiences are truly higher than what is observed in the general population? Table 2: Again the same point holds. Why not run a logistic regression on IBD in aggregate before separating by diagnosis? In the adjusted models the odds ratios, particularly for sexual abuse, decrease quite a bit (they still remain significant), but I imagine that in an IBD vs. non-IBD logistic regression there would be no significant difference.
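To put rough numbers on the multiple comparisons point above, here is a minimal Python sketch. It assumes the comparisons are independent and that there is truly no association anywhere, which won’t be exactly true for real survey data. The function names are my own for illustration and have nothing to do with the paper’s actual analysis.

```python
# Rough arithmetic for the multiple comparisons problem.
# Assumptions (mine, for illustration): every test is independent and
# no real association exists in any of them.

def expected_false_positives(n_tests, alpha=0.05):
    """Average number of 'significant' results expected by chance alone."""
    return n_tests * alpha

def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value cutoff after a Bonferroni correction."""
    return alpha / n_tests

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(
            f"{n:>3} tests: "
            f"expected false positives = {expected_false_positives(n):.2f}, "
            f"chance of at least one = {familywise_error_rate(n):.3f}, "
            f"Bonferroni cutoff = {bonferroni_threshold(n):.5f}"
        )
```

The takeaway is that the chance of at least one spurious “hit” climbs quickly as the number of comparisons grows, and Bonferroni deals with that by making each individual test clear a much stricter cutoff. It is a conservative correction, and there are gentler alternatives, but it shows the idea.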
I could go on discussing confounders; the discussion of epigenetics/neuroendocrine stress, which I think is just a massive leap; the fact that the limitations listed in the paper’s discussion don’t really address any real statistical issues; and the claim that IBD is relatively rare, even though Canada has the highest prevalence in the world (I suspect the main reason the sample sizes were small is that they required total completion of the adverse childhood events questions). Usually you shouldn’t need to look much further than the methods section to predict what kinds of issues a paper may run into. From there, good scientists address any methodological or statistical limitations in every way they can and publish once they can no longer think of alternative explanations.
Ultimately, it is up to the peer reviewers to screen papers thoroughly, and I hope (and expect) that a group will write a reply to the authors and the journal highlighting several of the limitations and urging caution in the study’s interpretation. I wouldn’t lose much sleep over this study 🙂 Hope everyone has a good weekend.
“If you torture your data long enough, they will tell you whatever you want to hear”
James L. Mills