We've talked a lot in this course about how studies can go wrong and how interpretation can go wrong. It's important to address the fact that there are some ways to fix these problems, and one of the ways to fix the problems with observational research in particular is called adjustment. So we're going to talk about adjustment. We'll discuss adjustment as the broad process of trying to represent the pure relationship of exposure and outcome using math, show that adjustment takes many forms, and warn you that adjustment might not be all it's cracked up to be. To start us off, I'm going to tell you about a young researcher who made a somewhat startling observation. This rather oblivious young researcher is out on a lovely spring day and notices that people drinking Rosé seem to be shorter than those drinking other beverages. "How odd," he thinks. But like any good researcher, he wants to back up his observation with hard numbers. His hypothesis: drinking Rosé leads to shrinkage. So he sets himself up in a local bar, meticulously recording two variables, the height of each individual and whether or not they're drinking Rosé. After hundreds of observations, he performs a statistical test and sure enough, those who drink Rosé are shorter, p-value 0.004. Really interesting observation, isn't it? But is it true? So what was really going on here? We want to assess the relationship between drinking Rosé and your height. So let's represent this with a causal diagram. We have Rosé, our exposure, linked to our outcome, which is shortness. I've made this a dashed arrow because secretly there's no real link here. Rosé does not cause you to shrink, and I'm sure you have some hypotheses as to why the data looked the way they did. There's no real link, but we're going to measure these things to see what we find, and we find that people who drink Rosé are more likely to be short than those who don't.
In fact, the percent of Rosé drinkers who are under five foot four inches tall is 30 percent, and the percent of non-Rosé drinkers who are under five foot four inches tall is 15 percent. So you can look at the percentages and say, "Okay, well, people who drink Rosé are more often short than people who don't drink Rosé." So what's really happening? Here's our causal diagram again, Rosé being linked to short. Now this is not a randomized trial, so there could be confounders: third factors linked to both Rosé drinking and height that induce the correlation we've observed when in fact there's no true causal effect. So what could that third factor be, or one of those third factors? Well, sex. Women might be more likely to drink Rosé, and I apologize for the stereotyping, but this is based on years of personal experience with my wife. Women may be more likely to drink Rosé, and women, as we know, are on average a bit shorter than men. So perhaps this confounder is present. Now, how do we fix this? How do we adjust for the presence of this confounder? Well, the simplest form of adjustment is called stratification. We simply do the analysis stratified by sex. What we find in that case is that the percent of female Rosé drinkers who are short is 30 percent, and the percent of female non-Rosé drinkers who are short is also 30 percent. Among women, Rosé makes no difference. So by looking at the data just within women or just within men, we will have adjusted for the presence of this third factor; we will have adjusted for the confounder. So it's a simple strategy, it's called stratification, and it's one of the easiest ways you can adjust for a potential third factor. Now, statisticians took this simple method of adjustment and blew it up, using crazy math that allows for adjustment factors that aren't simple categories like female and male: continuous factors, non-normal distributions, multiple groups, multiple variables at once. That is well beyond the scope of this course.
But they are all based on the idea that by measuring those third factors, those confounders that might be influencing the relationship you're truly interested in, you can somehow get to the truth of the real relationship underlying it. So let me give you an example of how this works using a real study. This was a study appearing in the New England Journal in 2016 looking at fresh fruit consumption and major cardiovascular disease in China. So it's a huge study looking at people all over China and it asked them how much fresh fruit they ate, and it followed them for a long period of time to see if they developed heart attacks and strokes, other forms of cardiovascular disease, and you won't be surprised to hear that the people who ate more fresh fruit were less likely to have heart attacks and strokes and all manner of crazy things. Now, you've been through this course, you're going to think immediately, "Okay, people weren't randomized to eating fresh fruit versus not eating fresh fruit. There are a lot of factors that are probably associated with fresh fruit eating that may also be associated with that cardiovascular outcome. Confounders." What might some of the confounders be? Well, more money is a big one. The more money you have, the more you can afford fresh fruit. People with more money tend to live longer, that's a truism of medicine because they have access to a lot more healthy things. Maybe non-smokers eat more fresh fruit, there's some data to suggest that. Maybe people who don't like salt as much, they tend to eat more fresh fruit, they don't have as much salt, and it's actually the low salt intake that is protecting them against all the bad cardiovascular things. So there are multiple factors here, multiple potential confounders. When you do adjustment, you essentially are cutting causal lines. So by adjusting for income, you've cut the line between money and fresh fruit. 
By adjusting for smoking status, you cut the line between smoking and fresh fruit, and by adjusting for lower salt intake, you cut the line between lower salt intake and fresh fruit. Now, in this paradigm, all that's left is the link between fresh fruit and longer life. You can in fact sequentially adjust for things, and the observed relationship often gets smaller and smaller as you account for more of the potential confounders. Oftentimes, you'll find that there is no residual effect left, that all of the initial observation that said fresh fruit was protective is really driven by other things. That wasn't the case in this study. After they adjusted for a number of things, it still looked like fresh fruit helped a bit, but not nearly as much as an unadjusted analysis would suggest. So that's an example of something called multivariable adjustment, when the researchers are adjusting for multiple factors at the same time. Now you don't need to know how this works, the math is a bit complicated. Your job is to determine what the dependent variable is, and the hint here is that it's usually your outcome, like cardiovascular disease. What is the independent variable? Usually, that's the exposure of interest, like fresh fruit. Then you want to determine what the additional covariates are. Those are all those things like smoking status and income and where you live and how close you are to a supermarket, or any other thing that you think might be a potential confounder. If you can identify those three pieces when you see a multivariable adjustment, you've got it figured out. The best possible interpretation of a multivariable adjustment or a multivariable regression is as follows: accounting for the differences in the list of covariates that you adjusted for, the relationship between the independent variable and the dependent variable is X. So going back to our Rosé example, we would say, accounting for sex, the relationship between Rosé and height is zero.
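To make the Rosé example concrete, here is a minimal sketch of stratification in Python. The counts are made up for illustration: they are not the lecture's data, and the 5 percent figure for men is an assumption. They are chosen so that sex drives both Rosé drinking and shortness, while Rosé itself does nothing. The name `percent_short` is just a label for this sketch.

```python
# Hypothetical counts, invented for illustration only.
# Key: (sex, drinks_rose) -> (number under 5'4", total people)
COUNTS = {
    ("female", True): (54, 180),
    ("female", False): (24, 80),
    ("male", True): (1, 20),
    ("male", False): (16, 320),
}


def percent_short(drinks_rose, sex=None):
    """Percent of people who are short among Rosé or non-Rosé drinkers.

    With sex=None this is the crude (unadjusted) comparison; passing a sex
    restricts the calculation to that stratum.
    """
    short = total = 0
    for (group_sex, group_rose), (n_short, n_total) in COUNTS.items():
        if group_rose != drinks_rose:
            continue
        if sex is not None and group_sex != sex:
            continue
        short += n_short
        total += n_total
    return 100.0 * short / total


# Crude comparison: everyone pooled together, sex ignored.
crude_rose = percent_short(True)
crude_other = percent_short(False)
print(f"crude: drinkers {crude_rose:.1f}% short, non-drinkers {crude_other:.1f}% short")

# Stratified comparison: the same question asked within each sex.
for stratum in ("female", "male"):
    rose = percent_short(True, sex=stratum)
    other = percent_short(False, sex=stratum)
    print(f"{stratum}: drinkers {rose:.1f}% short, non-drinkers {other:.1f}% short")
```

With these made-up counts, the crude comparison shows a big gap (27.5 percent versus 10 percent), while within women and within men the two percentages are identical. That is what "accounting for sex, the relationship is zero" looks like in numbers.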
So that's what you're telling your audience, and that's how a researcher should present it to you. They're not saying there's no relationship between Rosé drinking and height. Of course there is, we observed it, we had the data. You have to say, after accounting for sex, there was no relationship. Or sometimes they say, after adjusting for sex, there was no relationship. So keep an eye out for that type of language. Now, adjustment is not a panacea, so I want to talk to you about some of the limitations. Most models that you'll see in the medical literature are multivariable adjustment models. Just so you know the terms, they're often called regressions, like logistic regressions or linear regressions. For our purposes, that's just a fancy term for multivariable adjustment. Most of those treat all those covariates in a linear fashion. So for example, if we're adjusting for income, then we're going to assume that the more income you make, the less likely you are to die, and if you make twice as much income, you're two times less likely to die, and if you make 10 times more income, you're 10 times less likely to die, and so on and so forth. There's a linear relationship. Not all models do this, but the vast majority do. That actually works pretty well, believe it or not, but you might imagine that there are certain threshold effects, where beyond a certain point, it doesn't matter how much money you make, you're going to be okay. For each variable, you have to think about whether it has a different shape, and there are some things that actually might be harmful if they're too high or too low. Body weight is a good example: people who are very obese, with a very high body mass index, might do poorly overall, but people who are extremely skinny or wasted might do poorly overall too. So modeling things linearly can sometimes mess up the adjustment. You have to think about that.
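Here is a small sketch of why the linear assumption matters, using invented numbers rather than real data. It assumes a hypothetical U-shaped risk curve with its lowest point at a body mass index of 30, fits a straight line to it, and then fits the same kind of line to the squared distance from 30. The helper name `fit_line` and every number here are illustrative assumptions, not anything from a real study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented U-shaped risk: lowest at BMI 30, higher at both extremes.
bmi = list(range(15, 46))
risk = [1.0 + 0.02 * (b - 30) ** 2 for b in bmi]

# Model 1: treat BMI linearly, the way most adjustment models treat covariates.
slope, intercept = fit_line(bmi, risk)
linear_pred = [intercept + slope * b for b in bmi]

# Model 2: let risk depend on the squared distance from 30 instead.
# (Knowing the low point in advance is an assumption made for simplicity.)
curve_x = [(b - 30) ** 2 for b in bmi]
c_slope, c_intercept = fit_line(curve_x, risk)
curve_pred = [c_intercept + c_slope * x for x in curve_x]

# Largest gap between each model's prediction and the invented "true" risk.
worst_linear = max(abs(p - r) for p, r in zip(linear_pred, risk))
worst_curve = max(abs(p - r) for p, r in zip(curve_pred, risk))
print(f"linear slope: {slope:.4f}")
print(f"worst error, linear model: {worst_linear:.2f}")
print(f"worst error, curved model: {worst_curve:.2f}")
```

In this contrived setup the straight line comes out essentially flat, as if body mass index had nothing to do with risk, and it misses badly at the extremes. The curved version tracks the invented risk almost exactly. An adjustment built on the flat line would barely be adjusting for body weight at all.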
The other thing to think about: if you adjust for something that's poorly measured, you're not really adjusting for it. Income is often something that is not measured very well. First of all, people don't like telling you their income. So oftentimes researchers will ask, what is the range of your income? Do you make between 0 and $20,000 a year, do you make between $20,000 and $40,000, and so on and so forth. So it's not measured perfectly, which means you can't adjust for it perfectly. Some researchers don't even ask about income, they just take your zip code and assign you the median income of the zip code you live in. So now think of the zip code you live in, and think of the richest person in that zip code and the poorest person in that zip code. They would both be analyzed in the same way in terms of income. So if you haven't measured your confounder well, you can't adjust for it well. So, the take-home points today. Adjustment attempts to take observational data and make it look more like a randomized trial by cutting all those causal lines between the confounders, your exposure, and your outcome. Adjustment is often used to suggest causality, because you say, we accounted for all those confounders, so we can now say there's a causal link. But be careful: you can't adjust for everything, you can't adjust for things you haven't measured, and you can't adjust well for things you haven't measured well. Even if you could measure everything, you can't adjust perfectly. All our models rely on some math, and the world is probably a little bit more complicated. So no matter how well you do your adjustment, there's nothing that replaces the randomized trial for really comparing one intervention to another. See you next time.