So after fitting many regression models, the next thing that you might want to do is calculate the statistics that we're going to use to quantify the relationships that we care about. So I'm going to do that with a couple of examples here, and also I'm going to set everything up like I usually do for the plotting, and then I'm going to load the packages that we need. So the key ones that we care about for this analysis here are the limma package, the edge package, and the gene filter package. So the first thing that I'm going to do is I'm going to load this bottomly data set that we've used a couple of different times, and then I'm going to see that I've got this bottomly dataset loaded and I've extracted the expression and phenotype data. So again I'm going to do a transformation on the data and then I'm going to remove the rows that aren't above a particular threshold. So in other words, the lowly expressed genes, so then it makes everything a little bit easier as we go along. So the first thing I might want to do is just compare two groups. So you can do that by calculating the t-statistic for comparing those two groups for every single gene. So one way to do that is to the slow way, the fast way is to compute them all at once using the gene filter package. So you can do that using the rowttest function. So I'm going to calculate the rowttests I'm going to pass it the data matrix of the gene expression data, and then a factor variable telling it I'm going to compare the two levels of the strain. So the result of that is the t-stats object which has a statistic and the p value. Here we're going to look at the statistics, we'll talk about p values later. So if I do that tstats_object and look at this statistic, I can see this is the distribution of statistics of one statistic per gene that's been calculated. It turns out that that's not going to work, this rowttests only works if there's a two group comparison. If you want to do a multi group comparison, you can still do use the gene filter package, you can use the rowFtests function. So if I do rowFtests, I pass it again, the expression data object and then I have to pass it here, any variable that has two or multiple levels. So here I'm going to use lane number which has multiple levels, so I calculate the F-stats. So just to be clear, the lane number variable it can have one of eight different levels. So now it's going to compute, it's looking for any difference at all between any of those possible differences. So then the result is again an object that has stats and p values in it, and so I can look at the statistics from this one. This is now one of those F-statistics that we talked about that it's calculating. So these are all positive compared to the t-statistic which had a symmetric distribution. So that's where you can do it If you don't want to do any adjustment, you can also do no adjustment version of the moderated statistics using limma. So here we have to create the model matrix for the comparison we care about, here is the strain, and then we can fit the limma model using the lmFit command. So I pass it the data matrix, the gene expression data matrix, and the model that I want to fit, fits the model, and then I can shrink those statistics using the eBay's command. Then I can look at the statistics that come out, you see that it actually calculates a statistic both for the intercept and for the strain. So you have to select out the one that you care about, in this case we care about the statistics for showing the second one. So here I can plot that statistic, the second column of the t-statistics versus the t-stats from the gene filter package. So on the x-axis I have the moderated t-stat, and on the y-axis, I have the t-stat. So you can see that they're not exactly the same, but they're very highly correlated with each other, and you can see for example that the moderated t-statistic has a different distribution than the regular t statistic. So then I can add a color to that or aligned to that, a 45 degree line. You see they're very similar. Basically, mostly during the edges are different from each other. A key thing to note here is that I had to multiply by a minus here in front of the t-stats, and that's because basically, we don't know, there's not a natural direction for this t-statistic to go. You have to make sure that the effects are going in the same direction. So you can't do an adjusted statistic using the gene filter package, but you can using the limma package. So here I create a model matrix that has the string variable, it also has the lane number variable as well. So then I can just fit that using the lmFit command just like I did before, but with the adjusted model matrix. Then I can calculate the shrinkage estimates. Now I get t-statistics that are shrunk for the intercept, the strain, as well as the all the different levels of this late lane variable. So now I'm going to get very different results. So if I plot here the results that I get out from the adjusted model versus the t-stats that I got from gene filter, they'll be very different because I'm no longer in one case I'm actually doing an adjustment, and in one case I'm not doing an adjustment. So here you get different statistics all the way through including in the middle of the distribution. So I can add the 45 degree line and you see that there's still highly correlated with each other, but they're no longer the exact same. In fact, you see a little bit off the 45 degree, the coefficients from the adjusted statistic. Okay. So another option that you can do is you can actually fit factor model for multiple level factor using limma. So here I'm going to create the model matrix for that, where I have the lane number factor level, and then I can fit the model just like I did before, where I'm just passing it the new model matrix for lane and the expression data. Then I can do the same empirical Bayes estimation of the statistics. So now you can see I have statistics just for the lane variable. So suppose I wanted to do a comparison of, is there any difference between any of these lanes whatsoever, I can use the top table function to do that. So if I want to find the top statistically associated genes with the lane, I can pass it moderated fit from the limma analysis for lane, and I can tell which coefficients I want to look at to calculate the statistic for. So in this case, I see that it's the second coefficient starts the lane and the last one is the seventh coefficient. So then I say, how many do you want me to report? So it's a top table so it'll report then the n most statistically associated genes. In this case, I want to see all of them. So I'm going to say, report every single gene. Then I don't want it to be sorted so that it's in the same order as the gene that I have. So I have this top lane variable which gives me the F-statistic, the p value, and the adjusted p value for these calculation where you're looking at the association with any of the levels of the lane variable. So I can plot that versus the F-stat from calculating the same thing with the gene filter package, and it's going to be a little bit different here because again we're doing a moderation of the statistics, so it's not quite the same thing. I can do the same thing in edge, and again this might be a way or work if you don't have as much knowledge and model matrices. So you can again build this study using the expression data, you can tell what group variable you want to do here. I'm going to say, use the lane number as the group variable, and then you can calculate the differential expression using the LRT command that stands for likelihood ratio tests. So once I've done that, I can actually also calculate the Q values that go with that, but one of the components of that Q value is the stat variable. So I can use that stat variable to calculate a comparison with the F-stats from the gene filter package. So in this case, they're exactly the same because there's no moderation. So the statistics you get from gene filter and the statistics you get from edge are exactly the same, but with edge you can do an adjustment variable. So here I'm going to again build the same study. I'm going to use a build study, I'm going to pass it the gene expression. I'm going to tell it the group is the lame number, but here I want to adjust for a strain. So I'm passing it an adjust variable with adj.var. So now I have this second edge object. I'm going to be the second differential expression object, calculate this second set of key values. So now the F-stats might be different because I've adjusted for variables and now you see that the F-stats are same once you adjust for the strain. So those are three different ways that you can calculate statistics for many regression models in R.