So far, we looked at discrete probability distributions, which have a countable number of outcomes or scenarios. Now let's take a look at continuous distributions. Sometimes the random variable being modeled has a very large number of possible scenarios on any given small interval, and the probability of any one exact scenario being realized is very, very small. Think of examples such as a stock's price at a given moment, or the amount of rainfall in a region. In such cases, it makes sense to describe a probability distribution using groups of scenarios rather than focusing on one individual scenario.

A continuous random variable big X can take any value on the continuous real line. We can describe the pdf, the probability density function, and the CDF, the cumulative distribution function, for a continuous random variable as before, and here are the definitions. We are interested in the random variable big X taking a value near small x. Small f(x), the density function, refers to the probability that the random variable takes a value in the infinitesimally thin region around small x. Big F(x) is the cumulative distribution function. As defined, it's cumulative: it's the probability that the random variable takes any value smaller than or equal to small x. This is written as the integral of the pdf from negative infinity to x.

As we defined before for discrete random variables, we can describe the mean and standard deviation for a continuous random variable. The mean, or the expectation, of a continuous random variable is defined as follows. Take any possible value u of the random variable, multiply it by the density function of the random variable around that value u, and take the integral over all possible values of u from negative infinity to positive infinity. This gives us the expected value of the random variable, which is equal to the mean.
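The definitions above can be sketched numerically. The code below is my own illustration, not part of the course material: it approximates the CDF integral and the mean integral with a midpoint Riemann sum, using the standard normal density as the example f (any density would do). The truncation points lo and hi are assumptions chosen so the ignored tails are negligible.

```python
import math

def pdf(x, mu=0.0, sigma=1.0):
    # Example density: a normal random variable (any density f could be used).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def cdf(x, f=pdf, lo=-10.0, n=100_000):
    # F(x) = integral of f(u) du from -infinity up to x,
    # approximated by a midpoint sum, truncated at lo.
    h = (x - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

def mean(f=pdf, lo=-10.0, hi=10.0, n=100_000):
    # E[X] = integral of u * f(u) du over all possible values u.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += u * f(u)
    return total * h
```

For the standard normal example, cdf(0.0) comes out near 0.5 and mean() near 0, as the symmetry of the bell curve suggests.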
Similarly, we can calculate the variance of the random variable. What's variance? Variance is the expected value of the squared deviation of the random variable from its mean. So in this case, we take the difference between a value u of the random variable and the mean mu, square it, and multiply it by the density function corresponding to u. We integrate over all possible values of u from negative infinity to positive infinity, and this gives us the variance. The standard deviation is nothing but the square root of the variance.

Now, let's visualize a continuous random variable X and how it's distributed. This continuous random variable big X is distributed over the region denoted here, and the y axis denotes the probability density corresponding to each value that the random variable takes. The probability density function is shaded in green here. Continuous random variables take shapes like the distribution shown in the figure. In this case, we see the probability density function of a continuous random variable X. Sometimes we might be interested in the light green area shown here: this is the area in which the random variable X takes some value between x1 and x2. The total area under the entire curve, including the dark green region and the light green region, must be equal to 1, because the random variable must take some value in the region.

Let's look at an example. Our first example is the normal distribution, which is one of the most popular examples of a continuous probability distribution. It allows the underlying random variable to take any value from negative infinity to positive infinity. A normal distribution is completely characterized by two parameters: the mean, often denoted by mu, and the standard deviation, sigma. In this figure, we see the probability density function of a normal distribution.
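The variance integral above can be sketched the same way as the mean. This is again my own numerical illustration, assuming a standard normal density as the example f and truncating the integral at -10 and 10, where the tails are negligible.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Example density used to test the variance integral.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def variance(f, mu, lo=-10.0, hi=10.0, n=100_000):
    # Var(X) = integral of (u - mu)^2 * f(u) du over all possible values u,
    # approximated by a midpoint Riemann sum.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += (u - mu) ** 2 * f(u)
    return total * h

v = variance(normal_pdf, 0.0)   # close to 1 for the standard normal
sd = math.sqrt(v)               # standard deviation = square root of variance
```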
This is also called the bell curve, because it's shaped like a bell. On it we see the mean mu, which is right in the center; this gives us the average value that the random variable takes. And sigma, the standard deviation, which gives us the spread of the random variable. In the normal distribution case, the mean mu also corresponds to the middle value, which is the median; the median is also in the center. And so is the mode, the most likely value, which is the highest point of the pdf and which is also in the center.

If you're looking for the mathematical formula for this curve, it's given by the model f(x) below. It is the ratio 1 over sigma times the square root of 2 pi, multiplied by the exponential of the negative squared deviation of the random variable from the mean, divided by 2 sigma squared: f(x) = (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2)). The density function, of course, looks complex, but it can be calculated in Excel. Similarly, the cumulative distribution function gives you the integral of all possible densities from negative infinity up to that value x.

In the previous slide, we saw the statistical formulas to calculate the pdf and CDF for the normal distribution, and these can be easily implemented in Excel for any given normal random variable with mean mu and standard deviation sigma. For the pdf formula in Excel, we use NORMDIST: NORM for normal, DIST for distribution. X is the value at which we are interested in f(x), and we plug in mu, sigma, and 0; the 0 gives you the pdf. If you're interested in the CDF, the cumulative distribution function, all we have to do is use the same formula NORMDIST, except instead of 0 we plug in 1. We give NORMDIST x, mu, sigma, 1, and we get the CDF value at the random variable value x.

Let's look at another example: the uniform distribution. We saw a discrete uniform distribution; now let's look at the continuous uniform distribution. This distribution allows the underlying random variable to take any value from a minimum point, let's say a, to a maximum point, let's say b.
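Before moving on to the uniform distribution, the NORMDIST usage described above can also be mirrored outside of Excel. The sketch below is my own Python translation, not course material: the pdf branch is the closed-form density, and the CDF branch uses the standard error-function identity for the normal CDF.

```python
import math

def normdist(x, mu, sigma, cumulative):
    # Mirrors Excel's NORMDIST(x, mu, sigma, flag):
    # flag 1 returns the CDF, flag 0 returns the pdf.
    if cumulative:
        # Normal CDF via the error function: F(x) = (1/2)(1 + erf((x - mu) / (sigma * sqrt(2))))
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    # Normal pdf: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
```

For example, normdist(0, 0, 1, 1) gives 0.5, since half of the standard normal's probability lies below its mean.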
And any outcome between these two values, the minimum and the maximum, is equally likely to occur as any other value. Again, like the normal distribution, the uniform distribution is also characterized completely by two parameters; once you know these two parameters, you know everything about the distribution. In this case, we typically use the minimum and maximum values.

Just as in the case of the normal distribution, we can also draw the probability density function for the uniform distribution. As the name suggests, the probability is equal and uniform between the minimum value a and the maximum value b. Hence the pdf, small f(x), is given as 1 over (b minus a) for any value from a to b, and 0 otherwise. Similarly, we can write the cumulative probabilities. The cumulative probability of any value up to a is 0, because the random variable never takes a value less than a. Similarly, the cumulative probability of the random variable being less than or equal to b, or any number higher than b, is 1, because the random variable is always contained within a and b. Between a and b, the cumulative distribution is given as follows: it's the ratio of (x minus a) over the entire length (b minus a), and this gradually increases from 0 up to 1.

So far, we have focused on two example distributions: the normal distribution and the uniform distribution. However, many other continuous distributions are often used. The exponential distribution, for example, is used to model loan processing times; the beta distribution, to model project completion times over fixed intervals; the gamma distribution, to model events in insurance risk; and the lognormal distribution, for example, to model events that take high values with low probability. Which of these distributions fits well? This is an important question. We will learn about goodness of fit tests for the normal and uniform distributions in the next session.
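The uniform pdf and CDF described above are simple enough to write out directly. This is my own sketch, not course material, following the piecewise definitions: density 1/(b - a) inside [a, b] and 0 outside, CDF 0 below a, (x - a)/(b - a) between a and b, and 1 above b.

```python
def uniform_pdf(x, a, b):
    # f(x) = 1/(b - a) for a <= x <= b, and 0 otherwise.
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    # F(x) = 0 below a, rises linearly as (x - a)/(b - a), and is 1 at or above b.
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)
```

For a uniform random variable on [0, 10], for instance, uniform_pdf(5, 0, 10) is 0.1 everywhere in the interval, and uniform_cdf(5, 0, 10) is 0.5, the midpoint of the linear rise from 0 to 1.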
Although there are several other distributions, because of time constraints we are going to look at just the normal and uniform distributions. I'll see you in week 3, session 3, the upcoming session, where we will look at distribution goodness of fit. In the last session, we explored different families of distributions that can be used to model reality. It is natural to wonder about the following question: how good is a model of reality that uses a certain distribution? In the final session of week 3, we will continue to explore this very question, and the goodness of fit of a distribution in modeling reality based on past data.