[Sound] Welcome! In this lecture, you will learn about challenges regarding model specification, which we will cover in the third set of lectures. Suppose we have a data set of a stock price index with a large number of variables that we suspect may explain movements in the stock index. There are a number of questions that we need to address before we can actually formulate a model for a stock price index as a function of the explanatory variables. First, do we just include all explanatory variables or only a few? And if we don't include all variables, how can we select which of the variables to include? Second, do we take the data as they are, or transform the variables? And third, once we have a model, how can we evaluate whether the model is appropriate in some sense? These questions are, of course, relevant in any kind of application, not just the stock market setting which is the focus of this lecture. To set the stage, I now ask you to consider whether we should always use all variables in a data set if they are all relevant. Counterintuitive as it may seem, we do not always include all variables. In the next lecture, you will learn about the why and the how. We will illustrate these questions by looking at an example, and we indeed take the stock market setting. This figure shows the annual evolution of the S&P 500 stock price index over the years 1927 up to 2013. Exponential growth is visible in the figure. Some interesting episodes stand out. For example, the dot-com bubble at the end of the 1990s and its burst in the early 2000s, and the financial crisis starting in 2007-2008. Of course, there were more crises, but those stand out less clearly in this figure, precisely because of the exponential growth. We're interested in modeling this series, and have a number of explanatory variables. Modeling and forecasting of stock prices is not easy, and many variables have been examined. 
One could look at stock characteristics, such as how high dividends are, the earnings of firms, market volatility, book value of the firms and issuing activity during a time period. One could also consider general market conditions such as interest rates on government and corporate bonds, or macroeconomic conditions such as inflation, investment and consumption. This list is not exhaustive, and it is hopefully already obvious that it is quite a challenge to select the important variables, if any. The first question we turn to is precisely how to make this decision. Do we simply select all variables or just a few, and if a few, which ones? This is the topic of Lecture 3.2. Now let's take one of the explanatory variables, which is the book-to-market ratio. This is the book value of the firms relative to the market value. The picture on the left plots the index together with this variable, with the index in blue on the left axis and the book-to-market ratio in red on the right axis. It is obvious the two variables behave differently. The index grows exponentially, while the book-to-market ratio stays relatively stable over time. We can transform the series in order to get a more similar behavior. For example, to undo the exponential growth, we can take the log of the index. This figure plots the log of the index together with the book-to-market ratio, and just by looking at the picture, it seems the variables are now a bit more on the same scale. Taking the log of a series is a very common transformation, and you've already seen it in lecture two. It turns out that in our current application, we still need another transformation. We do not consider the log of the series directly, but the change in the log of the index from one period to the next. This figure plots the change of the log of the index against the book-to-market ratio, and indeed now the variables move on the same scale. 
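As a rough illustration of the transformation just described, the sketch below computes the change in the log of a series from one period to the next. The function name `log_changes` and the index levels are made up for this example and are not the S&P 500 data from the lecture. The point is only that a series growing at a constant rate, which looks exponential in levels, has a constant change in logs.

```python
import math


def log_changes(levels):
    """Return the change in the log of a series from one period to the next."""
    logs = [math.log(x) for x in levels]
    # Pair each period with the next one and take the difference of the logs.
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical index levels growing 10% per period (illustration only).
index_levels = [100.0 * 1.10 ** t for t in range(6)]

changes = log_changes(index_levels)
print([round(c, 4) for c in changes])
```

Each printed value should be close to log(1.10), roughly 0.0953, so the transformed series stays flat even though the levels grow exponentially. Note that the transformed series has one observation fewer than the original.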
We can regress the change of the log index on a constant and book-to-market to study this relation in more detail. This table provides the output of this regression. It turns out book-to-market is significant in explaining the change in the log of the stock index. It's significant at a 1% level, and the R-squared of this regression is 8%. Now, I invite you to think about the sign of the coefficient of book-to-market. Since book-to-market is defined as book value divided by market value, a high book-to-market period typically coincides with a period when the market value is low and has decreased. So when stock market values are low and have decreased, the stock market index has decreased. This is precisely what the coefficient tells us. Perhaps you already expected the significant explanatory power when modeling stock index movements with a variable that depends on the market value, but it turns out that book-to-market is also important when we forecast the stock market. We took a transformation to get at the significant explanatory power for the stock market and this was rather ad hoc. More detailed considerations for transforming variables and related concepts, such as non-linear effects, are treated in Lecture 3.3. Finally, let us get back to the figure of the change of the log index plotted with the book-to-market ratio. After the 1980s, the book-to-market ratio flattens out a bit, and goes to a lower level. It is not clear whether the relationship between the stock index and the book-to-market ratio is the same before and after the 1980s. In Lecture 3.4, we talk about methods to test whether there's a break in the relationship and also discuss tests that can inform us whether the model is actually good enough. Now, I invite you to do the training exercise, to train yourself with the topics of this lecture. You can find this exercise on the website. And this concludes our lecture on the motivation for model specification.
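As a supplement to the regression discussed in this lecture, here is a minimal sketch of regressing one variable on a constant and a single explanatory variable by ordinary least squares, using the textbook closed-form formulas. The function name `ols_simple` and all numbers are hypothetical. They are not the lecture's data and do not reproduce the regression table, and the sketch does not compute standard errors or significance levels.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sample covariance of x and y divided by sample variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (sum of squared residuals) / (total sum of squares).
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Made-up observations (illustration only, not taken from the lecture).
book_to_market = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
log_index_change = [0.12, 0.10, 0.02, 0.05, -0.04, -0.06]

intercept, slope, r_squared = ols_simple(book_to_market, log_index_change)
print(f"constant: {intercept:.3f}, slope: {slope:.3f}, R-squared: {r_squared:.3f}")
```

The sign of the estimated slope is what you would compare with the reasoning about the sign of the book-to-market coefficient, and the R-squared is the share of the variation in the dependent variable that the regression accounts for.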