Let's formulate the first real computational problem that we'll address on our journey studying the read alignment problem. So it's called the exact matching problem. We'd like to find all the offsets where some pattern string, which we'll call P, occurs within a longer string, the text, which we'll call T. The pattern P is like the needle and the text T is like the haystack that we're looking in. So in this example, we have a pattern P, which is word. And we have a text T, which is there would have been a time for such a word. And there's one occurrence of the pattern all the way there at the end, which is at offset 40. So more specifically, the leftmost character of T that is involved in that match is at offset 40. So we say that the match is at offset 40. And that's the only place where the pattern occurs in this text, in this example. So in a way, this problem is a simplified version of the read alignment problem. It's a bit too simple what as we'll see later. We'll this in lecture and in practical sessions. But it's a great place for us to start. So here's a Python version of the previous example. We have our text T, we assign that to our variable t. And we call a string method called find. And what find will do is, you give it an argument, which is a string, and it'll return the offset of the leftmost occurrence of that argument inside the string. So in this case, the string that we're calling the method on is our text and the argument that we're passing is our pattern. And Python reports that the leftmost occurrence is at offset 40. So if we want to find all the places where the pattern occurs, not just the leftmost place, we have to do something a little more complicated than this, but you get the idea. It's pretty easy to use Python to do this. But let's not use Python's built-in functionality. Let's implement exact matching ourselves. So how do we do it? And let's not try to be clever at first. Let's just try to solve the problem in the most straightforward way possible. So here's a suggestion. Let's try every possible way of lining up P with T. So, for each of these ways that we could possibly line P up with T, we're going to see whether all the characters of P match the corresponding characters of T. And if they do, then that's a match. If one or more of the characters differ between P and T, if there's at least one mismatch, then that offset is not a match. That's a pretty simple idea. We're just trying every possible offset and we're checking whether P matches T at that offset. So here's an implementation of that idea in Python. We'll see it again in a practical session that follows this. This function is called naive, because this algorithm is sometimes called the naive exact matching algorithm. And it takes two arguments, the pattern P and the text T, and we will initialize a list occurrences to be empty. It's an empty list at first. We'll eventually fill it with all of the offsets where P matches T. Now we reach this outer loop here. Which iterates over all possible alignments, all possible ways of lining up the characters of P opposite characters of T. And then, in this inner loop, we have to determine whether the alignment corresponds to a match or not. And this requires that we compare the characters of P to the corresponding characters of T. So the inner loop iterates over characters of P from left to right. And for each one it tests to see whether it's equal to the corresponding character in the text T. If it is not equal, then we know that P does not match T at this offset. So in that case, we set this match variable to false. And we break out of the inner loop. We can break because once we know that one character mismatches, we don't have to do anymore character comparisons. We know that it's not a match. So we can break out of that inner loop, and that brings us here. And only if we make it through all the iterations of the inner loop without ever setting the match variable to false. In other words, only if all the character comparisons were matches, do we go ahead and append this offset to the list of occurrences. And then at the end of the day we return the occurrences list. So that's it. Two nested loops. The outer loop iterates over alignments in left to right order. And the inner loop iterates over character comparisons for a given alignment also in left to right order. So for the next few slides I'm going to ask some questions. These aren't quiz questions, but I do recommend that once you hear the question, you might want to pause the video and try to come up with your own answer to the question before unpausing it. So let's say that we have a pattern and a text. And the pattern has a length x, and the text has length y. So in this case, how many times, in terms of x and y, how many times will we iterate through the outer loop? In other words, how many different alignments will we examine? So the answer is y minus x plus 1. Here's a related question. So given a pattern of length x and a text of length y, what is the greatest total number of character comparisons we could possibly do? What's the greatest number of times that we can possibly go through, iterate through, the inner loop of our nested pair of loops. The answer is y minus x plus 1, all times x. The y minus x plus 1 again represents the number of possible alignments of the pattern to the text. And for each alignment, the most character comparisons we can do is x, the number of characters in P. So we have to take their product, x times y minus x plus 1. This is a kind of worst case analysis. All right, this case seems awfully bad. But when would it actually happen? When would this sort of situation happen? Well the worst case scenario looks like this. This is a scenario where every character of P matches every character of T. So every character comparison results in a match, and therefore we go through the inner loop the maximum number of times. Okay, that seems like a pretty rare case, but it's good to keep in mind that that's the worst case. Here's another question. What's the least number of character comparisons that are possible? So here the answer is y minus x plus one. Which again is the number of possible alignments of P to T. Why is that the answer? Well, for each alignment we're going to do at least one character comparison. But the very first character comparison could be a mismatch. In which case, we immediately break from the inner loop after having done only one character comparison. So if this happens, then we're doing exactly one character comparison per alignment. So the number of character comparisons equals the number of alignments. So when would this happen, and what sort of a scenario would, for what kind of pattern and text would this sort of thing happen? Well this is an example of a case where that would happen. So in this example, the first character of the pattern is a character that doesn't occur anywhere in the text. So the first character of the pattern is a, and yet the text consists of all b's. So in this case, every time we reach the inner loop, the very first character comparison is going to be a mismatch and we'll break out of the inner loop. Okay, so here's a particular example. Here's a P and a T. The question is, how many character comparisons do we do in this example? I'm looking for a number. This is not in terms of x and y, but an actual number of character comparisons that we would do in this example. Well one thing you'll notice is that most of the alignments will mismatch immediately. Right? It's not hard to see because the pattern begins with a w, and yet most of the characters in the text T are not w. So for most of the alignments the very first character comparison will be a mismatch. There are two w's in the text, and so there are two special alignments, and for those two alignments we'll have some matches. And if you add it all up, and I've highlighted in red here all of the places where we mismatch, and in green all of the places where we match. If you add up all of those matches and mismatches, we end up with a total of 46 character comparisons. 40 of them are mismatched, as in 6 of them are matches. So we can also write down the minimum and the maximum possible number of character comparisons given the length of the pattern and the length of the text. The minimum is 41. The maximum is 164. And so one thing that you'll notice about the number of character comparisons that we do is that it's much closer to the minimum than it is to the maximum. All right, so one positive thing that we can say about this algorithm is that in practice, or at least in this example, but it's true for many other examples too, the number of character comparisons that we do is a lot closer to the minimum than it is to the maximum