Let's focus in on the read alignment problem. We have sequencing reads, which are like puzzle pieces, and we would like to put them back together with the help of a reference genome, which is like the picture of the completed puzzle. In doing so, we're taking advantage of the fact that different individuals of the same species, say two unrelated humans, have genome sequences that are very similar.
What we're really doing is repeating a process where we have a sequencing read, shown here at the top, and a reference genome, which might be, say, the entire human reference genome. We're looking for the place in the reference genome where the read sequence matches most closely. We do this once for every single read in our data set. And by the way, a second-generation sequencer outputs on the order of billions of sequencing reads per sequencing run. So we're going to do this maybe billions of times for a data set that comes from a DNA sequencer.
Furthermore, the reference sequence could be very, very long. For example, the human reference genome is about three billion bases long. In this figure on the slide, I could only fit in about 2,000 or so characters, so you have to imagine that the human genome is more than a million times longer than what I'm showing you here.
Here's another illustration of how long the human genome sequence is. At the end of the Human Genome Project, a printed and bound version of the human genome sequence was created. It's on exhibit at the Wellcome Collection in London. If you look up close at one page of one of these books, it's pretty small type, and it's all A's, C's, G's, and T's on every single page. So this is another way of thinking about how long the human genome sequence is.
The numbers that you see on the sides of the books in this picture correspond to the different chromosomes of the human genome.
You've heard the expression: when you're trying to find something and things seem hopeless, you might say it's like looking for a needle in a haystack. Well, here we might be looking for billions of needles, all in a very, very large haystack.
But the good news is that, like we said previously, we can think of both the sequencing read and the reference genome as strings. This is a profound and fortunate thing, because in computer science, people have thought very hard and for a very long time about how to work efficiently with strings. Many algorithms and data structures have been invented to make it easy to search for patterns inside strings, to index strings, to compress strings, and to do all kinds of other things with strings. In fact, it's still an active area of research, in part because of emerging problems in genomics, like the read alignment problem, which are taxing for even the most modern string analysis methods.
So this, in brief, is why the read alignment problem is a challenging one, and we're going to study it from several angles in the coming lectures.
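To make the search concrete, here is a minimal Python sketch of the most naive way to find where a read matches a reference most closely. It is an illustration only, not the method used by real aligners: the function names `hamming_distance` and `best_match_offset` and the toy sequences are made up for this example, and "matches most closely" is simplified to "fewest mismatched characters", which ignores insertions and deletions.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match_offset(read, reference):
    """Try every offset in the reference and keep the one with the fewest
    mismatches. Returns (offset, mismatches); ties go to the leftmost offset."""
    best_offset, best_mismatches = -1, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = hamming_distance(read, window)
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


# Toy data (hypothetical): the read differs from the reference at one base.
reference = "GATTACAGATTTACCAGGTACCA"
read = "TTTACGAG"
print(best_match_offset(read, reference))  # (9, 1)
```

This loop compares the read against every possible position, so the work grows with the length of the reference times the length of the read. For a three-billion-base reference and a read of, say, 100 bases (an assumed read length), that is on the order of 3 × 10^11 character comparisons for a single read, and we would have to repeat it for billions of reads. That is why the smarter string algorithms and indexes mentioned above matter.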