Welcome back to the third module in this fourth unit on molecular genetics. For some of you, I know, I'm introducing a lot of new terms. You should go to the course website; we'll put various learning aids there to help if this material is new to you or seems rather foreign. Today I'm going to talk about what's actually one of my favorite topics, but I'll try to control myself. The Human Genome Project is an extraordinary project. It really has changed human biology, and it's changing human psychology as well. Many surprising things came out of the Human Genome Project, but for me, three are the most salient. So, what is the Human Genome Project? You probably all know a lot about it, but just a little introduction here. It was an international collaboration that began in 1990. Countries from all across the world collaborated in sequencing a human genome. The goal they set for themselves in 1990 was to sequence one human genome. We now know that there are approximately 3.2 billion bases, or actually base pairs, because there are two complementary strands of DNA. But you don't need to sequence both strands, right? Because of the constrained pairings across the strands, if you sequence one strand, you've got the other strand. So, there are 3.2 billion bases, or base pairs, in the human genome. They planned this as a 15-year project. In fact, they declared the project completed on April 25th, 2003, two years before the scheduled end of the Human Genome Project. And not coincidentally, because they chose that date on purpose, April 25th, 2003 was the 50th anniversary of James Watson and Francis Crick's publication, in the journal Nature, of the double-helical form of DNA.
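The point about only needing to sequence one strand can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard pairing rules (A with T, G with C), which the lecture refers to only as "constrained pairings"; the function names and the example sequence are made up for illustration.

```python
# Standard base-pairing rules: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base complement of one DNA strand."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complement written in the opposite direction.

    The two strands run in opposite directions, so this is the
    partner strand as it would conventionally be written out.
    """
    return complementary_strand(strand)[::-1]


example = "GATTACA"  # hypothetical sequence, not from the lecture
print(complementary_strand(example))  # CTAATGT
print(reverse_complement(example))    # TGTAATC
```

Because the mapping is one-to-one, applying it twice returns the original strand, which is why sequencing one strand is enough.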
So, they announced the finishing of the sequencing of the human genome on the 50th anniversary of that very important publication, which actually ushered in the molecular genetic era. Now, sorry, a little more terminology here, but maybe this is helpful: a little bit about size. We're going to talk about 3.2 billion bases of DNA, so the scale here is pretty massive. Geneticists call a thousand bases of DNA a kilo-base, abbreviated kb. A million bases of DNA they'll call a mega-base, abbreviated Mb, and a billion bases of DNA is a giga-base, abbreviated Gb. So, what did sequencing the genome involve? Well, basically, working out the order of those four bases. And this goes on forever, pages and pages, right? 3.2 billion bases, here. This was a single genome. It's actually a composite of multiple individuals, but one genome. This would go on for thousands and thousands and thousands of pages, like this. So the first thing they did was just sequence this out. They then enumerated them: they counted the bases, and they placed the bases on the various chromosomes. In fact, different labs sequenced different chromosomes. So here, we're counting the bases, and they're broken up into groups of maybe 10 bases each. And then, they had to identify the various gene boundaries. Along the way, there were three big surprises. The first big surprise when they sequenced the human genome was that most of the genome is not involved in what we thought was, and what certainly is, the primary function of DNA: to be translated into protein. This is a little pie diagram. You don't have to worry about all the different categories here. I just want to highlight one. And that is, about 1.5% of those 3.2 billion bases of DNA is actually involved in coding for protein. The other 98% of our genome isn't involved, directly, in what we think DNA is there for.
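To get a feel for the scale, here is a small sketch that converts base counts into kilo-, mega-, and giga-base units and works out roughly how many bases that 1.5% corresponds to. The abbreviations kb, Mb, and Gb are the standard ones; the helper name `format_bases` is hypothetical, and the 1.5% figure is the approximate value quoted in the lecture.

```python
KILOBASE = 1_000          # kb
MEGABASE = 1_000_000      # Mb
GIGABASE = 1_000_000_000  # Gb

GENOME_BASES = 3_200_000_000  # approximate size quoted in the lecture
CODING_FRACTION = 0.015       # about 1.5% codes for protein


def format_bases(n: int) -> str:
    """Express a base count in the largest unit that fits."""
    if n >= GIGABASE:
        return f"{n / GIGABASE:g} Gb"
    if n >= MEGABASE:
        return f"{n / MEGABASE:g} Mb"
    if n >= KILOBASE:
        return f"{n / KILOBASE:g} kb"
    return f"{n} bases"


coding_bases = round(GENOME_BASES * CODING_FRACTION)

print(format_bases(GENOME_BASES))  # 3.2 Gb
print(format_bases(coding_bases))  # 48 Mb
```

So under these rounded numbers, the protein-coding part is on the order of 48 million bases out of 3.2 billion.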
So, what is that DNA? Why do we carry it? At one time it was thought that that 98% of the bases really had no function, that it was, quote-unquote, junk DNA. We now know that a lot of it, we're not sure exactly how much, does have some function, some very important function, in regulating the protein-coding segment of the DNA. We're going to talk more during this unit about genetic regulation, but we've already begun to introduce the concept that there are regions of the genome that, even though they're not coding for protein, regulate that process. There's the promoter region, which is on the upstream, or five-prime, end of a gene. There are other regions that could be on either the five-prime or the three-prime end of a gene, or actually very remote from the gene itself, that could have a regulatory function on how this gene is transcribed or how it's translated. So, even though a very small portion of the genome is directly involved in coding for protein, a good segment of the rest is probably functional in some way in regulating the expression of those protein-coding genes. That's the first surprise. The second surprise is that humans have a relatively small number of protein-coding genes. Originally, back in the 1980s, we thought that the human genome probably had about 100,000 protein-coding genes. Today, the number is on the order of 21,000 protein-coding genes in the human genome. We have 46 chromosomes, 3.2 billion bases, or 3.2 giga-bases, of DNA, and 21,000 genes. Now, what I've displayed here are other species: the number of chromosomes, their genome size, and the number of protein-coding genes.
We have about the same number of genes as a chimpanzee, fewer genes than rice, and not many more genes than a roundworm or the little fruit fly that disturbs your picnics in the summer. That's really kind of surprising. You've got to believe, at least I believe, that humans are biologically much more complicated than rice, but we have about half as many genes as rice. One of the things that we're beginning to realize is that the biological complexity of our genomes is not a simple function of the number of protein-coding genes. That other 98% of the genome, which is somehow regulating those genes, is very important in creating the biologically complex organisms we are. Because of this, the whole concept of what a gene is has evolved over the last 20 years. Now, I don't mean for you to actually memorize the definition of the gene that I'm going to give you here on the next slide. I just want to illustrate a point, which is that our notion of what our genomes do is really changing as we get more information about what our genomes are. The classic definition of a gene was a sequence of DNA that coded for a protein, a functional polypeptide, a sequence of amino acids. That was the traditional definition. But now we know there are whole regions of the genome that don't code for proteins directly, but have a function. So the geneticist's definition of a gene has expanded to incorporate those. This is a definition that it took 25 geneticists, locked up in a room, two days to come up with, an alternative definition to expand our notion of what a gene is. Essentially, what the definition here is saying is that a gene is any sequence in the DNA that somehow has a functional effect on our phenotype. So, there's much more to our genetic code than the part of the code that is directly involved in, and translated into, protein. That's the second surprise.
The last surprise actually gets back to comparative genomics, comparing genomes of different species. When you line up your genome with the genome of this chimpanzee, what you find is that your DNA bases, that is, those four nucleotide bases G, C, T, A, line up to be 98% concordant with his or her DNA. I don't know if it's a male or female chimpanzee. 98% of your DNA is the same as that chimpanzee's DNA. Well, maybe for some of you, that's kind of unsettling, that your DNA is so similar to the chimpanzee's DNA. I don't want you to feel too bad about that, so let me try to help you. It turns out that your DNA and my DNA line up 99.9% the same. You and me, or, as Craig Venter, one of the leaders of the sequencing of the human genome, would say, you and I are basically the same, right? You look at your DNA, you compare it to my DNA, and 99.9% of our DNA, that is, those bases, is exactly the same. We're basically monozygotic twins, right? We have the same DNA. Well, are we? Remember the scale here. How many bases of DNA are in our genome? 3.2 giga-bases, 3.2 billion bases. So, yes, 99.9% of my DNA is the same as your DNA across those 3.2 billion bases. But what's 0.1% of 3.2 giga-bases? Well, 0.1% of 3.2 giga-bases is 3.2 million differences per strand, and there are two strands, so even though we're 99.9% the same, we're going to differ at about 6 million locations in our genome if we count both strands. And that's actually very typical. If you take any two individuals, they'll differ in about six million bases of DNA. So yes, we're almost genetically identical, but there are many, many differences between us. And indeed, many, many differences between us and the chimpanzee. And I want to highlight that: it's important that we're very similar genetically, but I don't think we want to get misled by it.
This is a course about the genetic differences among us, so I don't want us to get misled by that 99.9%. So, let me give you an illustration. I know this is impossible to read, but I'm trying to make a point. These are the bases of DNA in the gene that's mutated in people who suffer from cystic fibrosis. And in fact, this is a long gene. The bases go on for pages and pages. There are actually 250,000 bases here. Many, many bases of DNA. And most of us have the same bases in this gene. Again, this would go on for many, many pages, so I'm only giving you one page of this particular gene, and you can't even see the bases because I've made them so small. Many people who suffer from cystic fibrosis have a mutation right here; they have this little area mutated among all these bases of DNA. So, yes, most of our DNA is the same as the DNA of a person who suffers from cystic fibrosis, but very small changes in our DNA can have rather significant effects on our phenotypes. Next time, we'll talk about that 0.1% of our DNA that differs among us, the genetic variance among us.
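The arithmetic behind the 99.9% figure can be checked with a short sketch. It uses the rounded numbers quoted in the lecture (3.2 billion bases, 0.1% difference between two people, 2% between a person and a chimpanzee) and follows the lecture's own way of counting, doubling the per-strand figure. The function name `differing_bases` is hypothetical.

```python
GENOME_BASES = 3_200_000_000  # approximate size quoted in the lecture


def differing_bases(genome_bases: int, fraction_different: float) -> int:
    """Number of base positions expected to differ, given the fraction that differs."""
    return round(genome_bases * fraction_different)


# Two people: 99.9% the same, so 0.1% different.
per_strand = differing_bases(GENOME_BASES, 0.001)
both_strands = 2 * per_strand  # the lecture's count over both strands

# Person versus chimpanzee: 98% the same, so 2% different.
chimp_per_strand = differing_bases(GENOME_BASES, 0.02)

print(f"{per_strand:,}")        # 3,200,000
print(f"{both_strands:,}")      # 6,400,000 (the lecture rounds this to about 6 million)
print(f"{chimp_per_strand:,}")  # 64,000,000
```

A tiny percentage of a very large number is still a large number, which is the point of the comparison.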