Hello and welcome to another SQL walkthrough. In this one, we're going to go through a bit of the material on how to handle text. So here we go, there's some some things that are kind of cool. So, sometimes you want to generate a bunch of data because you're going to do some queries, you're going to do some explains or explain analyzes. And you want to fill up a table with a bunch of stuff and you don't want to type a bunch in. So let's take a look at some of the sort of building blocks of doing this. First, there's the number random, so if we do SELECT random random. Random is a number between zero and one that's random, if you call it more than once you're going to get a different number. So if I do this over and over, different number, different number, different number. And if you want to sort of get a number between 0 and 100, you do a trunc of the random times 100. So the floating point number is multiplied by 100, away you go, so that's one thing. You can also sort of concatenate a string together to make long strings, if you want. So if you want to make some gibberish, this generate_series is really awesome. because what it does is it looks like it's just one thing but sort of every time it hits it, it forces the generation of a new row. So this generate_series is like five rows with the number 1 through 5 in it. And then we can concatenate that, SELECT and so this double vertical bar is a string concatenation. This is one field, right? Except that this is going to generate five rows, okay? So we take a look at this guy, it generates five rows and each row is concatenating the word Neon to 1, 2, 3, 4, 5. And if you think about this from a Python perspective, this is kind of like a list comprehension where it's making a five-element list, Neon1, Neon2, etc. And so we can fill some things in, right? Let's make a little table here called textfun and we can use this. So this SELECT Neon, that generates five rows, but we can also just say, INSERT INTO textfun. Instead of saying values, we can say, SELECT. Now the key to it is there's got to be a column here. And there's got to be the same number of results of that SELECT at the end as there is in the INSERT column list. And we have INSERT INTO textfun content, and then a SELECT that comes right after that. And I can say then SELECT star FROM textfun, or I can fix my typo because I type too fast. So that's how it works, right? So again, that's inserting multiple rows, the SELECT is sort of driving it but the generate_series is what's driving the multiple SELECT. So, let's throw that little bit away. Let's play a little bit with some larger bits of data, let's make ourselves, Let's add an index, I named my index CREATE INDEX on text. textfun_b is the name of the index, textfun is the name of the table. And then content is the column, This shows that you can sort of create an index of your choice after the fact. And so it's like how ALTER TABLE works on a running system. Now you can ask, how much data is allocated on disk for the relation itself, which is basically the rows, and then how much is in the all of the indexes? And right now we have 8K, but they're empty so we can fill it up. Now, this is another one of those things that's going to generate a bunch of rows. You'll see that it's concatenated with generate_series 1000 through 1005. And this is just a CASE statement, it's kind of like an if-then-else, except that it's not like choosing a code path, it's actually generating one of two strings. So the random, remember every time you hit, it gives you another number, if it's less than 0.5, i.e., less than 50% of the time. So CASE WHEN, this is kind of like the if clause, then the word THEN. And then this is a string, and then ELSE, that's a string. And so what this does, this part right here, in the parentheses, is it either returns one of those two strings 50% of the time. And then concatenates that. So if I just do this part right here. I'm going to put a semicolon after it, it'll give me that one, it will give me that one. It will give me LEMONS. So SELECT CASE WHEN random less than 0.5 THEN. So this is returning a string but it's flip-flopping based on a random number. So the CASE is not exactly an if-then-else, it's sort of a more of a WHEN because it's evaluated over and over and over again, each time there is a row. But then when we concatenate that with generate_series from 1000 through 1005, then we get five rows and the random concatenation, LEMONS, neon, LEMONS, LEMONS, neon and it's driven five rows in the generate_series. So you can think of it as every time that CASE bit is hit, it picks randomly between those two strings. And then it does it five times with that number going from 1000 to 1005. So we can use this sort of mechanism and we can fill up some data. We're going to do an INSERT INTO textfun and then we're going to have those rows. Except now we're going to go and do a 100000. Now, I would say watch out when you're doing this, because it took a little bit of time because we're going across the network actually. That's kind of impressive that there's a database on the far end that can insert those rows and make indexes, etc., etc., etc. So that's been inserted and so I can ask how big is the relation size and the relation size is this number here, which is about six meg. The relation size correlates very directly to the rows, the index's size is pretty big. It can be bigger than it. In this particular case, it's because we're only indexing one column. And so the index's size is pretty big. But you can see how the index grows as we insert data and how the relation grows. So let's take a look at some string operations. So the part that's the most standard across databases is the LIKE operation and it has a wildcard that is a percent sign. So that basically says is go through here, find the content that has this number somewhere in it. So the percent is like any number of characters followed by any number of characters. And so that's like anywhere in the string. It happens to find it here at the end. We can ask for that material at the, I mean, oops, that didn't come out right. Oh, you can take the uppercase of that, sorry. I was missing what the difference was, that's the uppercase. So that does the upper, you can do lowercase, you can grab the rightmost four characters. You can grab the leftmost four characters. You can call strpos. And you can say where within the string am I looking at. Now that says, find the position in the string WHERE content LIKE under 150000. So that's going to show us that it's in position 2 because it would be in ttp, ttps colon, where is it? Oh, that's a 0 because it's http, not https at all, it's actually http, so it didn't find it. I take the s off, and it worked just fine, so it shows us in position 2. All right, sometimes it's http and sometimes it's s, I should probably fix that. So in the future when you're doing this, it'll just work a lot better. Okay, so we got a couple other functions we can play with. The split_part is like a lot of languages that allow you to do splitting. You give it how many pieces you want and which piece you want. So that's basically breaking it into pieces. Let's just say SELECT. It splits it based on slashes and gives us the fourth one, which is one, two, three, four, so it's the fourth one. So it gives us LEMONS. So you can use that just like a split in normal programming language. This translate is kind of weird in that it's going to do a one-to-one mapping. The lower case t is going to map to uppercase T, the lowercase h to uppercase H, the dot becomes exclamation, etc. So this has got five characters and this is just going to transform five characters. Why you want to do this? Who knows? You'll probably not do something nearly as silly as that. So we see the HTTPs are all uppercase, the slashes are underscores. And so, you can kind of see. That's just a tool that you may or may not need to use. So, another kind of wildcard, the percents are our wildcards that match many characters. And in this LIKE style you can, oops, there is another wildcard which matches a single character and that's an underscore. So these are not regular expressions, they're not command line things. They're not those kind of things. They are just that's a single character. So it's 150 followed by some character followed by two zeros. And so that's how we get these things right here. You can also say you can give a list of things with an IN, WHERE content IN and then a parenthesized list. So that, okay, is that because we don't have s's. Yeah, I'm sure I mismatched my s's and p's and the neons, etc. got my s's backwards, but that I'll fix that later. I think this should be s, and you can find it where it matches. Yeah, I still didn't find it, that's because the random numbers didn't work out in my favor this time, so. So there you go. Now when we're all done with this, let's just drop that table and so we don't fill our server up, okay?