So we're back, and we're talking about more network analysis methodology. So who is the most influential person in this network? It's Jill, right? Why is Jill the most influential person in this network? Think about it. How would you quantify it? How would you write that definition? From an influencer or marketing perspective, if I could give one free iPad to one person in this network, it would be Jill, right? The hope is that Jill loves her iPad and goes nuts about it. She talks about it to all of her friends, and because she's connected to most people in this network, that word of mouth spreads through the network and gets a bunch of people really excited about iPads. Jill is at the center of this network. She communicates and connects with the most people, and through friends of friends, she connects with almost everyone in the entire network. She doesn't know Liz, but Emma knows Liz, and Jane knows Liz. Hopefully, she's going to reach Liz through Emma, and then maybe even Allen, or maybe, just maybe, even Lisa. Simple networks like this, with a few nodes and edges, are really easy to visualize. But when we're dealing with networks at scale, you're not going to be able to untangle these relationships in such straightforward ways. Who comes in second place here in terms of connections? Jill has six, of course, but Emma and Shane are tied for second with four each. Remember, there's no direction to this graph, so every relationship must be shared, like being Facebook friends. Network analyses like this are descriptive. We don't know why Jill is the most important person. We just know that she is. In network analysis, there are ways to address this. Suppose you know things about Jill, like how much money she makes or the professional organization where she meets everyone.
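One common way to turn "Jill is the most influential" into a number is degree centrality: count each person's direct connections. The Python sketch below assumes a small hypothetical edge list with made-up node names (A through I), not the network from the slide, and also counts how many people fall within two hops, a rough proxy for the friends-of-friends reach described above.

```python
from collections import defaultdict, deque

# Hypothetical undirected friendship network (NOT the network from the slide).
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F"), ("A", "G"),
    ("B", "C"), ("B", "H"), ("C", "D"), ("H", "I"),
]


def build_graph(edges):
    """Store an undirected graph as node -> set of neighbors."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Raw degree: how many distinct people each person is connected to."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def reach_within(graph, source, max_hops):
    """People reachable from source in at most max_hops steps (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance) - {source}


graph = build_graph(EDGES)
degrees = degree_centrality(graph)
most_central = max(degrees, key=degrees.get)
print(most_central, degrees[most_central])  # A 6
print(sorted(reach_within(graph, "A", 2)))  # ['B', 'C', 'D', 'E', 'F', 'G', 'H']
```

Here A plays Jill's role: six direct ties, and seven of the other eight people within two hops, while I sits three hops out. Degree is only one possible definition of influence; measures such as betweenness or eigenvector centrality formalize it differently.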
Then you can enter these attributes on the nodes in your network analysis, and you can actually begin to account for why these relationships form. Say you know how much Jill and everyone else make in salary. Is salary a significant predictor of whether two individuals are connected? Do salaries cause friend clusters? Probably not, but when visualizing a network, you can use attributes to decide how nodes are clustered or arranged on your graph. Here are some examples of what connections can look like for people in social network analysis. The concept of Twitter mentions is straightforward. An interaction can be considered a connection, and it can even be tallied. If I mention you in a tweet five times, then the value of that connection is five. If I can count the number of phone calls or emails or any other kind of communication that two individuals have, it can be turned into a valued network. It can be prepared, analyzed, and visualized. Most social interactions also have a direction, right? If I mention somebody in a tweet, the relationship runs in one direction. It's a directed relationship. So in this case, if the user IHeartRadio tweets @tacobell, IHeartRadio is the one initiating the connection and tacobell is the one receiving it, and that direction can be captured. tacobell is the sought-after node here. They are the ones being talked about the most. It's interesting to see that there are clusters, or groups of people, that tend to talk about Taco Bell together. They could be groups of friends who just happen to be talking about whether they like Taco Bell or not, or they could be superfans. That's where the qualitative side of network analysis is so important. It's one thing to create this graph, but what use is the graph on its own? Here, we might be able to use it to find pockets of people who talk about Taco Bell in their day-to-day lives, and that could be useful.
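A valued, directed network can be stored as a tally of (sender, receiver) pairs. The Python sketch below assumes a hypothetical mention log with made-up account names; the count of each ordered pair is the tie's value, and summing incoming values (weighted in-degree) identifies the most talked-about node, the role tacobell plays in the example.

```python
from collections import Counter

# Hypothetical mention log: one (sender, receiver) pair per mention.
MENTIONS = [("alice", "brand")] * 5 + [
    ("bob", "brand"),
    ("bob", "alice"),
    ("carol", "brand"),
]


def valued_directed_edges(mentions):
    """Tally each ordered pair; the count is the value of that directed tie."""
    return Counter(mentions)


def weighted_in_degree(edges):
    """Sum incoming tie values: how often each account is mentioned."""
    totals = Counter()
    for (_sender, receiver), weight in edges.items():
        totals[receiver] += weight
    return totals


edges = valued_directed_edges(MENTIONS)
print(edges[("alice", "brand")])  # 5
print(edges[("brand", "alice")])  # 0, because direction matters
print(weighted_in_degree(edges).most_common(1))  # [('brand', 7)]
```

Because the key is an ordered pair, ("alice", "brand") and ("brand", "alice") are different ties, which is exactly what makes the network directed.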
But if we just have a pretty network, it's just a cool JPEG on the screen. So really, the visualization of the network is the beginning, not the end. Inspecting why these clusters exist and what they represent is actually the end goal of most network analyses. Here's a lovely network analysis that takes attributes of companies and uses them to create a picture of who partners the most with whom, and who is the most central partner in the tech ecosystem in the United States. The nodes here are Fortune 500 tech companies. The edges represent the partnerships those companies have with each other. This graph illustrates who partners with whom. The more partnerships two companies have with each other, the closer they are; the further apart they are, the fewer partnerships they have. IBM, HP, and Microsoft all partner together a lot, and they're also the most partnered with, so they get put in the center of this graph. Tableau is an outsider; they're not partnering as much, so they get pushed toward the edge of the graph. This gives you a feel for what companies do when they work together: not just that they partner, but what type of partnership they have. The color of each edge represents the type of partnership, and the nodes are colored by attribute too, so each company is also described by the type of service it provides. So not only do we understand which types of partnerships are most common in this network, we also learn what the actors, or nodes, in the network actually do from the color of each company. This is a lovely network analysis that uses attributes to explain the connections in a dataset beyond just edges and nodes. So this is text marketing, right? What the heck are we doing talking about people?
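Attributes can live alongside the edge list. In this Python sketch, the company names, service categories, and partnership types are all hypothetical; the node attribute "service" and the edge attribute "type" are what a drawing tool would map to node and edge colors, and the partner counts are what a force-directed layout would tend to translate into central versus peripheral positions.

```python
from collections import Counter

# Hypothetical node attributes: the type of service each company provides.
COMPANIES = {
    "CompanyA": {"service": "cloud"},
    "CompanyB": {"service": "hardware"},
    "CompanyC": {"service": "software"},
    "CompanyD": {"service": "analytics"},
}

# Hypothetical undirected partnerships, each carrying an edge attribute.
PARTNERSHIPS = [
    ("CompanyA", "CompanyB", {"type": "technology"}),
    ("CompanyA", "CompanyC", {"type": "technology"}),
    ("CompanyB", "CompanyC", {"type": "technology"}),
    ("CompanyA", "CompanyD", {"type": "reseller"}),
]


def partner_counts(edges):
    """Number of partnerships per company (undirected degree)."""
    counts = Counter()
    for u, v, _attrs in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def partnership_types(edges):
    """How common each type of partnership is, read from the edge attribute."""
    return Counter(attrs["type"] for _u, _v, attrs in edges)


counts = partner_counts(PARTNERSHIPS)
hub, n_partners = counts.most_common(1)[0]
print(hub, n_partners, COMPANIES[hub]["service"])  # CompanyA 3 cloud
print(partnership_types(PARTNERSHIPS).most_common(1))  # [('technology', 3)]
```

CompanyD, with a single partnership, would be the Tableau-like node pushed to the edge.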
Well, I thought that if I was going to teach you about network analysis, I would be remiss not to talk about its bread and butter: the connections that people, things, or places share with each other. However, semantic network analysis is a powerful tool for understanding the meaning of words. It's a tool we can use to make sense of the connections that words have with each other. Semantic networks are about collecting and emphasizing the relationships between words. When you read a sentence, you don't process it word by word; you take in the whole thing, and then your brain processes it a chunk at a time. This is why we can easily grasp the meaning of a sentence like "the dog bit the man." We know without effort that the man is the object of "bit," not its subject. A semantic network takes this idea to a higher level by showing us how words relate to each other within larger chunks of text. Semantic network analysis looks at the co-occurrences of words in a unit of analysis, such as a sentence or a document. Each word is represented as a node, like a term in a document-term matrix. If we build a semantic network at the sentence level, we're essentially creating a connection between two words any time they're mentioned together in the same sentence. Inside a given unit of analysis, here the sentence, we're trying to figure out which words appear together the most. Remember, when we did topic modeling, we were able to take a sentence and filter it down to key features via tokenization, lemmatization, and so on. We do the same preprocessing here and extract the key features. Take this review: "I like the hamburger. The fries were really good too, very expensive, though." After preprocessing, it comes down to hamburger, fries, expensive. The next document: "The fries are the way to go, but they're really good with the pale ale."
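The preprocessing and tallying described here can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: the keyword vocabulary `VOCAB` is hypothetical and stands in for real tokenization, lemmatization, and feature selection (it simply maps "hamburgers" to "hamburger" and keeps "pale ale" as one phrase), and each review is one unit of analysis. Every pair of terms in the same review gains one unit of tie strength. The list also includes the third review from the lecture, "The hamburgers are better than the fries.", so the hamburger-fries tie reaches 2.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword vocabulary standing in for real preprocessing:
# maps surface forms (one word or a two-word phrase) to a canonical term.
VOCAB = {
    "hamburger": "hamburger",
    "hamburgers": "hamburger",
    "fries": "fries",
    "expensive": "expensive",
    "pale ale": "pale ale",
}


def extract_terms(text):
    """Lowercase, tokenize, and keep only vocabulary terms (phrases checked first)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = set()
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in VOCAB:
            terms.add(VOCAB[pair])
            i += 2
        else:
            if tokens[i] in VOCAB:
                terms.add(VOCAB[tokens[i]])
            i += 1
    return terms


def cooccurrence(documents):
    """Undirected, valued ties: +1 for each pair of terms in the same document."""
    edges = Counter()
    for doc in documents:
        for a, b in combinations(sorted(extract_terms(doc)), 2):
            edges[(a, b)] += 1
    return edges


reviews = [
    "I like the hamburger. The fries were really good too, very expensive, though.",
    "The fries are the way to go, but they're really good with the pale ale.",
    "The hamburgers are better than the fries.",
]
print(sorted(extract_terms(reviews[0])))  # ['expensive', 'fries', 'hamburger']
edges = cooccurrence(reviews)
print(edges[("fries", "hamburger")])  # 2
```

Sorting each term set before pairing means "fries, hamburger" and "hamburger, fries" are counted as the same undirected tie.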
(Trust me, it's a Friday afternoon, and fries with a pale ale sound great right now.) We end up preprocessing that review down to fries and pale ale. If we want to visualize these as a semantic network, we have one document that mentions three things, so all three of those terms are connected to each other with a tie strength of one. Then we have another document that mentions one of those terms, fries, and a new one, pale ale, so we make that connection too. As we add up all the connections, we begin to build a semantic network of the survey responses for this hamburger joint. These relationships aren't exposed until we intentionally record them. They're in the sentences and in the documents; we have to pull them out so that we can visualize them. Let's continue adding reviews. Another review comes in: "The hamburgers are better than the fries." The selected terms here are hamburgers and fries. So we add the connection between the two, and now the hamburger-fries tie has a weight, or value, of 2. We can keep doing this indefinitely. Eventually, the network will contain the keywords and the key relationships that capture how people describe us as a company. I've published a few articles using network analysis, and they are in your supplementary resource folder. You can check them out to see what I did with those analyses. Here's an example from 2012. I looked at mentions of political candidates and the associations those candidates had with political issues. My idea was that certain politicians would be attached to certain issues more than others, and those relationships would uncover a broader semantic structure in how people relate to politicians. So here's the first post. This first tweet was about President Obama, and foreign policy was the keyword I wanted to capture because it connected him to a specific type of policy.
The next post is also about Obama, but this connection is to healthcare. So we build a semantic network, and these two connections get added to it. If we do this for millions of tweets, we can begin to understand how people relate a political candidate to issues and how those issues tend to cluster together. So what ends up being most central? President Obama was most associated with foreign policy, or foreign affairs. This graph also shows how issues were mentioned together with each other. When talking about Obama and healthcare, people also tended to mention the budget, so for Obama, healthcare and the budget were two issues that were semantically linked. Similarly, when people talked about Obama, mentions of taxes co-occurred with mentions of education and immigration. So the tax issue is what we would call a bridge: it links issues that would otherwise be less connected in how people think about Obama. When people think of Obama and his policies and initiatives, they tend to associate taxes with them. That makes sense if you remember the 2012 election: President Obama was regarded by his critics as having very socialistic policies, and one of the common criticisms was that these policies would cost more. I'm not making a value statement here, just describing what happened in the data. So when people thought of President Obama and political issues, taxes bridged the way they thought about a lot of his policies. And when people mentioned Obama, they most often mentioned healthcare, which is a really good descriptive statistic in and of itself, right? In 2012, Obama's number one campaign issue was healthcare. As a politician, it's really reassuring to know that this tie is the strongest one. It says, hey, all of the campaigning we did to say that healthcare is important has gotten through to how people talk about Obama.
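The same tallying works for the tweet example. In this Python sketch the tweets are hypothetical keyword sets, not the 2012 data: the strongest tie to "obama" comes from summing the co-occurrence weights on his edges, and listing the issues that "taxes" connects to (leaving Obama out) shows the kind of bridging pattern described above. A formal bridge measure would typically use something like betweenness centrality; this is only a descriptive look at neighbors.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tweets, already reduced to candidate and issue keywords.
TWEETS = [
    {"obama", "healthcare", "budget"},
    {"obama", "healthcare"},
    {"obama", "healthcare"},
    {"obama", "foreign policy"},
    {"obama", "taxes", "education"},
    {"obama", "taxes", "immigration"},
]


def cooccurrence(docs):
    """Undirected, valued ties: +1 for each pair of terms sharing a document."""
    edges = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


def ties_of(edges, node):
    """Weighted neighbors of one node in the undirected network."""
    ties = Counter()
    for (a, b), weight in edges.items():
        if a == node:
            ties[b] += weight
        elif b == node:
            ties[a] += weight
    return ties


edges = cooccurrence(TWEETS)
print(ties_of(edges, "obama").most_common(1))  # [('healthcare', 3)]
print(sorted(t for t in ties_of(edges, "taxes") if t != "obama"))  # ['education', 'immigration']
```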
This next data comes from a series of interviews asking people, "What do you look for in a Tinder profile?" This is a really cool paper, and you can search for "Tinder network analysis" to find it. It's not my work. The semantic network we're looking at shows how words were used together in the same interview. Responses usually mentioned the word attractive, but also tall and funny. Even though they're very different things, there's a strong connection between attractive and funny. There are some other notable results. The Tinder picture is really important, yet when people mention attractive, they don't often mention picture, which is interesting. By manually reviewing a network's ties, we become more familiar with the semantic relationships words have with each other. Something like this is far better to visualize than a word cloud, because we capture the semantic relationships words have with each other and also the number of times each term was used in the interviews. The bigger the node, the more times the word was used. So instead of just making words big on a page when they're mentioned a lot, or small when they're not, we can do that with the node and still keep the relationships between those words in the same picture. I prefer this to word clouds, 10 to 1. The darker the line, the more times the two words were mentioned in the same interview. This is a valued network where co-mentions of two words within an interview were simply tallied up. Many interviews mentioned the word attractive, so the node for attractive is big. Lots of interviews mentioned the word picture. And, interestingly enough, a lot of interviews mentioned the words nice and small together. This is an undirected, valued network.
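The difference from a word cloud is easy to see in code: node size and edge weight are two separate tallies over the same documents. This Python sketch uses hypothetical interview data (not the paper's) and treats each interview as a set of terms, so node size counts interviews mentioning a term and edge weight counts interviews mentioning both terms. `edge_darkness` is an illustrative helper, assumed here, for mapping weights to line shading.

```python
from collections import Counter
from itertools import combinations

# Hypothetical interview responses, already reduced to sets of key terms.
INTERVIEWS = [
    {"attractive", "tall", "funny"},
    {"attractive", "funny"},
    {"attractive"},
    {"picture", "tall"},
]


def node_sizes(docs):
    """Number of interviews mentioning each term (all a word cloud would show)."""
    return Counter(term for terms in docs for term in terms)


def edge_weights(docs):
    """Number of interviews in which each pair of terms appears together."""
    edges = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


def edge_darkness(edges):
    """Scale weights into (0, 1] so the strongest tie can be drawn darkest."""
    strongest = max(edges.values())
    return {pair: weight / strongest for pair, weight in edges.items()}


sizes = node_sizes(INTERVIEWS)
edges = edge_weights(INTERVIEWS)
print(sizes.most_common(1))  # [('attractive', 3)]
print(edge_darkness(edges)[("attractive", "funny")])  # 1.0
print(("attractive", "picture") in edges)  # False
```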
There's no clear direction to the co-occurrence of words. In semantic network analysis, you could argue that if one word comes before another in a sentence or document, it is sending a tie to the other, but most semantic network analyses are left undirected. Here's another network analysis I performed for an academic journal article. I ingested 3,600 journal article PDFs and extracted 101,000 citations from those papers, and I wanted to see the relationships that citations tended to have with each other. The idea was that if we ingested enough research, we could begin to understand, one, which papers were most influential in the field, and two, how those influential papers tended to cluster together. A network analysis of citation data lets us see which articles cluster together. I manually read the top articles and labeled each one by type, and eventually I came up with four or five paper types based on each paper's abstract and title. Articles in blue are network analysis papers; how meta is it to do a network analysis of network analysis papers and then talk about it in a network analysis course? The ones in purple are statistics papers, like the ones we read in topic modeling when we looked at model fit statistics. There are others, like theoretical papers, and so on. The bigger the node, the more times that paper was cited. The closer a node is to the center, the more that paper tended to be cited together with other papers. We uncover who's citing whom, and what, through the natural clusters that network analysis reveals when we visualize it.
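Citations themselves are directed (a citing paper points to a cited one), but "cited together with other papers" suggests a co-citation view: two papers are tied whenever the same paper cites both. This Python sketch assumes that co-citation definition for illustration, uses hypothetical paper IDs, and uses connected components via union-find as a simple stand-in for the richer community detection and layout a real study would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: each inner list is what one citing paper cites.
REFERENCE_LISTS = [
    ["P1", "P2"],
    ["P1", "P2", "P3"],
    ["P4", "P5"],
    ["P4", "P5"],
]


def citation_counts(reference_lists):
    """Times each paper is cited (node size)."""
    return Counter(ref for refs in reference_lists for ref in set(refs))


def cocitation(reference_lists):
    """Undirected ties: +1 each time two papers are cited by the same paper."""
    edges = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            edges[(a, b)] += 1
    return edges


def clusters(edges, min_weight=1):
    """Connected components over ties of at least min_weight (union-find).

    Papers with no qualifying tie are left out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


edges = cocitation(REFERENCE_LISTS)
print(citation_counts(REFERENCE_LISTS)["P1"])  # 2
print(edges[("P1", "P2")])  # 2
print(sorted(sorted(group) for group in clusters(edges)))  # [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

Raising `min_weight` keeps only the strongest co-citation ties, which splits loosely attached papers (here P3) out of a cluster.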