Next, we will discuss at a high level some of the typical mechanisms for implementing privatization and security controls. A deeper dive into the data privatization mechanisms will be covered in a subsequent section of this training series. As mentioned earlier, there is a separate DataFirst method workshop series focused on data security that covers that topic in more detail. For privatization, data privacy mechanisms are the capabilities that can be employed to change or perturb personal data so that a degree of privacy is achieved and the data is no longer personally identifiable. These mechanisms also include some tangentially related capabilities, such as data fabrication: when the risks of personal data leakage or re-identification are judged to be so high and so damaging to the data subjects, creating alternative, non-real data, for use in a testing environment for example, may be decided to be the best policy. That said, this can be easier said than done, and it depends heavily on the complexity of the data, the data relationships, and the use case requirements for the content. Different organizations may use these terms differently or interpret them in slightly different ways. In this training, we will stick to the following definitions and, as noted, cover them in more detail in a subsequent section. Data masking is often a general term for de-identifying data, but in my typical explanation, masking refers to common, straightforward perturbation techniques applied to either direct identifiers or indirect identifiers, or both.
These techniques are typically less sophisticated: anything from a simple redaction or nullifying of data to more sophisticated approaches such as format-preserving masking, which replaces numbers with numbers and letters with letters, or even semantic-preserving masking, which attempts to create values that are not the original values but retain some semantic meaning in the value. A national identifier would be an example. The difference between masking and the other mechanisms I'll discuss next is that masking often does not implement mechanisms that are as secure or as sophisticated, whether in the degree of encryption, the algorithms used, or the reversibility approaches available. This is not a cut-and-dried topic. Again, many organizations say "masking" when they are referring to tokenization or anonymization. For the purposes of this class, masking will mean all-purpose, straightforward data perturbation approaches. Tokenization, by contrast, sometimes referred to as pseudonymization, is the process of replacing an original value with a new value, and more often than not this is done in a repeatable fashion to preserve the "entity-ness," if you will, of the value. As an example, you might have an account number or a national identifier that has meaning in and of itself: when you look at it, you can tell that it is Bill's or Mary's account. That intrinsic meaning makes the data very vulnerable.
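As a minimal sketch of the distinction between redaction and format-preserving masking described above, the following Python functions are illustrative only (the function names and the sample identifier format are my own, not part of any particular product's masking implementation):

```python
import random
import string

def redact(value: str) -> str:
    """Simple redaction: replace every character with a fixed symbol."""
    return "*" * len(value)

def format_preserving_mask(value: str) -> str:
    """Replace digits with random digits and letters with random letters,
    keeping separators (dashes, spaces) so the overall format survives."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        elif ch.isupper():
            out.append(random.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(random.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # preserve punctuation such as the dashes in an ID
    return "".join(out)

masked = format_preserving_mask("123-45-6789")  # still looks like ###-##-####
```

Note that neither approach is repeatable: masking the same value twice produces different outputs, which is exactly why tokenization, discussed next, is preferred when entity linkage must survive de-identification.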
However, one of the common characteristics of these direct identifiers is that they represent some uniqueness, a preservation of an identity, that we would like to carry through into our other environments. Even though the data may no longer be directly re-identifiable, we would still like to know that it comes from a particular entity, because we may have multiple pieces of data from that entity and we would like them to align. If, for example, I had a number of transactions for an account and I changed the account number on each transaction to a different pseudonym, a different token, there would be no way for me to tell that they belong to the same entity. One of the challenges with tokenization is understanding that when I tokenize a value, even though I may remove its direct identifiability, I am still preserving some of its unique entity-ness. This, in combination with other data, often referred to as indirect identifiers, may allow re-identification of the original entity. We'll get more into this in the deeper dive on the data privacy mechanisms. For now, tokenization is the creation of a pseudonym, a token value, that replaces a direct identifier. Anonymization typically refers in the industry to more advanced approaches to de-identifying data, more often than not applied to what we call indirect identifiers. Indirect identifiers are things like gender, location, or disease: attributes that may be shared by a variety of individuals. Looking at one element individually, I couldn't tell whether it belongs to me or to my neighbor. But in combination with other indirect identifiers, and even a tokenized identifier, I might be able to piece the puzzle back together and say: yes, that person lives in this location and buys these things all the time.
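The repeatable pseudonym described above can be sketched with a keyed hash: the same input always yields the same token, so transactions for one account remain linkable, while the token itself reveals nothing about the original value. This is one common construction, not the only one; the key name and truncation length here are illustrative assumptions:

```python
import hashlib
import hmac

# Illustrative only: in practice the key would live in a secrets vault,
# and a keyed construction (not a bare hash) prevents dictionary attacks.
SECRET_KEY = b"demo-key-not-for-production"

def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic pseudonym: the same input always maps to the same
    token, so multiple records for one entity stay aligned after
    de-identification. Truncated to 16 hex chars for readability."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

t1 = tokenize("ACCT-1001")
t2 = tokenize("ACCT-1001")  # same account, same token: linkage preserved
t3 = tokenize("ACCT-2002")  # different account, different token
```

An HMAC-based token is one-way; tokenization products that must support reversibility typically use a token vault or format-preserving encryption instead, with access to the mapping tightly controlled.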
So even though I can't see their national identifier anymore, I know that's Bill, because only two people live in that location. Anonymization techniques therefore try to change or alter the indirect identifiers so that a record is no longer an outlier in the population of data. Techniques such as k-anonymity and differential privacy are used either to add noise to the data or to alter it so that there are always n records in the data sharing the same characteristics, until the data reaches a threshold, an n level, agreed to be safe. None of these techniques is a perfect means of privatization on its own, but used in combination they can achieve a privatization approach. We've talked a little about data fabrication; data minimization, the last item in that list, is another technique. I include it among the privacy mechanisms not because it perturbs data, but because it is a holistic technique for reducing the footprint of private data, which effectively reduces the privacy exposure. An example would be having a strategy, and a policy that promotes and prescribes it, saying: I need to look at all the locations where my personal data exists and make a conscious decision to minimize where it is stored, so that I only need to apply the right security and privatization controls to the data zones and use cases for that central set of data, and any other area that needs information about the person references that central location.
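The "n similar records" threshold described above is exactly what a k-anonymity check measures: the size of the smallest group of records sharing the same indirect-identifier values. A minimal sketch, with hypothetical column names and already-generalized sample values:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the given
    indirect-identifier (quasi-identifier) columns. A dataset is
    k-anonymous if this value is at least the agreed threshold k."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

# Generalized records: exact ZIP codes and ages have been coarsened
# so that no combination of indirect identifiers is an outlier.
records = [
    {"zip": "021**", "age": "30-39", "disease": "flu"},
    {"zip": "021**", "age": "30-39", "disease": "cold"},
    {"zip": "021**", "age": "30-39", "disease": "flu"},
]
k = k_anonymity(records, ["zip", "age"])  # 3: every row shares its group
```

If the computed k falls below the agreed threshold, the usual remedies are further generalization (coarser ZIP or age bands) or suppression of the outlier records.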
This is not always an easy thing to change in large, sophisticated enterprises that have been maturing and growing their data over many years, but data minimization is a technique that is gaining strength, in large part because many data privacy regulations emphasize it as one of the things they will look for in assessing compliance. There is no right or wrong way to describe these privatization techniques. What matters more is that, in your organization, the technical and business teams responsible for implementing data privacy align on a consistent understanding and vernacular, and that the correct mechanisms, or combinations of mechanisms, are employed for the zones and use cases of the data you use. Some final thoughts before we leave the data privacy mechanisms. Another set of mechanisms, or perhaps more appropriately capabilities, that are privacy-related is being driven by the increasing regulatory scrutiny around personal data privacy. These have much more to do with consent, the purpose and use of the data, and tracking, reporting, and responding to requests about how that data is being used. Though these are not mechanisms for manipulating or perturbing the data, they are capabilities organizations need to build so they can respond to requests such as: how are you using my data; prove to me that you are only using my data for the purposes I consented to; show me that you are not selling my data; and, if I ask you to remove my data, provide me with evidence that you have done so.
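One building block behind the consent-and-purpose capabilities described above is a record of which purposes each data subject has agreed to, checked before any use of their data. The shape below is a simplified assumption of mine, not a standard schema; real consent platforms also track timestamps, versions of the consent text, and withdrawal events:

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    """Hypothetical minimal consent record for one data subject."""
    subject_id: str
    purposes: set = field(default_factory=set)  # purposes agreed to

def is_use_permitted(consent: ConsentRecord, purpose: str) -> bool:
    """Gate a proposed use of personal data on recorded consent."""
    return purpose in consent.purposes

consent = ConsentRecord("subject-42", {"billing", "support"})
billing_ok = is_use_permitted(consent, "billing")      # consented purpose
marketing_ok = is_use_permitted(consent, "marketing")  # never consented
```

Checking every access through a gate like this is also what makes the reporting side possible: logging each check produces the evidence needed to answer "prove you only used my data for the purposes I agreed to."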
So while I would not refer to these as traditional privatization mechanisms, they are additional capabilities you need to be thinking about as you design your data privatization strategy.