Good morning, Bonnie. Thank you for joining us for this talk about your role in giving data to researchers. So, why don't you introduce yourself and tell us what you do? Sure. I'm Bonnie Woods. I'm Senior Manager for the Core for Clinical Research Data Acquisition, better known as the CCDA. So, what is your responsibility at CCDA? I manage a team of software engineers who are trained to access clinical data from our electronic medical record systems for IRB-approved research. So, that's a lot. Maybe let's unpack a few pieces of that. First of all, from your perspective, what is research? Research, from our perspective, is answering a question about the patients who exist in the electronic medical record systems. So, we often perform retrospective look-backs on patients to determine whether, say, a clinical variable like a medication can cause a condition, or we do prospective studies where patients are recruited for either observational studies or clinical trials. We know there are something like ten analytic teams at Hopkins doing all sorts of work on the electronic records. So, there's this research, where you're trying to answer questions, versus quality, where you're trying to improve care. What are the other teams? There are the finance systems teams, who mostly deal with claims data to answer questions, either quality or, in some cases, research. There's the population health group; they are answering larger questions about populations outside of Hopkins. There's the hospital operations team, who are more concerned with operational day-to-day work. So, each one of these teams has a purpose. Well, in my mind, they all have a purpose. But they all have the common task of getting to the data and getting the data to answer one of those questions. Right. So, we talked about the research part. You mentioned something about the IRB. So, what's the IRB? The Institutional Review Board is the ethics committee that reviews protocols to ensure that they are compliant for human subjects research. So, in essence we've just covered the high-level parts of the stack: you deliver data, you deliver research to answer questions, and you deliver security and compliance with the IRB. Maybe we're ready for a story now about how this actually works. So, give me a story of a request that someone has come to you with. Sure. Recently we received a request to identify patients having a chordoma, which is a cancerous tumor that metastasizes quickly and usually manifests itself in the spinal column. It starts in the spinal cord and spreads from there? Yes. So, the main purpose of the study was to identify patients having this condition so that we could look back over time to see what their outcomes were. My focus as the CCDA analyst is to try to identify the patients having the tumor, and you can either do that through an ICD-coded diagnosis or another term. But that would be the normal place to start. So, before we get into the details, do you remember how they verbalized the question to you in the first place? Do you remember what they asked you? Yes. What did they ask? They said, "I need to find all patients having a chordoma from 2003 on at Johns Hopkins Hospital." So, what goes through your mind when you hear that type of request? My undergrad is in journalism, so I'm used to asking questions like who, what, when, where, why, how. Is there a how? Well, there is, but it's more related to the why in this case. I see.
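To make the "normal place to start" concrete: below is a minimal sketch, in Python with pandas, of what an ICD-code-based candidate cohort pull might look like. The table name, column names, and the placeholder code set are assumptions for illustration only; the real code list would come from clinical review with the research team.

```python
import pandas as pd

# Placeholder ICD-10 codes standing in for the chordoma code list; the real
# list is agreed with the research team and a clinician, not taken from here.
CHORDOMA_CODES = {"C41.0", "C41.2", "C41.4"}  # hypothetical examples only

def candidate_cohort(diagnoses: pd.DataFrame) -> pd.DataFrame:
    """One row per patient whose coded diagnoses contain a target code.

    Expects columns: patient_id, icd10_code, diagnosis_date (datetime64).
    """
    hits = diagnoses[diagnoses["icd10_code"].isin(CHORDOMA_CODES)]
    # Keep the earliest coded date per patient as the index diagnosis.
    return (
        hits.sort_values("diagnosis_date")
            .drop_duplicates(subset="patient_id", keep="first")
            .loc[:, ["patient_id", "icd10_code", "diagnosis_date"]]
    )
```

A pull like this only produces candidates; everything that follows in the conversation, the who, where, when, and why, is what narrows or broadens it.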
So, what goes through my head is really, who are the patients in this case? Are they adults, meaning 18 or over, or are they children, which in IRB terms is a vulnerable population? So, you do have to ensure that if patients under the age of 18 are included, the IRB approved it. So, it's interesting, you did not say that there is a definition of child. You've linked the definition of child to your mission, which is to be compliant with the IRB. Yes. I'll call that out, because not everybody would jump on that. That's true. I think that takes researchers aback sometimes. Right. But they are also used to submitting their IRB protocols. Maybe the quality teams wouldn't ask these questions as much. Okay. I see in that a link between what you want to accomplish and how you accomplish it, and that link comes up with every request. Sure. So, next, what is their condition? Or what makes them qualified for the study? So, in this case they wanted to know who had a chordoma. But I would question, how do you know they had a chordoma? So, we can get into the ICD-coded diagnoses or their procedures at that point. So, we can talk more about that. Then the where. Where were they seen? That does matter for IRB compliance purposes too. It also matters because of the history of the Johns Hopkins electronic medical record systems. How does it matter from an IRB perspective? The study location is one of the questions asked and approved. So, if I am the investigator and they ask me, "Are you going to get patients from outside Johns Hopkins Hospital?" and I say, "No," then if I come back later wanting patients from other hospitals in the Johns Hopkins Health System, you have to say, "I'm sorry, the IRB didn't approve that." Right. Either that's going to give you a narrower dataset, or you have to go back and get your IRB protocol amended. Hopkins, because it has acquired community hospitals over time, as well as the internal care clinics, has ancillary review committees for each one of those locations. That's why it's important. Sure. But it also helps us, because from the electronic medical record system perspective, there are different sources of data depending on the where. Because there is no single EHR; we have had several over time, especially since 2003. Right. So, this is the archaeology part of your work. It is. Where the history is everything. The data detective part. The data detective. Right. So, also the when is important, for the same reasons we just talked about: the IRB as well as the data source. Then there's the why, which is the most fascinating question. Why do you include patients having a chordoma, for example? I think you are saying the why is why they are doing the research, and that leads to this notion we haven't gotten to yet, of whether you're going to be very inclusive or very restrictive. Right. In one of these conversations with a faculty member, how do you communicate the concept of a broad versus narrow search? How is it pitched to researchers? We first ask them to discuss their protocol in detail in layperson's terms. So, tell me about your research project. What are you hoping to accomplish? What type of data do you think you need for the study? Are you a clinician? That also helps, because if they are, they usually are going to use Epic or another system. They are going to understand how that data is captured, and maybe they entered that data. Then I like to ask them, what will you do with the information we provide you?
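The who, where, and when constraints described above translate fairly directly into filters on the candidate cohort. The sketch below continues the hypothetical schema from the previous sketch and adds an assumed patients table with a birth date and facility column; the specific rules (adults only, Johns Hopkins Hospital only, 2003 onward) are this interview's example, not a general recipe.

```python
import pandas as pd

def apply_irb_constraints(
    cohort: pd.DataFrame,       # patient_id, icd10_code, diagnosis_date
    patients: pd.DataFrame,     # patient_id, birth_date, facility
) -> pd.DataFrame:
    """Keep only patients the (hypothetical) IRB approval actually covers:
    adults at diagnosis, seen at JHH, diagnosed in 2003 or later."""
    df = cohort.merge(patients, on="patient_id", how="inner")

    # Approximate age in years at the index diagnosis.
    age_years = (df["diagnosis_date"] - df["birth_date"]).dt.days / 365.25

    mask = (
        (age_years >= 18)                                        # who: adults only
        & (df["facility"] == "JHH")                              # where: the approved site
        & (df["diagnosis_date"] >= pd.Timestamp("2003-01-01"))   # when: study window
    )
    return df.loc[mask]
```

If the protocol were later amended to cover children or other health-system sites, only these filters would change; whether they may change is exactly the honest-broker review discussed later in the conversation.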
So, if it's retrospective, will they look back over time and try to do an analysis of outcomes? If it's prospective, they most likely would like to recruit patients. Right. So, with retrospective we can be more sensitive, because... You mean sensitive in the technical sense of sensitivity and specificity. You mean broader, which we'll get to. I just want to highlight that you're doing a classic informatics analysis. You've asked them, what do you want to accomplish? Which is what I'm always trying to get everybody to talk about. That's perfect; we'll come back to that. So, sensitive means a broad search and specific means a narrow search? Right. So, do you use those words? What words do you use with them? I usually don't. I usually use broad versus narrow. So, do you want to conduct a chart review, perhaps, to weed out patients yourself? Is that broad or narrow? That is a broad search. So, if we give you a list of patients having a chordoma, well, maybe you don't want a certain type of patient who was diagnosed with a chordoma. Maybe you're not looking for patients who also have a comorbidity. Rather than us working really hard to identify every possible comorbidity that patient can have, we can give you the list of chordoma patients and you can weed them out yourself. That's a broad search. However, if you're doing a prospective study, especially for recruitment, you really have to be careful about recruiting patients who don't meet your criteria. You will end up angering the patients. You will end up not having a good sample size or a good population to work with. We really care about patients, and a patient does not want to get a phone call saying, "You have some horrible disease and we'd like to include you in our study," when they have no idea what you're talking about. Exactly. So, that's a [inaudible]. Other pieces to the query would then be, well, if you're going by diagnosis code, do you really want only one instance of that diagnosis recorded on their diagnosis list, or do you want to also combine it with maybe a lab result? There we're getting at [inaudible], how do you know they have a chordoma? Exactly. So, the notion of broad and narrow is, first of all, what disease do they have, and where are they in the disease process? Are they at risk for the disease? Are they early in the disease? Are there complications of the disease? Latent disease? These are all things to think about. In other words, you have this notion of where in the stage of the disease you want to catch the patients. We do. Unfortunately, that's not well captured in our medical record systems unless it's trapped in a clinical note or surgical note or pathology report. Then we can assess it with natural language processing. It's just a little more difficult. Won't AI solve this? No, it's not going to solve the problem, but it will certainly make it easier to identify the patients of interest. Everybody you hear these days says AI is going to solve everything, so I always have to ask. No. But anyway. All right. So again, to the point: you have the who, what, where, why, when. Did we leave anything else out? No. I mean, why is really the most important question of all, because sometimes we talk to researchers and we give them data, and afterward it doesn't answer their question. So, it's up to us, during our triage process, to understand their question. I like to do an iterative discussion. I'm assuming that sometimes you get an understanding, go back to the programmers, and then you realize, wait a minute, you still haven't formulated it specifically enough. Right.
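The broad-versus-narrow distinction maps onto how strict the query's evidence requirement is. Here is a minimal sketch under the same hypothetical schema, where "narrow" is defined, purely for illustration, as at least two coded instances or a confirming pathology flag; that threshold is an assumption, not the team's actual rule.

```python
import pandas as pd

def broad_cohort(dx_hits: pd.DataFrame) -> pd.Series:
    """Broad (sensitive): any patient with at least one qualifying code."""
    return dx_hits["patient_id"].drop_duplicates()

def narrow_cohort(dx_hits: pd.DataFrame, pathology: pd.DataFrame) -> pd.Series:
    """Narrow (specific): require corroborating evidence before, say,
    a patient is contacted for recruitment.

    dx_hits:   patient_id, icd10_code, diagnosis_date
    pathology: patient_id, chordoma_confirmed (bool)  -- assumed table
    """
    code_counts = dx_hits.groupby("patient_id").size()
    repeat_coded = code_counts[code_counts >= 2].index

    confirmed = pathology.loc[pathology["chordoma_confirmed"], "patient_id"]

    keep = set(repeat_coded) | set(confirmed)
    return dx_hits.loc[dx_hits["patient_id"].isin(keep), "patient_id"].drop_duplicates()
```

The broad version suits a retrospective chart-review design where the researcher weeds out patients themselves; the narrow version suits recruitment, where a false positive means an upsetting phone call.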
Can you give me an example there? Well, usually they think of questions that I didn't think of during the intake process. Can you give me an example of the type of questions they might come up with? So, maybe for diagnosis, I'll say, "Do you want diagnoses from the problem list versus the encounters?" Then I'll get the answer and I'll give it to the programmer to start the work. Then they'll come up with a question like, well, what about medical history, or what about billing codes? Knowing our systems, we have ten different places where there can be a diagnosis. Right. I don't think we're consistent about exactly which fields mean what level of severity, or what level of certainty, so we have to figure it out for each diagnosis. So, you understood the problem, you've made your assessment of what it is, and you bring it back to the programmers. In this particular case with the chordoma, how much back-and-forth was there, if any? A lot, honestly, because sometimes it helps to have patient examples. So, give me an example of a patient who meets your criteria, and then we have access to the charts as well, since we're part of a data trust analytic team. You have this notion of an honest broker that you've mentioned to me before. Say something about that role of an honest broker. So, an honest broker is a third party, basically, who can provide data, but who also reviews the IRB protocol to ensure that what is requested is approved by the IRB. If there are any discrepancies, then that group can bring them to the attention of either the IRB or the research team. For example, maybe they wanted to have children included in their study, and the IRB only approved adults. So, I can provide data for adults, but I won't be able to provide data for children unless the research team submits a change in research to the IRB and justifies why they need children. Maybe it's because, during the analysis, they determined that many of these conditions manifest in children instead. It's a vulnerable population, so it needs to be expressly included in the IRB protocol. So, that's the broker part of it. What's the honest part of it? Well, the other piece of that is data security and protection of protected health information, or PHI. But I think the point is that the staff are trusted. Yes. Because they get to look at everything. Right. So, they have to be trusted that they're not going to abuse that trust and their ability to look at everything as they weed out what data is able to go outside the IRB firewall, if you will. Inside the firewall [inaudible]. The data trust analytic teams in general are considered trusted entities. The honest broker is a little higher level, because that person has access to the IRB protocols and direct access to the IRB review committees. Though not everyone on our team is an honest broker, we all have skills in data management, so we know, me in particular, I know where the safe destinations for data are. We don't want to just hand a file of PHI to a researcher. All right. So now, after you've been through all of this, you've finally figured out what the query is, you run the query, you get the data, and you've got to put it somewhere. Right. So, that's part of it. We talked about delivering data, and security, and IRB compliance. So, the security is also where you put it.
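The point that a diagnosis can live in ten different places is the kind of thing a small, explicit merge step makes visible. A sketch, with assumed table and column names, that unions the problem list, encounter diagnoses, and billing codes while recording where each hit came from:

```python
import pandas as pd

def combined_diagnoses(
    problem_list: pd.DataFrame,   # patient_id, icd10_code, noted_date
    encounter_dx: pd.DataFrame,   # patient_id, icd10_code, encounter_date
    billing_dx: pd.DataFrame,     # patient_id, icd10_code, service_date
) -> pd.DataFrame:
    """Stack the (hypothetical) diagnosis sources into one table with a
    'source' column, so the researcher can decide which sources count."""
    frames = [
        problem_list.rename(columns={"noted_date": "dx_date"}).assign(source="problem_list"),
        encounter_dx.rename(columns={"encounter_date": "dx_date"}).assign(source="encounter"),
        billing_dx.rename(columns={"service_date": "dx_date"}).assign(source="billing"),
    ]
    combined = pd.concat(frames, ignore_index=True)
    return combined[["patient_id", "icd10_code", "dx_date", "source"]]
```

Keeping the source column is what lets the "problem list versus encounters versus billing" question be answered from the data already pulled, rather than by re-running the query from scratch.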
So, I know we have an environment at Hopkins; how many years did that take? Six years of work putting it together, where you deliver the data so that anybody using it in that environment is following your policies, as opposed to the Wild West, where we hand over a flat file or a spreadsheet, the laptops get stolen, and everybody's upset. Right. Which is not a good thing. Do you remember how many hours this query took? Well, writing the query is pretty simple. Right. It takes two to three hours at most to write and test, doing chart review with use cases or test cases for patients. The back-and-forth analysis sometimes takes just as long, particularly in this case, because the researcher didn't really understand how to identify these patients. So, sometimes we have clinicians who are very aware of their population because they treat the patients. Others want to look at the generalized population, and they may not know its characteristics. So, it really depends. But in this case, it was a few hours of analysis. A simple question, then: I want all patients for a retrospective study from 2003 on. Right. I think that's pretty clear. I told you the disease, I told you when, I told you where. You're still going to spend six hours? Right. That's the interesting part of this job. So, you also have the job of educating the client. You say, "Guys, it's going to take a little bit longer than you thought." Right. So, that's great. So, anything else you want to add about this task, about this role of interceding between the data and the researcher? It's really helpful to demonstrate the feasibility of queries like this. So, if you can provide a query result that maybe doesn't have PHI at first, that's just a count or a breakdown of counts, that's very helpful. So, in this case, from 2003 to the present, what is the count per year, maybe even by specific diagnosis code, of the patients in your inclusion criteria? That is very helpful in determining feasibility. So, maybe you look at the timeline and see that from 2003 to 2013 there were very few patients, and there are more from 2013 on. Perhaps that should be your time frame instead. So, data-driven decisions. All right. It also reminds us that, just as we said patients go through the sequence of an illness, the research goes through a sequence of stages. First is feasibility: am I going to do this research at all? Then I'm going to apply for funding. Then I'm going to recruit the patients. Each one of those needs you in a little bit of a different way. The feasibility stage can be just counts; then the second is the broad search, because you're going to tell the funder we have tons of patients. But then, when you go to actually include patients, you want to be really careful. You could have a relationship with these clients where it's just "give me data" and "here you go," but somebody has to keep track of all of that sort of stuff. Right. So, I know you like the phrase data detective. So, you're a journalist, you're a detective, and then sometimes you're even a technical person. Sometimes, yes. I like to dig in. Roll up my sleeves and dig in sometimes. All right. Well, thank you very much. Thank you.
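As a closing illustration of the de-identified feasibility step described above, counts per year and per diagnosis code can be produced without releasing any PHI. A minimal sketch under the same assumed schema:

```python
import pandas as pd

def feasibility_counts(dx_hits: pd.DataFrame) -> pd.DataFrame:
    """Distinct patients per calendar year and ICD code -- aggregate numbers
    only, the sort of output that can inform a time-frame decision before
    any identified data is released.

    Expects columns: patient_id, icd10_code, dx_date (datetime64).
    """
    return (
        dx_hits.assign(year=dx_hits["dx_date"].dt.year)
               .groupby(["year", "icd10_code"])
               .agg(n_patients=("patient_id", "nunique"))
               .reset_index()
               .sort_values(["year", "icd10_code"])
    )
```

A real deliverable might also suppress very small cell counts before the table leaves the trusted environment; that detail is omitted from this sketch.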