Dreaming Big, Clinical Informatics

stonstad

Hi! I am a software developer working at a startup which seeks to merge clinical data from different nomenclatures (ICD9/Medcin/Medispan/SNOMED) and provide a layer of "analytics" on top of this combined data to facilitate discoveries. Think of the software as a "search engine" for questions about clinical data occurring on a large scale.

This software runs "in the cloud" and as a result it can crunch data for millions of deidentified patients and billions of deidentified clinical items, including medications, conditions, immunizations, and procedures. (Assume deidentified data and BD agreements in place to facilitate legal exchange of information.)

At the moment we are looking to use the software for things like "show me post-op infection rates for a surgeon compared to a national average". While this information is useful and will likely subsidize the continued development of the product, I am working to inspire my colleagues to see a bigger picture of what is possible.
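To make that kind of query concrete, here is a minimal sketch in Python. The record layout, the surgeon IDs, and the data are all made up for illustration; this is not the startup's actual schema or query engine, and the "national" figure is simply the average over the toy records.

```python
from collections import defaultdict

# Hypothetical deidentified procedure records: (surgeon_id, had_post_op_infection)
procedures = [
    ("surgeon_a", True),
    ("surgeon_a", False),
    ("surgeon_a", False),
    ("surgeon_a", False),
    ("surgeon_b", False),
    ("surgeon_b", False),
    ("surgeon_b", True),
    ("surgeon_b", True),
    ("surgeon_c", False),
    ("surgeon_c", False),
]


def infection_rates(records):
    """Return (per-surgeon infection rates, overall rate across all records)."""
    totals = defaultdict(int)
    infections = defaultdict(int)
    for surgeon_id, infected in records:
        totals[surgeon_id] += 1
        if infected:
            infections[surgeon_id] += 1
    per_surgeon = {s: infections[s] / totals[s] for s in totals}
    overall = sum(infections.values()) / sum(totals.values())
    return per_surgeon, overall


per_surgeon, national = infection_rates(procedures)
for surgeon_id, rate in sorted(per_surgeon.items()):
    print(f"{surgeon_id}: {rate:.0%} vs. {national:.0%} overall")
```

A real comparison would also need risk adjustment (case mix, procedure type), since raw rates penalize surgeons who take harder cases.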

As future MDs, what meaningful correlations might you search for within the data?

Here are some examples I have come up with -- but as a software developer and not an MD, I find my list short and uninspired.

- Track and plot the geographical spread of epidemics, such as influenza.
- Track by geography the diagnosis of specific diseases.
- Demonstrate correlation between autism and Asperger's spectrum and thimerosal-based (Hep B) vaccinations.

Imagine that this information is available at your fingertips. That is to say, because of cloud computing, computation and storage are not factors and processing time is zero. And correlation is not causation. What might you look for?

Shaun

Any stock options for contributors to this product?
 
As a software developer and med student here is my input.

The first two were actually done by Google a few years ago. They found that when people got a particular set of symptoms, they would search Google for those symptoms. Along with the CDC, they were able to track the spread of the flu across the US in "real time". They have this data available for your use in one of their tools... their volume of search data on symptoms/disease is massive. In addition, I believe that an entire department at Google is currently working on this to help predict/identify bioterror attacks and disease epidemics.

Also, many research projects have compared autism to vaccines, but there is no correlation and certainly no causal relationship.

If you are looking to do research to be published, that is a very different avenue than developing a commercial tool. Which one are you going for?

If I were you... I would look into the IBM Watson project. Last thing I read, they were aggregating a large volume of cancer journals and patient files to help with diagnosis. This will certainly be a big player in the world of data aggregation that is attempting to help clinicians of the future solve rare diseases.
 
Hi Link2Swim --

Useful suggestions but I feel I may have not explained the nature of my question to you clearly.

Today, there are very few medical record repositories which have more than five million patient records. Ours is one of them, and it is growing at a very fast rate due to the way one of our products integrates with established EHRs. One of the largest obstacles to date for building meaningful informatics is lack of translation -- a Tower of Babel problem caused by different nomenclatures and lack of aggregation caused by different EHR silos. The IBM Watson project is accomplishing useful research but it isn't trying to solve these problems and as a result, their research is more targeted than what I have built.

Here is what I am saying -- imagine you have one billion health conditions, five hundred million medications, one hundred million procedures, and twenty-five billion results for a deidentified population set. What would you do with it?

Re: commercial versus research applicability -- if the software is generalized for both kinds of use it is essentially a tool and this question isn't something I need to answer. I am genuinely interested in hearing opinions from doctors on what kind of trends or patterns they would look for given a sufficiently large dataset of patient data.
 
I would disagree with the idea that there aren't large patient datasets already compiled. Since the 1970s, the way health insurance companies have figured out a "rate" (aka "risk of getting sick and needing care") is by taking every patient and creating statistical risk representations with hundreds of millions of medications, diseases, etc.

Insurance companies only use the data for internal purposes. A lot of correlations (think risk factors) were known by insurance companies years before the medical community knew about them. I attended a lecture by the doc who authored most of the hypertension research of the past 20 years. He started by contacting insurance companies and seeing what they had already determined.

We probably agree that there isn't an open source or "public" version of this data.

But I digress... I think your question is how the data would best be compiled. I think this answer will vary based on whom you ask (i.e. an epidemiologist, or a physician, etc.). In short, people in med school are not going to be able to give you much input on this. I'd ask attending physicians... especially those within research fields. I'd find researchers who do "meta-analysis" and "clinical research" and see what their needs are... This is likely your target interest. Do you live near a major academic hospital/university?

Med students are already saturated with information; few try to research beyond the previously published data (that has made it into major textbooks).
 
Your ideas are definitely talked about in many circles. Just today I listened in on a conference where they were talking about patient data from the Medicare database. But these silos are a real thing, and getting access to patient data requires managing HIPAA/HITECH regulations. Not only that, this technology, which comes with staffing needs, is expensive to launch and run. I suggest you start small and work with one university/health system to develop a concept for an SBIR or STTR grant. Another route is to skip patient data and go with high-throughput screening. A lot of companies are investing in software developers to speed the discovery of novel therapeutic compounds.
 
Our own electronic record doesn't even play well with itself because of the silo issues. All the services created their own custom templates. This is a huge problem. We can't even reliably populate our anesthesia record with the correct information from our own system. What you're talking about is probably impossible with the current systems.
 
These aren't ideas -- I probably didn't explain this well. Through BD agreements we have today a very large and extensive deidentified medical data repository. Our company has translation services working and we completed a unifying "grand" schema which joins ambulatory and in-patient clinical data. The analytics engine is in place and I am now building up meaningful queries -- I am using PQRS and other quality measures for ACO/HITECH compliance as a starting point. But, as I've hopefully indicated, there are better uses for this data.
 
It definitely does depend on who you ask. As an update, we are bringing in seasoned physicians from different fields tomorrow who can provide additional insight into the different kinds of analytics they find useful. Thank you to everyone for their feedback.
 
My team has built a system which liberates data from arbitrary silos by exporting and translating the data to a normalized database. The "closed" systems which you and I refer to must still persist data, and it is from this point that you can successfully get at the data -- i.e. HL7 feeds, CDA exchange, web services, and straight database access. I guess no one thinks it is possible and it is for this reason we are succeeding where others are not.
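As a rough illustration of the "export and translate to a normalized database" idea, here is a minimal Python sketch. Everything in it is an assumption for the example: the crosswalk table, the placeholder codes (they are not real ICD-9, SNOMED, or Medcin codes), the concept IDs, and the record fields. It is not the team's actual translation service.

```python
# Hypothetical crosswalk: (source nomenclature, source code) -> internal concept ID.
# The codes below are placeholders, not real clinical codes.
CROSSWALK = {
    ("ICD9", "X100"): "concept:diabetes_type_2",
    ("SNOMED", "S-555"): "concept:diabetes_type_2",
    ("MEDCIN", "M-42"): "concept:hypertension",
}


def normalize(record):
    """Map one source record to the unified schema, or return None if unmapped."""
    key = (record["system"].upper(), record["code"])
    concept = CROSSWALK.get(key)
    if concept is None:
        return None
    return {"patient": record["patient"], "concept": concept, "source": key[0]}


# Records as they might arrive from different silos (made-up data).
feeds = [
    {"patient": "p1", "system": "icd9", "code": "X100"},
    {"patient": "p2", "system": "SNOMED", "code": "S-555"},
    {"patient": "p3", "system": "Medcin", "code": "M-42"},
    {"patient": "p4", "system": "ICD9", "code": "UNKNOWN"},
]

normalized = [row for row in map(normalize, feeds) if row is not None]
unmapped = [r for r in feeds if normalize(r) is None]
```

Here p1 and p2 end up under the same concept even though their source systems differ, and the unmapped record is kept aside for review rather than silently dropped.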
 
It sounds great at first, but most of these initiatives in the past have run into the HIPAA buzzsaw. If your data is ever compromised, you guys are toast. That said, what research applications may this have and how can we get involved beyond posting on a webforum?
 
I think the most important thing I'd use something like this for is identifying responders from non-responders in regards to medications. It'd be nice to have the genome of every patient but that seems a few decades off. Instead, I bet there are ways to take known patient factors (location, age, ethnicity, PMH, PSH, etc.) and figure out which drug they'd likely respond to.
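A very simple version of this would be to stratify response rates by a patient factor and compare drugs within each stratum. The Python sketch below assumes made-up rows of (drug, age band, responded) and a hypothetical definition of "responded"; real work would need many more factors and proper statistics, since small strata give noisy rates.

```python
from collections import defaultdict

# Hypothetical outcome rows: (drug, age_band, responded)
outcomes = [
    ("drug_a", "under_50", True),
    ("drug_a", "under_50", True),
    ("drug_a", "50_plus", False),
    ("drug_a", "50_plus", True),
    ("drug_b", "under_50", False),
    ("drug_b", "under_50", True),
    ("drug_b", "50_plus", True),
    ("drug_b", "50_plus", True),
]


def response_rates(rows):
    """Return {(drug, stratum): fraction of patients who responded}."""
    counts = defaultdict(lambda: [0, 0])  # [responders, total]
    for drug, stratum, responded in rows:
        counts[(drug, stratum)][1] += 1
        if responded:
            counts[(drug, stratum)][0] += 1
    return {key: r / n for key, (r, n) in counts.items()}


def best_drug(rates, stratum):
    """Pick the drug with the highest observed response rate in one stratum."""
    candidates = {d: rate for (d, s), rate in rates.items() if s == stratum}
    return max(candidates, key=candidates.get)


rates = response_rates(outcomes)
```

With this toy data, `best_drug(rates, "under_50")` picks `drug_a` and `best_drug(rates, "50_plus")` picks `drug_b`, which is the kind of split between responder groups the post is asking about.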
 
Let the games begin!
 
This is always a huge problem. It's possible within each current major vendor EMR. However, most of the time, all of the various hospital services act very independently of one another, and IT forgets to integrate all of the structure they put together. A bad build always comes back to bite you on expansion.
 
We have patient consent and the data is deidentified.
 
This is a good suggestion and I'll make sure the physicians we speak with today give us examples based on this idea. Chakrabs -- I sent you a private message with my email address.
 