Residents' perspective on AI


How do radiology residents feel about AI? Excited? Fearful? A combination of those emotions?

Also, do any of your training institutions offer lectures on AI/ML?

I try to follow developments in AI, and there are at least a few articles a week that make me quiver a bit as an MS4 entering radiology.

 
I switched into radiology from a surgical subspecialty. I remember, after matching, finishing up my final few months in my old residency and being stuck in surgery with the Chair of our department. He spent the whole 4 hours of the case lecturing me about what a terrible decision he thought I was making, because we were going to be replaced by machines.

Fast forward to the end of my 1st year of radiology residency, and I don't think he could have been further from the truth. If you speak to any clinician, they think radiology is this cursory look at images which doesn't require any skill...until they have a very specific question and come down and talk to you in person.

Our PD is pretty proactive about having us attend lectures about AI and how it's going to fit into the actual workflow. For the most part, companies are struggling to get machine learning to function outside of the image/data sets the models were trained on. Machine learning is also pretty interesting because the algorithms can pick up on really weird 'markers' during training. Amazon's famous debacle with resume sorting highlights that.
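(A toy sketch of that failure mode, with entirely made-up data and a plain sklearn logistic regression, nothing to do with Amazon's actual system: a model that leans on a "shortcut" feature looks great on its training distribution and degrades once the shortcut stops correlating with the label.)

```python
# Toy sketch of a "shortcut" feature (made-up data, not the Amazon system):
# the model aces training, where the shortcut tracks the label, and degrades
# badly once that correlation disappears.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shortcut_corr):
    """One weak 'real' feature plus a 'shortcut' feature that matches the
    label with probability shortcut_corr."""
    y = rng.integers(0, 2, size=n)
    real = y + rng.normal(0.0, 2.0, size=n)            # weak genuine signal
    tracks = rng.random(n) < shortcut_corr
    shortcut = np.where(tracks, y, 1 - y) + rng.normal(0.0, 0.1, size=n)
    return np.column_stack([real, shortcut]), y

X_train, y_train = make_data(5000, shortcut_corr=0.98)  # shortcut works here
X_test, y_test = make_data(5000, shortcut_corr=0.50)    # ...and is useless here

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", round(clf.score(X_train, y_train), 3))  # looks excellent
print("test accuracy: ", round(clf.score(X_test, y_test), 3))    # falls toward chance
```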

Are there aspects of the day-to-day work which are amenable to machine learning algorithms? Of course - pulmonary nodules are low-hanging fruit. The problem is that these algorithms can't compare to priors yet.

Again, having seen the clinician's side of radiology, where you view it as this black box of a department that generates reports, versus actually being in it, has made me even more reassured about the longevity of the profession.


TL;DR: it’s a lot of hype by tech companies that don’t really understand what radiologists do, or how CPT coding for reimbursement works.
 
A combination - lots of uncertainty and lots of hype.
Eventually robotics/algorithms will be able to do every task on this planet, but here we are, more than 3 years since Hinton said we should stop training radiologists, and all I see are proof-of-concept papers.
 
Please educate me but I don't know any actual radiologists who think AI is going to take over the field anytime in the foreseeable future.
 
“Radiologists lack exposure to current scientific medical articles on artificial intelligence”

What articles? I was at RSNA last year. There was no actual research presented that showed any clinical relevance.
 

The chance that AI will replace radiologists is next to non-existent. I switched out of rads for different reasons and never looked back, but even for legal reasons, who would one sue if AI did the "read"? I don't think it's going to happen - ever.
 
Could you speak about your reasons for leaving? Or link me to a comment if you've explained it before?
 

Just wasn't for me - I left in the era of next to no radiology jobs. I was living apart from my husband because he couldn't find a job in the area where I was doing residency, so I was miserable; I was the only gal in my class; I couldn't imagine myself looking at a screen for the next 30 years counting lung nodules; I was terrified of the litigation that might come with any mistake; and overall I was bored to tears, missed patient contact, etc. It just was not the right fit. I still think rads is a great specialty to go into, and the market seems to have gotten much better now, so it's not that the field is bad - it just was not a good fit for me.
 
Thanks for sharing. I can understand how it didn't fit for you. Hopefully it fits a little better for me :)
 
There are a lot of articles like that (just like CAD).
Waiting for prospective clinical studies.
For sure, and no one will know what this means until it's already happened. For me, though, it seems like trying to fight a rising tide with a bucket. 5, 10, 20 years - eventually the tide will win.
 
Members don't see this ad :)
Agree.
Eventually every job can be automated.
It's the timeframe that matters.
I just hope it does not happen in the next 35 years.
 
Radiology has plenty of procedures to help protect it from image analysis. An EKG can spit out pretty accurate interpretations, but I still feel a lot better having a cardiologist Bless the findings with his or her Holy Water
 
I 100% agree. Will the admins who are pushing midlevels feel the same way though? Or would they save a buck?
 
They probably have a little emergency fund for when midlevels f-up, rather than paying someone who trained 10,000 hours for the same "job."
 

Thoughts?

We will see a lot of these kinds of studies, mostly for screening.
Now we wait for clinical trials to see how they work in a clinical setting with different populations.
If they work as well as claimed, detection tasks will be automated.
 

What do you think the impact of that would be?

If it turns out that AI is really good at screening on low-dose studies, is it likely we would start scanning everybody all the time for everything? And that this would actually increase the need for radiologists, because we'd end up finding a lot more fringe cases that need a human to think through critically?
 

It's hard to predict.
But I think that the demand for radiologists will remain stable/increase in the next 10-15 years.
Btw, I read somewhere about a screening program (CT) for pancreatic cancer at Stanford.
 
With everybody talking about AI and wondering if it's going to replace radiologists anytime soon, I really hope you're right about those 10-15 years of stable/increased demand.

In the beginning of the AI hype I was really afraid of applying to radiology. However, I really thought about it and realized that I can't stand doing anything else. If the market for DR gets much worse in the next few years, I'd rather jump into IR, or even stay in the field, than do IM.
 

I'm in a similar boat.

It will be an interesting 5-10 years. AI/ML/DL are in their infancy, and that has to be considered when people critique studies like the one posted above. The technology will undoubtedly improve; whether to the point of disrupting the radiology workforce remains to be seen. I wouldn't bet against technology...

I wouldn't be surprised to see the current/incoming crop of rads residents begin to think about swapping to a field more resistant to automation.

I also wouldn't be surprised to see MD student interest in radiology plummet in the coming 5 years.
 
It’s gonna be funny when the fear of AI turns prospective radiologists away and actually creates a radiologist shortage
 

I truly hope - in fact I pray - that is the scenario.

Most rads I've talked to think AI will be a helpful tool and don't lose any sleep over it... all while AI experts (e.g., G. Hinton) believe the field will be made obsolete in the imminent future.

The reality is, nobody knows... where this technology will be in 10 years is anybody's guess.
 

Hinton said the field would be obsolete in 5 years - in 2016.
The hype around AI is huge, and what the tech guys want you to believe so they can get funding is nowhere near what is actually happening.

Apps for radiology have been around for a long time now, and even this low-dose screening CT article had many predecessors.

For AI to disrupt the radiology field, it would have to read studies from A to Z without any input from a radiologist.
OK, it can identify lung nodules, but can it see the pericardial effusion? Or the hypodensity in the liver? Or the rib lesion? Or, or, or...
Once an algorithm can detect ALL the possible abnormalities, and no supervision by the radiologist is necessary, then I'd say we are doomed.
For now, even if the AI can identify the lung nodules, I don't see any radiologist trusting the algorithm without double-checking the results (same amount of time, maybe more).

But yes, you can't predict what will happen in 10 years.
 
The more concerning thing is that the ACR is currently lobbying to create Radiology midlevels. Forget about AI. Imagine all the RadPartners/Envision/RadNets of the world hiring an army of RAs and your need for Radiologists will plummet.
 
I am a lowly med student on this forum and don't claim to really know the field, but from what I've been reading on all these forums across specialties, this seems like the most efficient way to ruin the field of radiology for physicians.

"Mid-level creep" is what I see over and over again as a concern for entering certain specialties - that, and handing private practices over to bigger companies that employ you and no longer have to pay you what you're worth.
 
So I get this argument, but at the same time we have a group of providers that can show us that isn’t really the case.

Anesthesia.

In the '80s, anesthesiologists were told that they were joining a sinking ship and that their profession was doomed, all thanks to CRNAs. Here we are almost 40 years later, and they're still chugging along without a catastrophic crash in their job market.

As for mid-level creep, the best thing we as physicians regardless of specialty can do is to stop consolidating our practices. Big corporations will always want the cheapest provider, regardless of their quality. If they can make money even while being sued/providing substandard care, that’s just the cost of doing business.
 
The CRNAs have just embarked on a widespread campaign to change their name to Nurse Anesthesiologist and to directly accuse anesthesiologists of being fraudulent.

https://www.aana.com/docs/default-s...crnas-we-are-the-answer.pdf?sfvrsn=b310d913_4

Some key excerpts
"By carefully examining overcompensation of physician anesthesiologists for services that can be provided as safely and more cost-effectively by CRNAs, a substantial portion of this percentage can be realized."

"All CRNAs are board certified, while only 75 percent of physician anesthesiologists are board certified"

"CRNAs are the only anesthesia professionals required to attain clinical experience prior to entering an educational program"

"the American Society of Anesthesiologists (ASA) inflates years of schooling to 12-14 by including a four year bachelor's degree attained prior to entering medical school, and a post-residency fellowship in an anesthesiology subspecialty such as chronic pain management, which many physician anesthesiologists do not pursue. The bachelor's degree is typically not healthcare-focused. The ASA also inflates the number of clinical hours attained by residents to approximately 14,000-16,000, which is 2,000-4,000 hours more than the actual number of 12,120. An important difference between clinical education hours attributed to nurse anesthesia students and anesthesiology residents is that the hours claimed by SRNAs are those actually spent providing patient care, while the hours claimed by anesthesiology residents are all hours spent in the facility, including those hours not involved in patient care."
 
Lol wtf... radiology midlevels? As a board certified physician, why the hell would I care what a midlevel reads and interprets? I may as well do it myself. The only person I trust to read something is another board certified physician.
 
Anyone else think the midlevel threat is completely overblown? I've seen midlevels where I work (pre-M1 currently), and they quiver and panic at the sight of a dialysis patient. Hell, even when they try to consult on some complicated patients, the doctors at the other end will say "have the physician evaluate the patient, then call me".
 
All threats are overblown, with the exception of senior radiologists selling our futures by selling their groups to private equity like RadPartners, etc.

That is the only threat with the potential to destroy our field.
 
I'm not in radiology.... but in my opinion, during our careers I think AI will, at MOST, be an adjunctive tool we use in medicine for very specific tasks; it certainly won't independently replace a physician in any field in the near future.
 

The only person I trust for general care is a physician.
The only person I trust for anesthesia is a physician.
Etc.

Do lobbyists care about who I trust, or do they care about making money? How much influence do you have with an opinion? I'm sad that physicians can't get together and say that this has already gone too far (not necessarily even in a formal manner).
 
AI is overblown. I even consulted with a big tech company on their AI project. We are decades away, if ever, from being replaced by AI. Trust me: if radiologists can be successfully replaced by AI, absolutely no field in medicine, business, law, etc. is safe. Performing radiology requires a knowledge base, which can be programmed, but machines are very poor at recognizing less obvious image findings and at synthesizing new findings and conclusions. It just takes one miss, one untimely death, and one $20 million lawsuit to keep AI at bay. AI will supplement, not replace, radiologists - much like an improved version of CAD, which isn't all that great to begin with.
 
Regarding the use of midlevels in radiology, like Radiology Assistants, I wouldn't lose any sleep over it. I don't support the creation of RAs, but their predecessor, the RPA, was created without the approval of any radiology governing body and our leaders had to deal with it. Hence, the RA. As long as no midlevel group can monopolize that niche, and as long as radiology does not depend on RAs the way anesthesiology depends on CRNAs, radiology has nothing to worry about.
 
It's very unlikely that radiology will use midlevels as extensively as anesthesiology. Many bread-and-butter radiology procedures can be done by any radiologist, e.g., thora, para, LP, FNA, breast biopsies, fluoro, etc. The more difficult or higher-end procedures are for the subspecialists, like IR, MSK, body, mammo. Midlevels may be hired to do some scutwork procedures. I work for one of the largest PP groups in the country and we don't employ any midlevels at all. I could see how we could use them, but they are more of a luxury than a necessity. Many of my partners would rather do more work themselves than decrease their income by hiring unnecessary help.
 

Don't radiology assistants just do simple CXRs and direct patients to the CT scanner, etc.? They don't actually read anything.
 

You're talking about radiology techs (RTs), not radiology assistants (RAs). RAs are midlevels, and I believe they're trained at the master's level now, like PAs and NPs. You can have midlevels do whatever you think is appropriate for their level, i.e., procedures as well as reading plain films and cross-sectional studies. I know of places that use them in both capacities, particularly academic centers. They are supposed to be supervised or have their studies staffed, the way residents are trained. Again, I don't support this, but that's what some academic centers do. Private practice is a different beast, and midlevels aren't used as much. More often than not, it would take me longer to staff a midlevel or resident than to read the study myself, especially a cross-sectional study. Speed, accuracy, and RVUs are the most important aspects of private practice. I can't afford to be slowed down. It only makes sense to hire a midlevel if it makes you more productive in your area. Otherwise, rads will complain about having one foisted on them.
 
I don't worry much about AI. Yes, I can see hints of it being useful and smoothing out workflow. But in my next 20-30 years of practice? Mmmm.... we'll see. If the utility of a well-trained radiologist is nullified by computers in that time frame, I'll be very, very surprised.

For example, today we had a high-speed MVC and the trauma surgeon came to the reading room. We walked through the scan and identified the acute issues. It only took a couple of minutes, and we worked through the case in a collaborative way, including prior history (you think an Ivor Lewis esophagectomy with severe chronic inflammation in proximity to acute fractures and hematoma may be a little difficult for AI to contemplate when there's no history in the system?). I also don't see how a computer saying "consistent with fracture, consistent with hematoma, consistent with laceration" x 100 in the little old lady missing various body parts from a lifetime of surgeries and chronic issues is going to work. Perhaps I am just ignorant of the possibilities of technology, but the synthesis we are capable of doing in the reading room far surpasses AI's abilities for many, many years to come.
 
I think the limitation of AI is straightforward. You’re developing algorithms that use a whole bunch of prior cases to learn to identify something on an image. What if you have 1) an atypical presentation of something, 2) a unique combination of conditions or 3) a specific question that needs answering that is unique to the patient (what is the anatomic involvement of this pancreatic adenocarcinoma?)?

Deep learning might be able to touch on (2), but the rest... nah. What's more, every time AI returns a report on a straightforward image ("RUL lung nodule suspicious for neoplasm"), there's no way the radiologist doesn't look at the image. My question with all these studies showing "AI is just as good as radiologists on this specific set of images" is: are radiologists + AI better than either AI or radiologists alone? If so, all AI really does is improve diagnostic accuracy; it replaces no one. If anything, it makes the practice spend more money on these algorithms, with the hope that the money spent on the software is made up for by increased workflow productivity. And even if it doesn't replace anyone, if AI makes an error every now and then (it always will), you still want a radiologist checking every study, even briefly, to verify there were no errors.
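(A back-of-the-envelope sketch of that "AI as a second checker" point, with invented operating points and an independence assumption that is almost certainly too generous: if a case gets worked up whenever either the radiologist or the AI flags it, sensitivity rises, specificity falls, and a human still reads every study.)

```python
# Invented operating points; assumes independent errors, which is generous.
# Rule: work the case up if EITHER the radiologist or the AI flags it.
sens_rad, spec_rad = 0.87, 0.90   # hypothetical radiologist
sens_ai, spec_ai = 0.85, 0.88     # hypothetical stand-alone AI

sens_both = 1 - (1 - sens_rad) * (1 - sens_ai)  # missed only if both miss -> ~0.98
spec_both = spec_rad * spec_ai                  # flagged if either flags   -> ~0.79

print(f"combined sensitivity: {sens_both:.3f}")
print(f"combined specificity: {spec_both:.3f}")
```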

I think AI will therefore either 1) refine the radiology practice's diagnostic capabilities or 2) fractionally increase the reading rate of otherwise completely straightforward studies. If it doesn't do this to a sufficient extent to make up for the cost of owning the software in the first place, AI flops and it'll be business as usual. No need to worry guys, we'll all still be way overworked in twenty years.
 
The private practice guy I shadowed was also talking about this and literally begged for an AI to take care of his workload, lol. The number of images you have to read as an attending is crazy. No wonder radiologists are excited for AI.
 
Radiologists provide the most value in complex cases AI cannot handle due to the need for general intelligence (i.e. delineating and synthesizing a combination of many findings for multiple diagnoses, comparison to priors in the same and different modalities, and generating a good report that the referring doc can act on +/- suggest additional imaging as needed). Basically anyone with polytrauma, multiple pathologies (e.g. infection+cancer), or multiple prior surgeries fits into this category.

As said above, even with the most basic tasks and biased studies, none have shown that AI alone is better than AI+Radiologist (i.e. AI being a tool). In a real-world setting it must have general intelligence capabilities, something we are far away from.

As for midlevels, no subspecialty surgeon wants a midlevel reading their case. Just like with surgery, most of the bread and butter tasks in radiology involve skills that are contained in 1 person and cannot be delegated. This makes midlevel penetration difficult except in basic radiology procedures.

Given this, radiology is one of the safest fields from disruption.

Most medical specialties (e.g. primary care, anesthesia, etc.) are at greater threat of midlevel expansion/intrusion.
 
Wonder about your thoughts on this paper, summed up in the NY Times. My personal take is that while AI is likely good for some things, it creates its own errors, and AI + radiologist eyes is going to emerge as the gold standard.

Also, the level of discussion you guys are having about this is so much higher than the article's that it makes me wonder whether the authors ever reach out to you?
 
Here is the original paper by Google in Nature: International evaluation of an AI system for breast cancer screening

To the NYTimes' credit, they quoted one radiologist, Dr. Lehman (as well as one of the co-authors, Dr. Etemadi, who is an anesthesia resident?). Dr. Lehman has a generally bullish long-term outlook on AI for radiology and gives a measured take on the paper that I think is on point. Several points are worth emphasis.

There can be a disconnect between a laboratory study and clinical utility. In the mammography world, the CAD (computer-aided detection) systems developed in the 1990s were approved based on 'reader studies' with ROC curves showing that CAD 'outperformed' the average radiologist, just like the current paper. Fast forward to the present day to see what the reality is. We have new technologies like tomosynthesis that improve human performance but that the approved CAD technology can't utilize. We have filmless mammograms in PACS systems making it easy to compare with tons of prior studies, but CAD can't integrate that either. Radiologists have to take extra time in each study to review the CAD findings and dismiss them as irrelevant, but at least radiologists get paid more to take this extra time. CAD makers have raked in the dough too. The cost to society in the US is $400 million a year. However, diagnostic performance in the contemporary digital mammography screening setting is not clearly superior with CAD than without it. This conclusion is drawn in a study by Dr. Lehman published in JAMA IM: Diagnostic Accuracy of Digital Screening Mammography With and Without Computer-Aided Detection. - PubMed - NCBI. CAD provides a cautionary tale: these lay headline-making studies about AI should be looked at like preclinical animal studies in the drug development world. The clinical utility remains unproven. Moreover, even when it is proven at one point in time, parallel advances in other technology can render it less useful in the near future.
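(For anyone unfamiliar with how these reader studies frame "outperformed": the algorithm outputs a continuous score and therefore traces a whole ROC curve, while each radiologist contributes a single operating point, and the question is whether those points sit above or below the curve. A minimal sketch with invented scores and an invented reader operating point, not the CAD or Google data:)

```python
# Invented scores and an invented reader operating point -- not the CAD or
# Google data. The algorithm outputs a continuous score, so it traces a full
# ROC curve; a human reader is one (false-positive rate, sensitivity) point.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=2000)  # 1 = cancer, 0 = no cancer
scores = y * rng.normal(1.0, 1.0, 2000) + (1 - y) * rng.normal(0.0, 1.0, 2000)

fpr, tpr, _ = roc_curve(y, scores)
print("algorithm AUC:", round(roc_auc_score(y, scores), 3))  # ~0.76 with these synthetic scores

reader_fpr, reader_tpr = 0.10, 0.87  # hypothetical average reader
# "Outperforms the reader" usually means the curve's sensitivity at the
# reader's false-positive rate exceeds the reader's sensitivity.
print("curve sensitivity at reader's FPR:", round(float(np.interp(reader_fpr, fpr, tpr)), 3))
print("reader sensitivity:               ", reader_tpr)
```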

Let's examine the details of the Google study and how they relate to clinical reality.

In the FIRST part of the study that contributes to the lay headline claim that AI outperforms radiologists, the authors looked at the performance of the AI system, trained on mammograms from the UK/NHS, on predicting cancer in mammograms from Northwestern Medicine in the US. For this test set, they defined radiologist performance using the original radiology reports produced in routine clinical practice (a patient that was recalled for additional diagnostic evaluation (BI-RADS 0) is a positive call). There are two tricks to be aware of:

FIRST, they defined ground truth as having a breast cancer diagnosis made within the next 27 months after the mammogram. They justify this because they want to mitigate the gatekeeper bias: biopsies are only triggered based on radiological suspicion. They defined the follow-up interval to include the next screening exam (they say 2 years) plus three months (time to biopsy). The problem is 27 months is a long time. Different organizations disagree as to what is the most cost-effective screening interval (USPSTF recommends biennial), but the American College of Radiology and the American Cancer Society recommend annual screening because cancer can grow quickly and more frequent screening saves slightly more lives. I suspect annual screening is the more common practice in the US dataset, as it is at my institution - look at the Nature paper Extended Data Figure 4: the calculated sensitivity drops much more when the follow-up interval goes past the 1 year threshold compared to how much it drops moving past the 2 year threshold. If someone is diagnosed with breast cancer after a negative mammogram, it's either because the cancer was visible but missed by the radiologist, the cancer is present but not visible by mammogram, or the cancer developed after the mammogram. As the followup interval gets longer, it's increasingly likely to be the latter (that it was not diagnosable at that original time) than the former (that it was missed). In retrospective reviews, the proportion of 'interval cancers' (not detected by screening, diagnosed before the next screen) that are truly missed cases (false-negatives) are estimated to be a minority (20-25%). (see The epidemiology, radiology and biological characteristics of interval breast cancers in population mammography screening). Thus, with the longer follow-up interval definition, radiologists' sensitivity is lower than conventional benchmarks (87% sensitivity with 1 year follow-up, see benchmarking study by Dr. Lehman: https://www.ncbi.nlm.nih.gov.pubmed/27918707) because more of the cancers that these patients get later are not even present at the time of screening! In the setting of annual screening in the US, I think a 15 month follow-up is a more reasonable definition that still reduces gatekeeper bias.
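(Rough arithmetic with invented counts, just to show the mechanics of how stretching the ground-truth window drags measured sensitivity down even though the radiologist's calls never change:)

```python
# Invented counts, just to show the mechanics of the follow-up window.
cancers_dx_within_12m = 100      # ground-truth positives with a ~1-year window
detected_at_screening = 87       # radiologist calls -> 87%, near published benchmarks
extra_cancers_months_13_to_27 = 30  # added by stretching the window to 27 months,
                                    # mostly not present/visible at the screen itself

sens_12m = detected_at_screening / cancers_dx_within_12m
sens_27m = detected_at_screening / (cancers_dx_within_12m + extra_cancers_months_13_to_27)
print(f"measured sensitivity, 12-month ground truth: {sens_12m:.0%}")  # 87%
print(f"measured sensitivity, 27-month ground truth: {sens_27m:.0%}")  # ~67%, same calls
```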

OK so sensitivity is artificially lower because these cancers found on follow-up don't exist yet - but AI is still out-diagnosing humans on the cancers that do exist at the time, right? Not definitely. AI can outperform in one of a few ways: (1) it can find a cancer that is visible to the radiologist but the radiologist missed, or (2) it can find a cancer that is invisible to the radiologist; these are useful (1 leads to biopsy, 2 leads to a different imaging modality to try to localize a lesion). However, there is another possibility that must be considered: (3) it can predict the future development of cancer even though none exists at this time. Dr. Lehman has shown that deep learning on mammograms predicts 5-year risk of breast cancer with an AUC of 0.68, which is as good or better than the risk scores that breast specialists commonly use based on factors like family history, hormone exposure, age, breast density, etc. A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction. - PubMed - NCBI Note the AUC of the Google AI on the US test set for 2-year risk of breast cancer was 0.76. Risk prediction is useful in its own way, but it's not the same as making a cancer diagnosis. There's no lesion to directly sample, leaving considerable uncertainty - the AUC is not 1. There's no spot that someone can cut out or irradiate right now. The risk may be multifactorial, with some modifiable and some nonmodifiable risks - maybe the patient is smoking and the AI sees vascular calcifications, or maybe the normal breast tissue shows changes due to a hormonal exposure that also increases cancer risk, or the breast shows changes of prior radiation therapy to the chest. It's hard to conclude what to do other than perhaps screen for cancer more often. I don't think people see the value proposition of mammography plus AI as population-based risk assessment. Screening mammography already has enough haters from the Gil Welch et al. school of public health thought.

SECOND, the US dataset was enriched in a particular way to help statistical power, but that affects performance measures. Whereas the training set from the UK was a random sample of a screening population, the test set from the US was constructed using screening mammograms from all the women who had biopsies at Northwestern in a certain time period, plus a random sample of the unbiopsied screening population. This means the dataset is enriched for people who have biopsies. People who have biopsies are of two sorts: screening detected abnormalities and clinically detected abnormalities. Via the first pathway, the screening mammograms and subsequent diagnostic mammogram and/or ultrasound had an abnormality that looks suspicious for cancer. Most of these turn out not to be cancer - per the benchmarks linked above, the so-called positive predictive value (PPV) 2 is around 25-30%, with the "acceptable range" being 20-40%. Because of this class imbalance, there is preferential enrichment for hard cases that are false positives - things that look like cancer but turned out not to be. There is a lower proportion of easy true negative cases, which were never biopsied. The specificity will be lower. Via the second pathway, clinically detected abnormalities were either missed or developed after the screener and the patient presented for other reasons (patient or doctor felt a lump, or they had focal pain, or they had metastatic disease detected some other way, and this led to a diagnostic cascade culminating in a tissue diagnosis of breast cancer). That means there is enrichment for hard cases that are easy to miss or not possible to diagnose - false negatives. The sensitivity will be lower. These are called spectrum effects, or spectrum bias. The magnitude of these effects is hard to say, and the authors try to de-bias the effects of undersampling normals on the absolute performance metrics using inverse probability weighting. However, the point remains that you're comparing AI against radiologist selectively on the cases that the radiologists found hard -- it's not a fair playing field. If in a parallel universe AI were the one rendering solo interpretation and gatekeeping biopsies, it's conceivable that you could see that human outdiagnoses AI on the cases on which AI had the most trouble.
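(A toy sketch of the inverse-probability-weighting correction mentioned above, with a made-up sampling fraction and made-up counts: sampled normals get weighted back up so the absolute metrics refer to the full screening population, which fixes the under-sampling of easy negatives but not the harder case mix itself:)

```python
# Toy inverse-probability-weighting example (made-up sampling fraction and counts).
q = 0.01   # hypothetical fraction of never-biopsied screeners sampled into the test set

# Non-cancer cases in the enriched test set:
tn_biopsied, fp_biopsied = 300, 200   # biopsied benigns: hard, lots of false positives
tn_sampled, fp_sampled = 950, 50      # sampled unbiopsied normals: mostly easy

spec_enriched = (tn_biopsied + tn_sampled) / (tn_biopsied + tn_sampled + fp_biopsied + fp_sampled)

w = 1.0 / q   # Horvitz-Thompson weight for each sampled normal
spec_weighted = (tn_biopsied + w * tn_sampled) / (
    tn_biopsied + w * tn_sampled + fp_biopsied + w * fp_sampled
)

print(f"specificity in the enriched test set: {spec_enriched:.3f}")  # ~0.83
print(f"IPW estimate for the population:      {spec_weighted:.3f}")  # ~0.95
```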

In the SECOND part of the study that contributes to the lay headline claim that AI outperforms radiologists, Google got an external firm to do a reader study involving six radiologists, four of whom are not breast fellowship-trained but are probably general radiologists who read mammograms some of the time. These radiologists were asked to read 500 mammograms from the US data set, again with enrichment to ensure statistical power: 25% biopsy-proven cancer, 25% biopsy-proven not-cancer, 50% not biopsied. I've already mentioned how enriching for hard-for-human cases (here, the biopsy-proven not-cancer) makes it hard for humans. In this situation, being a prospective study, a different sort of spectrum bias also becomes relevant: changing prevalence changes the behavior of the readers. When prevalence is higher, readers should be more aggressive. When radiologists read screeners in routine practice, in the back of their mind is a target abnormal interpretation rate (recalls from screening) of ~10%. That benchmark is the first step in a series of steps towards getting a reasonable performance measure for cancer detection. That benchmark is dependent on the population incidence of breast cancer, which is 0.6% in a screening population. All that calibration goes out the window in a simulation in which 50% of the cases are abnormal, half of which are cancer and half of which are not. To top it off, the readers were not informed of the enrichment levels! Mathematically, changing prevalence should not perturb the AUC, but it certainly changes the mindset of the reader. My confidence would be shaken if I found myself 10 stacks of 10 screening mammograms deep and I've already called back 50 studies rather than a more typical 10.
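(The base-rate mismatch in plain numbers, using the figures from the post; the behavioral effect on readers obviously can't be computed, this just shows how far the study mix sits from the calibration radiologists carry into it:)

```python
# Base rates from the post; this just quantifies the mismatch a reader walks into.
routine_prevalence = 0.006   # ~0.6% cancer incidence in a screening population
routine_recall_rate = 0.10   # ~10% abnormal-interpretation benchmark

study_cases = 500
study_cancers = int(0.25 * study_cases)  # 25% biopsy-proven cancer
study_benigns = int(0.25 * study_cases)  # 25% biopsy-proven not-cancer

print("cancers expected in 500 routine screeners:", round(routine_prevalence * study_cases, 1))  # 3.0
print("cancers in the 500-case reader set:       ", study_cancers)                               # 125
print("biopsy-proven benigns in the reader set:  ", study_benigns)                               # 125
print("recalls a routinely calibrated reader expects in 500:", int(routine_recall_rate * study_cases))  # 50
```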

You get the feeling something is weird when you see that the six readers' average performance, using the traditional 1-year follow-up definition, is only around 60% sensitivity and 75% specificity (Fig. 3c).

Last point:
In the real world, initially, radiologists will use AI as a tool rather than be replaced altogether by AI. No tech company will wholly assume the liability risk of misses in a cancer screening program. Repeating the above:

As said above, even with the most basic tasks and biased studies, none have shown that AI alone is better than AI+Radiologist (i.e. AI being a tool). In a real-world setting it must have general intelligence capabilities, something we are far away from.

Until we have a clinical trial showing that performance is superior for AI alone over AI+Radiologist, I do not think AI will be doing anything alone, at least in a country that is not resource-poor. Dr. Lehman, as paraphrased in the NYT, seems to think eventually computers will render the sole interpretation for some mammograms. The key word is 'eventually.' The timeline is uncertain - nobody can predict the future - but I think progress in the highly regulated and litigious field of medicine tends to be incremental, not disruptive, and the radiology workforce will have time to adapt to practice-changing technology, as it has plenty of times in the past.
 

Wow. I'd go to your TED talk; sadly, I think it would be me and only a few others. Splashy headlines of "AI beat radiologists" don't like nuances like sampling bias and spectrum effects.


You're being overly generous about the 27-month ground-truth window. The paper does NOT say that 27 months was pre-specified. Did they move the goalposts to ensure maximum difference? They only say "similar results were observed" when they used one year. It's a small point, and things didn't change much with a 1-year cut point, but you have to remember that the authors are Google employees. They have a project to sell and develop into the next phase. You see this all the time with drug development.

You can see a much more willful bias in having 2/3 non-breast docs read from highly atypical mammogram populations, both in terms of percent +CA and in terms of difficult calls. All of this was designed to increase the difference and give AI an edge.

But you're right about development. If this were a phase 1 human study, it would be a "go." A phase 2 or 3 study comes next. Since Google apparently has conviction (and does not lack for funding), they'd be right to do a phase 3 trial. If I were designing the next study, it would be a prospective, blinded RCT in general rad centers: AI + human (intervention) vs. human alone or human with previous iterations of CAD (control). This would be ethical because you would not be putting people into harm's way, given the human oversight. It would need a pre-specified outcome of cancer diagnosis within 15 months. The NEXT trial would be AI alone, but it would have to have a very robust DSMC, with unblinded and distant breast radiologists providing second reads for individualized safety oversight.
 

Agreed!

For the retrospective clinical review part of the paper, Extended Data Figure 5 (attached) shows how if you set the follow-up at 12 months, the mean human reader actually comes out above the AI system ROC curve, for both UK and US data sets. Things didn't change much with the 1 year cut point for the laboratory reader study part of the paper (Figure 3c). Why is there a difference between these two? In part because of a laboratory effect: radiologists tend to perform worse in the simulated setting than in real clinical practice, probably for a number of reasons. This has been shown in a paper in Radiology: The “Laboratory” Effect: Comparing Radiologists' Performance and Variability during Prospective Clinical and Laboratory Mammography Interpretations

The next step is a so-called prospective clinical-use study. You use the AI system in the clinical setting how you hope it will be used when given regulatory and payer approval. Essentially you replace existing human-crafted CAD systems with a deep learning CAD system - the radiologist retains oversight and liability. The difficulty is you need a massive sample size, because you need a representative (non-enriched) sample with adequate power and breast cancer screening is a low-prevalence situation. Studies of this magnitude have been done before, comparing digital vs. film mammography: https://www.nejm.org/doi/full/10.1056/NEJMoa052911. This time, the comparison of relevance is radiologist + deep learning system versus radiologist + traditional CAD (standard of care). Don't hold your breath.
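(To put a number on "don't hold your breath": a rough sample-size sketch with effect sizes I made up, not taken from any of the cited trials, using the standard two-proportion normal approximation for comparing cancer-detection rates per screen:)

```python
# Invented effect sizes (not from the cited trials): standard two-proportion
# normal approximation for the number of screens per arm.
from scipy.stats import norm

p_control = 0.0050   # assumed cancer-detection rate, radiologist + traditional CAD
p_interv = 0.0060    # hoped-for rate with the deep-learning system (20% relative gain)
alpha, power = 0.05, 0.80

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)
n_per_arm = (z_a + z_b) ** 2 * (
    p_control * (1 - p_control) + p_interv * (1 - p_interv)
) / (p_interv - p_control) ** 2

print(f"screens needed per arm: ~{n_per_arm:,.0f}")  # roughly 86,000 with these numbers
```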

The Google AI for mammography is a neat incremental advance insofar as it is a promising replacement for existing CAD products. You can see how this model will not suddenly disrupt the field, put radiologists out of a job, reduce the time it takes to read a mammogram, or even save society money.
 

Attachment: Capture.PNG (Extended Data Figure 5 from the Nature paper)