AI Scribes and ADHD evals


Atreides

Anyone using AI scribes noticed anything with ADHD evals? I really like using an AI scribe, it's cut my documentation time down significantly, especially for new intakes. However, I've noticed that when the chief concern is ADHD, it will always make the note supportive of an ADHD diagnosis. Even in cases where it's crystal clear the patient doesn't have ADHD, it will emphasize symptoms of inattention and de-emphasize everything else. I use Nabla mostly but tried it out on Heidi as well and noticed something similar. I have to manually type out that I don't think it's ADHD and why, then have it regenerate the intake with that instruction added. Even then, it makes clear-cut not-ADHD seem like "maybe not ADHD".

Are these AI programs basing notes only on your data and information, or are they drawing from a larger database? If the latter, they're probably taking information from all the crappy NPs and pill mills out there using it who justify every case of inattention as ADHD.
The information in the notes is only going to be generated based on the current clinical interview (at least in theory; hallucinations definitely happen, which is why you need to read the note afterwards to make sure it didn't make anything up). The model itself, I'm guessing, is a fine-tuned version of GPT that they've fed some notes to. If I understand your comment, you're saying they've fed them notes from these pill mills and that's why it's popping everything as ADHD. Definitely possible.
 
Remember that these LLMs are also trained on huge corpora of data from the Internet, among other sources. Think about the amount of text on the Internet explaining how every symptom under the sun is actually indicative of ADHD. My guess is something similar happens if the clinical question is autism.

I am somewhat leery of Nabla and other AI scribes that don't let you examine the system prompt. They are generally not running bespoke models but are wrappers on GPT/Claude/some flavor of Llama/Mistral, etc. So they're probably not actually trained on notes as such. The devil is in the details of precisely what is in that prompt that they run through every time before generating anything.
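To make the "wrapper" point concrete: conceptually, one of these scribes is doing something like the sketch below. This is only a rough illustration assuming the OpenAI Python SDK; the system prompt wording and model name are made up, and none of this is what Nabla or Heidi actually run.

```python
# Rough sketch of a scribe-style wrapper: a hidden system prompt is prepended
# to the visit transcript and sent to a general-purpose model. Prompt text and
# model name are illustrative only, not any vendor's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HIDDEN_SYSTEM_PROMPT = (
    "You are a medical scribe. Convert the visit transcript into a concise "
    "SOAP-style psychiatric intake note."
)

def generate_note(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# The point: whatever is baked into that hidden system prompt shapes every
# note, and with most commercial scribes you never get to read it.
```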

I've turned away from using AI scribes for this sort of purpose because I've realized that when I'm writing notes, a lot of the time I write to find out what I think is going on. If something else is doing the writing for me, I don't put in that focused cognition in the same way, and that's the part that typically takes significant time for me.
 
Anyone using AI scribes noticed anything with ADHD evals? I really like using an AI scribe, it's cut my documentation time down significantly, especially for new intakes. However, I've noticed that when the chief concern is ADHD, it will always make the note supportive of an ADHD diagnosis. Even in cases where it's crystal clear the patient doesn't have ADHD, it will emphasize symptoms of inattention and de-emphasize everything else. I use Nabla mostly but tried it out on Heidi as well and noticed something similar. I have to manually type out that I don't think it's ADHD and why, then have it regenerate the intake with that instruction added. Even then, it makes clear-cut not-ADHD seem like "maybe not ADHD".
I don't care how much it's hyped or how it is labeled...a computer algorithm performing complex regression / linear algebra computations will never be able to do the 'intelligent' thinking/reasoning for us. Corporate medicine has been trying to effectively eliminate clinical/professional judgment (and the reasoning process upon which it depends) via attempts to 'policy/procedure' and 'metric' it away for decades. In my opinion, this has resulted in a net decrease in quality of MH treatment over the years, all under the guise of 'quality improvement.' Mental health administrators, corporations, and regulatory bodies have done a pi**-poor job, in my opinion, utilizing arithmetic and basic stats to 'improve' the quality of healthcare. It will be an interesting debacle to witness what harm they will manage to accomplish with the new AI 'tools' that they have no clue about.

In terms of competent differential diagnosis, the most sophisticated/valid process involves systematic attempts to critically examine (falsify or rule out) particular diagnostic possibilities. You are TRYING to refute (or at least vigorously challenge) your own diagnoses. Theories (and psychiatric diagnoses, which are basically 'working hypotheses') are never definitively 'proven' to be true. Rather, they may survive multiple rigorous attempts at falsification, leading them to achieve the status of being more and more 'corroborated' but never 'proven.'

Of course, at the present level of development/sophistication, a machine learning algorithm would set out to detect, gather, and assemble 'proof/evidence' to 'confirm' the initial diagnostic possibility because it, fundamentally, is engaging in a convergent mathematical process to derive a mathematical 'solution.' They play the game of 'find/prove the diagnosis' and engage in the confirmation bias that every poorly trained, run-of-the-mill, incompetent MH diagnostician does...just with mathematical precision and certainty. Divergent thinking and the application of the philosophy of science (hypothetico-deductive reasoning, logical empiricism) to the differential diagnostic process is trickier. It may get there, but may need to be programmed with input from those who deeply and accurately understand the nature of the task.

And, of course, GIGO ('garbage in, garbage out') applies. If the 'labeled' cases of 'ADHD' came to be 'labeled' that way by crappy clinicians (which, arguably, is the modal case in mental health), then the use of those data would train the algorithm to be the most crappy clinician it can be. Even Paul Meehl, who was a big proponent of the actuarial approach in psychology, openly acknowledged that there are some tasks that an expert clinician (not an algorithm) needs to accomplish and then 'feed' into the actuarial prediction mechanism.
 
I don't care how much it's hyped or how it is labeled...a computer algorithm performing complex regression / linear algebra computations will never be able to do the 'intelligent' thinking/reasoning for us. Corporate medicine has been trying to effectively eliminate clinical/professional judgement (and the reasoning process upon which it depends) via attempts to 'policy/procedure' and 'metric' it away for decades. In my opinion, this has resulted in a net decrease in quality of MH treatment over the years, all under the guise of 'quality improvement.' Mental health administrators, corporations, and regulatory bodies have done a pi**-poor job, in my opinion, utilizing arithmetic and basic stats to 'improve' the quality of healthcare. It will be an interesting debacle to witness what harm they will manage to accomplish with the new AI 'tools' that they have no clue about.

In terms of competent differential diagnosis, the most sophisticated/valid process involves systematic attempts to critically examine (falsify or rule out) particular diagnostic possibilities. You are TRYING to refute (or at least vigorously challenge) your own diagnoses. Theories (and psychiatric diagnoses, which are basically 'working hypotheses') are never definitively 'proven' to be true. Rather, they may survive multiple rigorous attempts at falsification, leading them to achieve the status of being more and more 'corroborated' but never 'proven.'

Of course, at the present level of development/sophistication, a machine learning algorithm would set out to detect, gather, and assemble 'proof/evidence' to 'confirm' the initial diagnostic possibility because it, fundamentally, is engaging in a convergent mathematical process to derive a mathematical 'solution.' They play the game of 'find/prove the diagnosis' and engage in the confirmation bias that every poorly trained, run-of-the-mill, incompetent MH diagnostician does...just with mathematical precision and certainty. Divergent thinking and the application of the philosophy of science (hypothetico-deductive reasoning, logical empiricism) to the differential diagnostic process is trickier. It may get there, but may need to be programmed by someone who deeply and accurately understands the nature of the task.
Lol, I'm not using the scribe to do intelligent thinking or come up with a differential diagnosis. Not any more than an ER doc is using their 22-year-old scribe to do their thinking or come up with a differential diagnosis.
 
Lol, I'm not using the scribe to do intelligent thinking or come up with a differential diagnosis. Not any more than an ER doc is using their 22-year-old scribe to do their thinking or come up with a differential diagnosis.
That is reassuring to hear.

So the scribe program just appears to be 'mak[ing] the note supportive of the ADHD diagnosis'...which you then have to go back and refute.

So, in this case, the output of the Scribe is some sort of Straw (silicon?) Man argument FOR an ADHD diagnosis that you have to, then, systematically examine and argue against in order to ultimately make a correct diagnosis?

In that case, I withdraw my argument. Maybe this Scribe can replace and serve the function of the average crappy diagnostician, then. Save the MH organizations a crapload of money, lol.
 
That is reassuring to hear.

So the scribe program just appears to be 'mak[ing] the note supportive of the ADHD diagnosis'...which you then have to go back and refute.

So, in this case, the output of the Scribe is some sort of Straw (silicon?) Man argument FOR an ADHD diagnosis that you have to, then, systematically examine and argue against in order to ultimately make a correct diagnosis?

In that case, I withdraw my argument. Maybe this Scribe can replace and serve the function of the average crappy diagnostician, then. Save the MH organizations a crapload of money, lol.
My man, I'm not here to argue about the use of AI scribes. I'm asking if others have noticed this in their use of similar software.
 
The information in the notes is only going to be generated based on the current clinical interview (at least in theory; hallucinations definitely happen, which is why you need to read the note afterwards to make sure it didn't make anything up). The model itself, I'm guessing, is a fine-tuned version of GPT that they've fed some notes to. If I understand your comment, you're saying they've fed them notes from these pill mills and that's why it's popping everything as ADHD. Definitely possible.

I don't use an AI scribe right now, but is this consistent across other complaints as well? Like if someone's CC is "I'm depressed," how likely is the generated note to be supportive of that CC when you think it's something else?

Also, to add: this makes sense when you think about how LLMs work. There is no "thinking" going on like with a human scribe. A human scribe can recognize quickly in the course of an ER encounter that even though someone might come in with a chief complaint of "I have a migraine," based on the doc's questions and responses the doc doesn't actually think it's a migraine but rather a headache from a hypertensive crisis. An LLM is basically just going, "what's the most probable next group of words that tends to occur after this sequence of words?" So if it gets trained on way more encounters for migraines, the most likely outcome to it is that...it ends up being a migraine unless you're explicitly telling it it's not.
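If it helps to see the "most probable next words" idea literally, here's a toy sketch using a small open model (gpt2 via the Hugging Face transformers library) that prints the probabilities assigned to the next token after a chief-complaint-style sentence. It has nothing to do with any actual scribe product; it's just to show that the model only ranks continuations, it doesn't rule anything out.

```python
# Toy illustration of next-token prediction -- not a scribe pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Chief complaint: trouble focusing at work. The most likely diagnosis is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the single next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tok.decode(int(idx))!r}: {prob.item():.3f}")

# Whatever continuation dominated the training text for sentences like this
# will dominate the ranking -- there is no concept of ruling a diagnosis out.
```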
 
My man, I'm not here to argue about the use of AI scribes. I'm asking if others have noticed this in their use of similar software.

I mean yes, this is definitely a thing that happens. Unfortunately, given the way these architectures work, telling them explicitly not to do something is less effective than telling them explicitly to do something that's mutually exclusive with the behavior you don't want.

Not so unlike most learners in this regard.
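As a concrete (and purely illustrative) example of that framing difference, compare a prohibition with an affirmative instruction that crowds out the unwanted behavior:

```python
# Illustrative prompt snippets only -- wording is mine, not from any vendor.

# Tends to work poorly: the model is told what NOT to do.
NEGATIVE_FRAMING = (
    "Do not assume the patient has ADHD. Do not overemphasize inattentive symptoms."
)

# Tends to work better: the model is told exactly what TO do instead.
POSITIVE_FRAMING = (
    "Summarize reported symptoms verbatim, grouped by domain (attention, mood, "
    "anxiety, sleep, substance use). In the assessment, list evidence against "
    "each diagnosis under consideration before any evidence for it, and state "
    "plainly when criteria are not met."
)
```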
 
I don't use an AI scribe right now, but is this consistent across other complaints as well? Like if someone's CC is "I'm depressed," how likely is the generated note to be supportive of that CC when you think it's something else?
Nope, just ADHD. I'm wondering if it may have to do with how I do ADHD evals. I typically use a structured interview, DIVA-5 or ACE Plus. I don't really use a structured interview for other chief complaints, so it's more of a free-flowing eval. But n=1; that's why I'm wondering if others have noticed this or if it's something peculiar to how I do my interviews.
 
Nope, just ADHD. I'm wondering if it may have to do with how I do ADHD evals. I typically use a structured interview, DIVA-5 or ACE Plus. I don't really use a structured interview for other chief complaints, so it's more of a free-flowing eval. But n=1; that's why I'm wondering if others have noticed this or if it's something peculiar to how I do my interviews.

That's interesting. I wonder if it has to do with the data the model is pulling from or, like you said, the structure of the interview itself causing it to go "this is the most likely sequence of words to occur after all these words talking about attention/concentration/etc." Because I'm sure the patients themselves might endorse a lot of these symptoms even if you, at the end, might say it's not ADHD.
 
The AI will just go based on the transcript and the information being discussed by the patient and clinician. You'll have to end the visit with the patient and then dictate to the AI that you don't think it's ADHD but rather another diagnosis, and give it your reasons, for it to know that's what you're thinking. It doesn't know diagnostic criteria or anything. It merely transcribes and organizes the conversation. It's not supposed to come up with an assessment, differential diagnosis, or suggest a plan, because then it would be a clinical decision aid (needing FDA clearance) rather than merely a scribe.

I did an ADHD evaluation using the DIVA last week with Freed. The patient ended up not having ADHD but rather generalized anxiety disorder, and it wrote the note. I did feed it information at the end, though, about how I thought it wasn't ADHD. Here is a snippet of the assessment section that it came out with.

"X presents with symptoms suggestive of anxiety rather than ADHD. She does not report symptoms of ADHD from a young age prior to age of 12 causing impairment at home, school, or any other setting and therefore doesn't meet criteria. Her symptoms are better captured by a diagnosis of generalized anxiety disorder, with feelings of being overwhelmed, trouble concentrating due to anxiety, stress, fear, and avoidance which have emerged and continued over the past few years."
 
The information in the notes is only going to be generated based on the current clinical interview (at least in theory; hallucinations definitely happen, which is why you need to read the note afterwards to make sure it didn't make anything up). The model itself, I'm guessing, is a fine-tuned version of GPT that they've fed some notes to. If I understand your comment, you're saying they've fed them notes from these pill mills and that's why it's popping everything as ADHD. Definitely possible.
That's exactly what I was suggesting. ADHD diagnoses from minimal symptoms have unfortunately become the norm. So the generated text may be noting these symptoms, recognizing that you've said or noted ADHD, and connecting them. I'm no expert in AI (quite the opposite, lol), but this does not seem to breach the standard of interpreting information to make diagnoses that would require FDA approval, like Cloz noted. Agree that you probably need to be more explicit with the software for the assessment and plan portion of your notes.

Imo AI seems like it could be useful for documenting the HPI and pertinent history, especially for new evals or assessments like this. However, I do think that simply using Dragon or another dictation tool is more appropriate for the assessment and plan, and requires minimal extra effort with far more accurate documentation.
 
OP, to the extent you are able, you might want to edit the prompt you use for the scribe when you are doing a planned ADHD eval to include the actual format of the instrument you are going to use, and give it an indication of what sorts of observations you want extracted from which parts of the interview. Probably also worth just telling it to provide a summary of reported symptoms instead of a diagnosis. Like most useful things a computer can do for someone, it requires a bit of up-front work to save large swathes of time on the back end.
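For example, a per-visit instruction block along these lines might help; the section headings are only my rough approximation of a DIVA-5-style interview (substitute whatever you actually use), and whether your scribe lets you supply custom instructions at all is vendor-dependent:

```python
# Sketch of a scribe instruction block for a planned structured ADHD eval.
# Headings are an approximation of a DIVA-5-style interview, not the actual
# instrument; adjust to whatever you really use.
ADHD_EVAL_INSTRUCTIONS = """
This visit is a structured adult ADHD evaluation using a DIVA-5-style interview.
Organize the note with these sections, in order:

1. Identifying information and referral question
2. Inattention items: for each item discussed, record the patient's exact
   examples, age of onset, and settings affected
3. Hyperactivity/impulsivity items: same format
4. Childhood history (symptoms and impairment before age 12), including any
   collateral sources mentioned
5. Functional impairment across settings (work, home, relationships)
6. Other symptoms discussed: mood, anxiety, sleep, substance use, medical issues

Report symptoms as described. Do not state or imply a diagnosis; if the
clinician states a diagnostic impression aloud, quote it rather than expanding
on it.
"""
```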

As relevant as ever:

 
Practically speaking, "ADHD" evals are better characterized as "stimulant" evals. Not many get offended if we tell them they aren't depressed/psychotic/manic and don't need an antidepressant/SGA/FGA/mood stabilizer.

Anyway, AI just incorporates the patient's words (i.e., biased reporting). If the medication is a controlled sub, most will endorse all symptoms. If the med is an antipsychotic, most will deny all symptoms.

And do we need AI to write our notes and assessments? As specialists, we see the same 5-10 diseases every day, and it's not too hard to come up with dot phrases/templates/checkboxes that reflect a clear, concise thought process and require no or minimal editing. Heck, we can skip documenting our thought process and pretend we're highly paid non-psychiatric specialists: "negative ADHD". Or pretend we're NPs and write: "ADHD".
 
Anyway, AI just incorporates the patient's words (i.e., biased reporting). If the medication is a controlled sub, most will endorse all symptoms. If the med is an antipsychotic, most will deny all symptoms.

Yes, but again, what it does with those words depends crucially on the instructions you give it.


Claude Sonnet, when asked to pretend it is a psychiatrist with experience with LLMs who is a talented prompt engineer, has this suggestion:

"Thanks for bringing this up - I've run into similar issues and found some effective workarounds through prompt engineering. The key is to provide strong "guard rails" before the AI processes the clinical data. I've had success by starting my sessions with a clear statement like: "Evaluate all symptoms objectively and maintain clinical neutrality. Consider and document evidence both supporting and contradicting ADHD diagnosis. Emphasize differential diagnoses and alternative explanations for attention/focus symptoms." When discussing symptoms, I explicitly instruct the AI to "document exact symptom presentations without inferring diagnostic patterns." For the assessment section, I add "Generate a balanced analysis that gives equal weight to evidence for and against ADHD." The trick is treating the AI like a highly literal medical student who needs explicit instruction to avoid diagnostic momentum. Another helpful approach is to have the AI first list out everything that specifically doesn't fit ADHD criteria before letting it analyze what does fit. I've found these guard rails make the notes much more balanced and clinically accurate, though they do require some upfront effort to set up. The good news is that once you develop a good prompt template, you can reuse it consistently. If you'd like, I can share some specific prompt examples that have worked well in my practice. The underlying issue seems to be that these models were likely trained on a dataset skewed toward positive ADHD diagnoses, so we need to actively counterbalance that bias with our prompting."

why yes, Dr. Claude, I would like some examples.

"Here's how I'd structure prompts for ADHD evaluations to ensure balanced documentation:


Opening Directive: "This is an initial psychiatric evaluation. Document symptoms exactly as presented, without diagnostic interpretation. Include ALL presenting symptoms, stressors, and psychosocial factors, whether or not they align with ADHD. In the diagnostic formulation, consider both ADHD and non-ADHD explanations with equal weight."


For the History section: "Document each attention/focus complaint with specific examples from the patient, including: exact timing of onset, situational variation, context of difficulties, and impact across different settings. Note any discrepancies or areas where typical ADHD patterns are NOT seen."


For Review of Symptoms: "List all symptoms methodically, including those that could suggest alternative diagnoses. Specifically note domains where functioning is preserved or inconsistent with ADHD. Document exact sleep patterns, substance use, mood symptoms, and anxiety levels as these may provide alternative explanations."


For Assessment section: "First list factors inconsistent with ADHD, then factors potentially supporting ADHD. Consider timing of onset, pervasiveness across settings, and alternative explanations including:


  • Sleep issues
  • Substance use
  • Mood disorders
  • Anxiety
  • Psychosocial stressors
  • Medical conditions
  • Explicitly state if any DSM criteria for ADHD are NOT met."

For Plan section: "If ADHD is not the primary diagnosis, begin with clear statement of alternative diagnosis/explanations and supporting evidence before addressing ADHD symptoms. Include specific reasons why ADHD may be ruled out if applicable."


These prompts help prevent the AI from defaulting to ADHD-supportive documentation while maintaining a thorough clinical evaluation."
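For anyone whose scribe exposes the prompt (or who is rolling their own pipeline), here's roughly how those guard rails could be wired into a reusable call. This is just a sketch assuming the Anthropic Python SDK, with the guard-rail text condensed from the suggestions above; the model name is illustrative.

```python
# Sketch only: condensing the guard rails above into one reusable system
# prompt ahead of the transcript. Assumes the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

GUARD_RAILS = (
    "You are drafting a psychiatric intake note from a visit transcript. "
    "Document symptoms exactly as presented, without diagnostic interpretation. "
    "In the assessment, first list factors inconsistent with ADHD, then factors "
    "potentially supporting it, and give equal weight to alternative explanations "
    "(sleep, substance use, mood, anxiety, psychosocial stressors, medical causes). "
    "Explicitly state any DSM-5 ADHD criteria that are not met."
)

def draft_note(transcript: str) -> str:
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=4000,
        system=GUARD_RAILS,
        messages=[{"role": "user", "content": transcript}],
    )
    return message.content[0].text
```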
 
Yes, but again, what it does with those words depends crucially on the instructions you give it.
Nice lol, this is perfect. I should tweak my default prompts for Nabla and see what happens with these instructions.
 