
Video AI scribes may reduce medication omissions

March 23, 2026 By Julie Greenbaum

A vision-enabled artificial intelligence (AI) scribe achieved 98% overall accuracy in documenting medication histories and reduced omission errors compared with audio-only input in simulated clinical encounters, according to a study in npj Digital Medicine.

Researchers evaluated a multimodal AI scribe that processes both video and audio input during pharmacist–patient conversations. The system was built on a multimodal generative model and paired with wearable smart glasses to capture visual context during medication history taking.

“Medication history conversations often involve patients showing their medicines or labels rather than describing them verbally,” said lead researcher Bradley D. Menz, PhD, of the College of Medicine and Public Health, Flinders Health and Medical Research Institute at Flinders University in Adelaide, South Australia. “Our findings suggest that enabling AI systems to interpret visual information alongside speech can substantially improve the completeness of clinical documentation.”

Study Design and Methods

For the study, 10 clinical pharmacists recorded 110 simulated medication history interviews using Ray-Ban Meta AI Wayfarer glasses. The recordings were then passed to the AI scribe, built on Google’s Gemini-Pro-2.5 model, which documented structured medication histories including patient details and medication information. The scribe was evaluated on 100 test recordings across patient-detail and medication-specific fields.

Each encounter included approximately three to five medications and incorporated both spoken information and visible medication packaging.

Across the test set, the AI scribe was evaluated on 2,160 data points, including patient details (name, date of birth, medication allergies) and medication-level details (medication name, strength and form, dosing directions, indication, and clinical notes). Human pharmacist-generated records served as the reference standard and were independently reviewed and adjudicated. 

(Left to right) Senior author Ashley M. Hopkins and lead author Bradley D. Menz.

The AI system used structured prompts and a secondary verification step that reprocessed the original video alongside draft outputs to identify and correct errors.
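The draft-then-verify flow described above can be sketched as follows. The function names, prompts, and record shape here are illustrative assumptions, not the study's actual implementation:

```python
# Illustrative sketch (not the authors' code) of a two-pass AI scribe:
# pass 1 drafts a structured record from the video; pass 2 reprocesses
# the same video alongside the draft to identify and correct errors.

DRAFT_PROMPT = "Extract a structured medication history from this encounter."
VERIFY_PROMPT = "Re-watch the encounter and correct any errors in this draft."

def run_scribe(video, call_model):
    """call_model(prompt, video, draft=None) -> dict; assumed interface
    wrapping whatever multimodal model backs the scribe."""
    draft = call_model(DRAFT_PROMPT, video)                # pass 1: draft
    return call_model(VERIFY_PROMPT, video, draft=draft)   # pass 2: verify
```

Because the verification pass reprocesses the original video rather than only the draft text, errors introduced in the first pass can in principle still be caught against the source footage.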

Accuracy and Error Profile

The vision-enabled AI scribe correctly documented 2,114 of 2,160 data points, corresponding to 98% overall accuracy. Field-level accuracy ranged from 96% for patient details to 99% for dosing directions and indication.

Among 46 total errors, 36 were commission errors and 10 were omission errors. Commission errors included incorrect patient identifiers, misidentified medications, and inaccuracies in strength, formulation, or dosing instructions. Omission errors represented data marked as missing despite being present in the encounter.
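The reported totals are internally consistent, as a quick check using only the figures stated above shows:

```python
# Sanity check of the reported accuracy and error breakdown.
total_points = 2160
correct = 2114
commission, omission = 36, 10

errors = total_points - correct
assert errors == commission + omission == 46

accuracy = correct / total_points
print(f"{accuracy:.1%}")  # 97.9%, reported as 98%
```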

Accuracy varied by data type and pharmacist for some fields, particularly patient details, medication name, and strength and form; dosing directions and indication showed consistent performance across participants.

Video vs Audio Input

The researchers concluded that integrating visual input provided “substantial gains over audio-only approaches,” with the system showing “significantly improved accuracy over audio-only evaluations.” When the same encounters were processed using audio-only input, overall accuracy declined to 81%, compared with 98% using video input.

The difference was primarily driven by omission errors, which increased from 10 with video input to 358 with audio-only processing.
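Assuming the same 2,160 data points were scored for both modalities (a plausible reading, though not stated explicitly), the arithmetic bears out that omissions account for nearly all of the audio-only gap:

```python
# Back-of-envelope check; assumes both modalities were scored
# on the same 2,160 data points.
total_points = 2160
video_errors = 46          # 36 commission + 10 omission (reported)
audio_accuracy = 0.81      # reported audio-only accuracy
audio_omissions = 358      # reported

audio_errors = round(total_points * (1 - audio_accuracy))  # ~410
extra_errors = audio_errors - video_errors                 # ~364
extra_omissions = audio_omissions - 10                     # 348
print(extra_omissions / extra_errors)  # ~0.96: mostly omissions
```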

Field-level comparisons showed higher accuracy with video input for:

  • patient details (96% vs 91%)

  • medication name (98% vs 80%)

  • strength and form (97% vs 28%)

  • dosing directions (99% vs 91%)

Accuracy for indication (99% vs 99%) and clinical notes (98% vs 99%) did not differ meaningfully between modalities.

Performance gains with video input were consistent across pharmacists, particularly for medication name, strength and form, and dosing directions, with improvements of up to 85% for some fields.

Limitations and Future Applications

Despite the potential benefits of AI scribes, researchers cautioned against overreliance on the technology.

“Although the results are promising, the technology is not error-free. AI-generated documentation can still miss information or introduce inaccuracies, meaning clinicians must remain actively involved in reviewing and verifying AI-produced notes,” noted Dr. Menz.

The study also used simulated encounters with pharmacists acting as patients, which may not reflect real-world patient behavior or clinical variability. Recordings were conducted under controlled lighting and noise conditions with clearly labeled medication packaging.

In addition, the dataset emphasized commonly prescribed medications and did not extensively evaluate uncommon or complex formulations. Generalizability may also be limited by the use of Australian settings, English language interactions, and specific recording hardware.

Although two independent evaluators assessed AI outputs, with discrepancies resolved by a third, some clinical subjectivity may remain in the findings.

Despite limitations and the need for further research, the findings offer clinicians a glimpse into the practical, real-world potential of AI tools.

“AI scribes are increasingly being explored as a way to reduce administrative burden in health care,” noted the study's senior author Ashley M. Hopkins, PhD, Associate Professor, College of Medicine and Public Health, Flinders Health and Medical Research Institute, at Flinders University. “This study demonstrates that incorporating visual context may be an important step toward making these systems more accurate and clinically useful.”

Investigator disclosures can be found online in the published article.


AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE. For more information, see Policies.
