AI in thyroid cancer care: Progress and gaps

Clinicians should interpret a “low-risk” AI result “with caution.”

March 20, 2026 By Julia Cipriano, MS, CMPP

Artificial intelligence (AI) shows potential to improve thyroid cancer diagnosis, reduce health care costs, and enable personalized management, but lacks high-quality prospective validation and may underperform in less common subtypes, according to a review published in the Journal of Clinical Endocrinology & Metabolism.

Using a PubMed search of studies published through May 31, 2025, researchers reviewed applications of AI in thyroid cancer diagnosis and management, including ultrasound-based nodule evaluation, lymph node metastasis detection, cytopathology and histopathology, and large language model (LLM)–based analysis.

AI in Ultrasound Analysis

Computer vision applications in thyroidology have evolved from early artificial neural networks trained on human-curated data to convolutional neural networks and transformer network architecture–based models that directly analyze imaging data.

In diagnostic performance, high-volume radiologists using the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) have been found to achieve area under the curve (AUC) values of 0.78 to 0.86, whereas newer AI models trained on large data sets appeared to consistently exceed 0.9, noted Nikita Pozdeyev, MD, of the Department of Biomedical Informatics at the University of Colorado Anschutz, Aurora, and colleagues.
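For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen malignant nodule receives a higher risk score than a randomly chosen benign one. The sketch below illustrates that definition only; the scores are hypothetical and the function name `auc` is an assumption for illustration, not code from the review.

```python
from itertools import product


def auc(malignant_scores, benign_scores):
    """Probability that a random malignant case outranks a random benign one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pairs = list(product(malignant_scores, benign_scores))
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0 for m, b in pairs)
    return wins / len(pairs)


# Hypothetical risk scores (0 to 1), for illustration only.
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2, 0.1]

print(auc(malignant, benign))  # 0.9375
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the gap between 0.78 to 0.86 and values above 0.9 is notable.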

Several studies demonstrated that AI classifiers surpassed both senior and junior radiologists in sensitivity, specificity, and accuracy and improved physician performance when used as a decision support tool, according to the researchers.

In one study, AI assistance increased sensitivity from 84% to 93% and specificity from 73% to 87% among junior radiologists. Improved specificity suggests that AI may reduce fine-needle aspiration (FNA) of benign thyroid nodules by 48% to 60% while maintaining sensitivity comparable to that of ACR TI-RADS.
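The arithmetic behind these figures can be sketched as follows. The counts are hypothetical, chosen only to reproduce the AI-assisted percentages quoted above, and the helper names are assumptions for illustration. The example also assumes that every benign nodule flagged as suspicious would proceed to FNA, which is a simplification; the 48% to 60% range reported in the review comes from comparisons with ACR TI-RADS and is not derived here.

```python
def sensitivity(true_pos, false_neg):
    """Share of malignant nodules correctly flagged as suspicious."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of benign nodules correctly cleared."""
    return true_neg / (true_neg + false_pos)


def relative_drop_in_false_positives(spec_before, spec_after):
    """Fractional reduction in benign nodules flagged as suspicious."""
    return (spec_after - spec_before) / (1 - spec_before)


# Hypothetical counts per 100 malignant and 100 benign nodules.
print(sensitivity(true_pos=93, false_neg=7))    # 0.93
print(specificity(true_neg=87, false_pos=13))   # 0.87

# Specificity rising from 73% to 87% means false positives fall from
# 27 to 13 per 100 benign nodules, roughly a 52% relative reduction.
print(round(relative_drop_in_false_positives(0.73, 0.87), 2))  # 0.52
```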

Generalizability data from a multicenter study including over 10,000 patients across more than 200 hospitals showed an AUC of 0.9, with improved radiologist performance when assisted by AI.

Subtype Limitations

Despite these gains, researchers noted that AI systems are typically trained on binary classification tasks and are not cancer subtype–aware. Models thus learn to recognize papillary thyroid cancer but have been found to underperform in detecting less common subtypes (follicular, oncocytic, and some follicular variants of papillary thyroid cancer), which do not manifest classic suspicious ultrasound features.

Given this limitation, clinicians should interpret a “low-risk” AI result “with caution,” said Dr. Pozdeyev in an interview with AACE Endocrine AI. He continued, “Large, solid, isoechoic thyroid nodules warrant biopsy, even if labeled as low risk by the AI. However, AI can reinforce the decision not to biopsy spongiform nodules.”

Beyond Ultrasound: Pathology and Lymph Nodes

AI applications extend to cytopathology and histopathology, where a deep learning model trained on over 11,000 whole-slide images from FNAs of more than 4,000 patients achieved an AUC of 0.98 for distinguishing benign from other cytologic categories and appeared to improve cytopathologist performance; however, limitations characteristic of the field remain, according to the researchers.

For lymph node metastasis detection, the researchers wrote that AI models have demonstrated favorable sensitivity and specificity, with AUC comparable to radiologist performance. They added, however, that the clinical value of these systems is not well defined and is region-specific: high sensitivity may lead to overtreatment in settings without routine prophylactic central neck dissection, whereas in other settings AI may help reduce unnecessary surgeries.

Large Language Models in Clinical Tasks

Large language models (LLMs) have shown the ability to extract data from medical records (with one network achieving 90% accuracy in staging and recurrence prediction from histopathologic reports), generate personalized management recommendations aligned with clinical guidelines, and aid in patient education.

However, researchers cited concerns regarding the use of general-purpose LLMs for clinical tasks, including real-world performance, lack of consistency in generated responses, and hallucinations. “Although ongoing research shows the potential of LLMs in supporting clinical decision-making for thyroid cancer, extensive validation through prospective studies and clinical trials is required,” they wrote.
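One simple way to quantify the consistency concern is to pose the same question repeatedly and measure how often the answers agree. The sketch below is an assumption for illustration, not a method described in the review: the responses are hypothetical strings, no real LLM is called, and `agreement_rate` is a made-up helper name.

```python
from collections import Counter


def agreement_rate(responses):
    """Fraction of responses that match the most common answer.

    Answers are compared after trimming whitespace and lowercasing.
    """
    normalized = [r.strip().lower() for r in responses]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Hypothetical outputs from five runs of the same staging question.
runs = ["Stage I", "stage I", "Stage II", "Stage I ", "Stage I"]

print(agreement_rate(runs))  # 0.8
```

A rate of 1.0 would mean every run produced the same answer. High agreement alone does not show the answer is correct, so a check like this would complement, not replace, accuracy testing against reference staging.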

The Path to Adoption

Despite reported improvements in thyroid cancer care with AI applications, adoption of these tools in real-world practice remains slow, according to the researchers. Contributing factors include the complexity and limited availability of AI systems outside research settings, the lack of high-quality independent prospective clinical validation, limited reimbursement by payers, and uncertainty regarding their added value.

The researchers drew a parallel to the widespread adoption of molecular classifier testing for managing thyroid nodules with indeterminate cytology following high-quality prospective multicenter trials. Asked what a comparable trial for AI would need to show to drive similar clinical acceptance, Dr. Pozdeyev said that demonstrating superiority over existing ultrasound-based risk stratification schemas would be key. He added, “Such a system will reduce unnecessary biopsies and diagnostic surgeries, thereby benefiting patients, providers, and payers. I am not aware of any such clinical trial currently underway.”

Given the current pace of validation and implementation efforts, “I doubt that AI will be used for most thyroid ultrasound evaluations in the next year or two,” he said in an interview. The researchers nevertheless wrote that “AI has created tremendous opportunities to improve thyroid cancer care.”

For full disclosures of the researchers, visit academic.oup.com.

AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE.
