AI Tutorials: A Series

From black box to feedback loop

March 23, 2026. By Johnson Thomas, MD, FACE, FEAA.

Why Should You Care About AI? 

If you have ever tried to explain the hypothalamic-pituitary-adrenal axis to a patient in under 60 seconds, congratulations — you already understand the challenge of making complex systems accessible. That is precisely what this article series aims to do, except the complex system in question is artificial intelligence (AI). 

As endocrinologists, we are no strangers to feedback loops, pattern recognition, and making sense of noisy data. (If you can interpret a glucose tracing from a continuous glucose monitor [CGM] at 2 a.m., you can handle anything AI throws at you.) The truth is, AI is not some futuristic abstraction reserved for Silicon Valley engineers. It is already in your clinic — embedded in CGM algorithms, thyroid ultrasound software, and the predictive models your electronic health record quietly runs in the background. 

This series, for AACE Endocrinology AI, will take you from the fundamentals to the frontier. Consider this first article your orientation day. No prior computer science knowledge required. A sense of humor, however, is strongly recommended. 

What Is AI, Really? 

At its core, AI is simply software that can perform tasks that normally require human intelligence. That includes recognizing patterns, making predictions, understanding language, and learning from experience. 

The endocrinology analogy: Think of AI as a very enthusiastic endocrine fellow. It cannot actually think the way you do, but if you show it 50,000 thyroid ultrasound images labeled "benign" or "suspicious," it will eventually learn to tell the difference — sometimes with startling accuracy. It is pattern recognition on steroids (pun absolutely intended). 

The Three Flavors of AI 

  1. Narrow AI (what we have): AI designed to perform one task. Your spam filter? Narrow AI. The algorithm that adjusts your patient's insulin pump? Narrow AI. It is brilliant at its specific job but cannot do anything else. Ask your CGM algorithm to write a poem and it will stare at you blankly. Much like asking a thyroidologist to manage a complicated pregnancy and deliver a baby. 

  2. Artificial general intelligence (the dream): This would be a machine that can do anything a human can do intellectually — diagnose a pheochromocytoma, write a grant proposal, and make decent coffee. We are not there yet. Not even close. 

  3. Superintelligent AI (science fiction — for now): This is AI that surpasses human intelligence in every domain. This is what keeps Elon Musk up at night. For the rest of us, the insulin dosing algorithm is exciting enough. 

The Building Blocks: Key Concepts Explained 

Algorithms: The Recipe Behind the Magic 

An algorithm is simply a set of step-by-step instructions for solving a problem. You already use algorithms every day. When you evaluate a thyroid nodule, you follow a mental algorithm: check the size, assess the ultrasound features using TI-RADS, review the patient's risk factors, and decide whether to biopsy. 

AI algorithms work the same way, except they can process thousands of variables simultaneously and never get tired, hungry, or distracted by their pager going off during a procedure. The downside? They have no clinical intuition. They will never look at a patient and think, "Something just doesn't feel right." That remains your superpower. 
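To make the idea concrete, here is a nodule workup written as literal step-by-step instructions. This is a toy sketch in Python: the size cutoffs and triage rules are invented for illustration and are not actual biopsy criteria.

```python
# Toy example only: thresholds are invented, not clinical guidance.
def triage_nodule(size_cm: float, tirads: int, high_risk_history: bool) -> str:
    """A thyroid-nodule triage 'algorithm': explicit, fixed, step-by-step rules."""
    if tirads >= 5 and size_cm >= 1.0:
        return "biopsy"
    if tirads == 4 and (size_cm >= 1.5 or high_risk_history):
        return "biopsy"
    if size_cm >= 0.5:
        return "follow-up ultrasound"
    return "no follow-up"

print(triage_nodule(2.0, 5, False))  # -> biopsy
```

The point is not the medicine but the structure: every input follows the same fixed path, and nothing in the code learns or changes with experience.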

Machine Learning: Teaching Computers to Learn from Data 

Machine learning (ML) is the most important subfield of AI for clinical medicine. Here is the key insight: instead of programming a computer with explicit rules ("if TSH is above 10, flag as hypothyroid"), you feed it data and let it figure out the rules on its own. 

The endocrinology analogy: Remember your first year of fellowship? Your attending did not hand you a 500-page manual covering every possible clinical scenario. Instead, they showed you case after case. Over time, your brain started recognizing patterns — "this presentation looks like Graves' disease" or "this adrenal incidentaloma is probably nothing, but let's order a dexamethasone suppression test just in case." Machine learning works the same way. The more data (cases) you show it, the better it gets. 

There are three main types of machine learning, and each one has a handy analogy: 

  1. Supervised Learning: The Attending-Fellow Model

In supervised learning, the algorithm learns from labeled data — examples where the correct answer is already known. You feed it thousands of thyroid ultrasound images that have been labeled by expert radiologists as "benign" or "malignant," and the algorithm learns to classify new, unlabeled images. 

This is exactly how fellowship training works. Your attending shows you cases along with the correct diagnosis (the label), and eventually you learn the patterns. The attending is the "supervision" in supervised learning. 

Clinical example: An AI model is trained on thousands of retinal photographs, labeled by ophthalmologists, to detect diabetic retinopathy. The labels ("disease" or "no disease") are the supervision. 
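In code, the shift from rule-writing to supervised learning looks like this: instead of hardcoding a TSH cutoff, the program finds the cutoff that best separates expert-labeled examples. A minimal sketch in Python; all values are invented for illustration.

```python
# Toy supervised learning: instead of hardcoding "TSH > 10 means hypothyroid",
# learn the cutoff from labeled examples. All values below are invented.
def learn_threshold(values, labels):
    """Pick the cutoff that best separates label 0 from label 1 on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        acc = sum((v > t) == bool(y) for v, y in zip(values, labels)) / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

tsh   = [1.2, 2.0, 3.1, 4.5, 9.8, 12.0, 15.5, 22.0]  # mIU/L (invented)
label = [0,   0,   0,   0,   1,   1,    1,    1]      # 1 = flagged by the expert
cutoff = learn_threshold(tsh, label)
print(cutoff)  # -> 4.5 (values above this boundary get flagged)
```

With richer data the "model" becomes far more elaborate than a single threshold, but the recipe is the same: labeled examples in, decision rule out.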

  2. Unsupervised Learning: The Clustering Fellow

In unsupervised learning, the algorithm receives data without any labels and must find structure on its own. Nobody tells it what to look for — it just groups similar things together. 

The endocrinology analogy: Imagine you are handed 10,000 patient charts with labs, imaging, and clinical notes, but no diagnoses. You start noticing clusters: the patients in one group all have elevated calcium and low phosphorus (hello, hyperparathyroidism), while another group has high cortisol with low ACTH (Cushing's syndrome from an adrenal source). Nobody needed to tell you the diagnoses — you found the patterns yourself. That is unsupervised learning. 

Clinical example: Clustering algorithms that identify previously unrecognized subtypes of type 2 diabetes based on combinations of insulin resistance, beta-cell function, and autoimmune markers. 
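A bare-bones version of such clustering can be sketched with one-dimensional k-means. The calcium values below are invented; the point is that no labels are supplied, yet two groups emerge on their own.

```python
import random
import statistics

# Toy unsupervised learning: 1-D k-means on unlabeled calcium values (mg/dL, invented).
# No diagnoses are provided; the algorithm simply groups similar values together.
def kmeans_1d(values, k=2, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest current center
            clusters[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # move each center to the mean of its cluster (keep old center if empty)
        centers = [statistics.mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

calcium = [9.2, 9.4, 9.5, 9.6, 9.8, 11.2, 11.5, 11.8, 12.1]
print(kmeans_1d(calcium))  # two group centers emerge: roughly normal vs elevated
```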

  3. Reinforcement Learning: The Trial-and-Error Resident

Reinforcement learning is learning through trial and error, with rewards for good decisions and penalties for bad ones. The algorithm takes actions, observes the results, and adjusts its strategy. 

The endocrinology analogy: Think of how you learned to titrate insulin. You started a patient on a dose, checked their glucose, adjusted up or down, checked again, and adjusted again. Over time, you developed an intuitive sense of how to dose. Each good outcome (glucose in range) was a "reward," and each hypoglycemic episode was a "penalty." Reinforcement learning algorithms do exactly this — except they can run through millions of scenarios in the time it takes you to check one fasting glucose. 

Clinical example: AI systems that learn to optimize insulin dosing in closed-loop ("artificial pancreas") systems by continuously adjusting delivery based on real-time glucose feedback. 
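The trial-and-error loop can be sketched as a toy "bandit" problem: the agent repeatedly picks a dose adjustment, is rewarded when simulated glucose lands in range, and learns to prefer whichever action works best. The reward probabilities below are invented, and real closed-loop systems are far more sophisticated than this sketch.

```python
import random

# Toy reinforcement learning (epsilon-greedy bandit). The agent tries dose
# adjustments, receives a reward when simulated glucose lands in range, and
# gradually favors the action with the best track record. Numbers are invented.
random.seed(42)
actions = ["-2 units", "no change", "+2 units"]
p_in_range = [0.3, 0.5, 0.8]   # hidden "true" chance each action keeps glucose in range
value = [0.0] * 3              # the agent's estimated value of each action
count = [0] * 3

for _ in range(2000):
    if random.random() < 0.1:                        # explore 10% of the time
        a = random.randrange(3)
    else:                                            # otherwise exploit the best estimate
        a = max(range(3), key=lambda i: value[i])
    reward = 1.0 if random.random() < p_in_range[a] else 0.0
    count[a] += 1
    value[a] += (reward - value[a]) / count[a]       # incremental mean update

best = actions[max(range(3), key=lambda i: value[i])]
print(best)  # the agent converges toward the action with the highest reward rate
```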

Deep Learning and Neural Networks: The Brain Metaphor 

You have probably heard the term "deep learning" thrown around at conferences, usually by someone trying to sound impressive. Let us demystify it. 

Deep learning is a subset of machine learning that uses structures called neural networks — computational models loosely (very loosely) inspired by the human brain. A neural network consists of layers of interconnected "nodes" (think of them as artificial neurons) that process information. 

The endocrinology analogy: Think of a neural network like the endocrine system itself. The hypothalamus sends signals to the pituitary, which sends signals to the target gland, which produces hormones that feed back to the hypothalamus. Each layer processes and transforms the signal. In a neural network, data enters the first layer (the "hypothalamus"), gets transformed at each subsequent layer, and eventually produces an output (the "hormone response"). The "deep" in deep learning simply means there are many layers — like a cascade with multiple levels of regulation. 

Here is why this matters clinically: deep learning is what powers the most impressive AI achievements in medicine. The algorithms that read radiology images, analyze pathology slides, and interpret ECGs all use deep neural networks. They are particularly good at finding patterns in images and complex datasets — which is why medical imaging has been one of the first clinical domains to benefit. 
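A forward pass through a tiny network makes the cascade idea concrete: each layer takes the previous layer's output, transforms it, and passes it on. The weights below are arbitrary illustrative numbers, not a trained model.

```python
import math

# A minimal neural-network forward pass. Each layer computes a weighted sum
# per node, then squashes it through a tanh activation. Weights are arbitrary
# illustrative values, not a trained model.
def layer(inputs, weights, biases):
    """One layer: weighted sum of inputs per node, then a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.2]                                         # input features ("the stimulus")
h = layer(x, [[0.8, -0.3], [0.1, 0.9]], [0.0, 0.2])     # hidden layer (two nodes)
y = layer(h, [[1.5, -0.7]], [0.1])                      # output layer ("the response")
print(y)  # a single number between -1 and 1
```

Training consists of nudging those weights, millions of times, until the outputs match the labels; stack many such layers and the network becomes "deep."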

How AI Models Are "Trained": The Data Diet 

AI models do not emerge fully formed like Athena from the head of Zeus. They need to be trained, and training requires data — lots of data. 

The endocrinology analogy: Training an AI model is like running an endocrine stimulation test. You give the system a stimulus (data), observe the response (prediction), compare it to what should have happened (the correct label), and then adjust. Repeat this process thousands or millions of times, and the model gradually improves — much like adjusting levothyroxine until you finally nail that TSH target. 

The training process involves three critical datasets: 

Training set (~70% of data): This is your two years of fellowship — the hundreds of patients you see, the cases you present at conferences, the late-night consults that teach you what textbooks cannot. It is where real learning happens. 

Validation set (~15% of data): This is your case review with your program director. You are not being formally tested, but your attending is checking whether you are developing the right clinical instincts — and course-correcting before bad habits solidify. The model gets the same kind of mid-training feedback. 

Test set (~15% of data): This is the patient who walks into your clinic on your first day as an attending, no safety net, no one to run it by. You have never seen this exact case before, and your performance now reflects whether you truly learned the medicine or just memorized medical facts. This is the real thing. 
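The three-way split is trivial to express in code. A sketch on 100 stand-in patient records, shuffled first so no set is biased by ordering:

```python
import random

# The 70/15/15 split described above, sketched on a toy dataset of patient IDs.
random.seed(0)
patients = list(range(100))     # stand-ins for 100 patient records
random.shuffle(patients)        # shuffle so the split is not ordered by time or site

train = patients[:70]           # fellowship: where the learning happens
val   = patients[70:85]         # case review: tune the model, catch bad habits
test  = patients[85:]           # first day as attending: touched only once, at the end

print(len(train), len(val), len(test))  # -> 70 15 15
```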

A model that performs brilliantly on its training data but terribly on new data is said to be "overfitting." It has essentially memorized the answers instead of learning the underlying concepts. We have all met that fellow who can recite every UpToDate article but freezes in front of an actual patient. Same principle. 

Natural Language Processing and Large Language Models 

You have almost certainly interacted with a large language model (LLM) by now — if not professionally, then at least to settle an argument at dinner. ChatGPT, Claude, Gemini, and their cousins are all LLMs, and they represent a revolution in how computers understand and generate human language. 

Natural language processing (NLP) is the branch of AI concerned with enabling computers to understand, interpret, and generate human language. LLMs are the most powerful NLP tools we have today. They are trained on enormous amounts of text data and can generate remarkably coherent, contextually appropriate language. 

The endocrinology analogy: Imagine someone who has read every endocrinology textbook, every journal article, every clinical guideline, and every patient note ever written — and can recall any of it instantly. That is roughly what an LLM does, except it does not truly "understand" the content the way you do. It is extraordinarily good at predicting which word should come next in a sentence, which creates the illusion of understanding. It is the difference between a medical student who has memorized Harrison's and an experienced clinician who has internalized the concepts. 
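Next-word prediction can be sketched in a few lines with a bigram counter. Real LLMs are vastly more sophisticated, but the core objective (predict the most likely next token) is the same. The "corpus" here is made up:

```python
import collections

# Next-word prediction in miniature: a bigram model counts which word follows
# which in a tiny made-up corpus, then predicts the most frequent follower.
corpus = ("check tsh and free t4 . check tsh and tpo antibodies . "
          "check glucose and a1c .").split()

follows = collections.defaultdict(collections.Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def predict(word):
    """Return the most frequent word seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict("check"))  # -> tsh ("tsh" follows "check" most often here)
```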

LLMs are already being explored for clinical documentation, patient communication, literature synthesis, and clinical decision support. But they can also "hallucinate" — confidently generating information that is completely wrong. If you have ever had a medical student present a case with absolute conviction, only to find that none of the details were accurate, you understand the concept perfectly. 

The Elephant in the Room: Bias, Limitations, and What Can Go Wrong 

No article on AI fundamentals would be complete without a frank discussion of what can go wrong. 

Bias In, Bias Out 

AI models are only as good as the data they are trained on. If the training data reflects existing biases in health care — and it almost always does — the model will perpetuate and sometimes amplify those biases. 

Example: If an AI model for predicting diabetic complications is trained predominantly on data from one ethnic group, it may perform poorly for patients from other backgrounds. We already know that eGFR calculations have been revised to address racial bias in traditional formulas. AI models face the same challenges, often on a much larger and less transparent scale. 
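The mechanism is easy to demonstrate numerically. In this invented example, a cutoff is tuned entirely on Group A's marker distribution and then applied to Group B, whose healthy baseline runs higher; healthy Group B patients get flagged as diseased.

```python
import statistics

# Bias in, bias out, in miniature. A cutoff tuned only on Group A's marker
# distribution misfires on Group B, whose healthy baseline runs higher.
# All numbers are invented for illustration.
group_a_healthy = [4.0, 4.2, 4.5, 4.8, 5.0]
group_a_disease = [7.8, 8.0, 8.3, 8.6, 9.0]
cutoff = (statistics.mean(group_a_healthy) + statistics.mean(group_a_disease)) / 2

group_b_healthy = [6.0, 6.3, 6.5, 6.8, 7.1]   # higher baseline, still healthy
false_positives = sum(v > cutoff for v in group_b_healthy)
print(cutoff, false_positives)  # -> 6.42 3 (three healthy patients flagged)
```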

The Black Box Problem 

Many advanced AI models, particularly deep learning systems, are "black boxes": they can give you an answer, but they cannot explain their reasoning. That is problematic in medicine, where we need to understand why a diagnosis or recommendation was made. Whether medical AI must be explainable, and whether explanations actually improve clinical decisions, remains actively debated. 

The endocrinology analogy: Imagine a fellow who always gets the right diagnosis but, when asked to explain their reasoning, just shrugs and says, "I don't know, it just felt right." You would not accept that from a trainee, and we should not accept it from AI either. The field of "explainable AI" is actively working on this problem, though not everything can be explained yet. 

Data Privacy and Regulatory Concerns 

Training AI models on patient data raises significant privacy concerns. Models can sometimes inadvertently memorize specific patient information from their training data. Regulatory frameworks like HIPAA in the United States and GDPR in Europe are still catching up to the unique challenges AI poses. As clinicians, we must be vigilant advocates for our patients' data privacy, even when — especially when — the technology is exciting. 

AI in Endocrinology: It's Already Here 

If you think AI in endocrinology is a future concern, think again. Here are areas where AI is already making an impact: 

Diabetes management: Closed-loop insulin delivery systems (the "artificial pancreas") use real-time AI algorithms to adjust insulin dosing based on CGM data. These systems are arguably the most mature clinical application of AI in our specialty. 

Thyroid imaging: AI-powered ultrasound analysis tools can classify thyroid nodules and assign TI-RADS scores, reducing interobserver variability and improving triage efficiency. 

Diabetic retinopathy screening: FDA-cleared AI systems (like IDx-DR) can autonomously screen for diabetic retinopathy from retinal photographs, enabling screening in primary care settings without an ophthalmologist. 

Predictive analytics: AI models are being developed to predict which patients with prediabetes will progress to type 2 diabetes, which thyroid nodules harbor occult malignancy, and which patients with adrenal incidentalomas need further workup. 

Clinical documentation: AI-powered ambient listening tools generate clinical notes from patient encounters, reducing documentation burden. 

Quick Reference Glossary 

Algorithm: A set of step-by-step instructions for solving a problem. Your TI-RADS scoring system is an algorithm.

Artificial intelligence (AI): Software that performs tasks normally requiring human intelligence, such as pattern recognition, prediction, and language understanding.

Bias: Systematic errors in AI predictions, often reflecting inequities in the training data.

Black box: An AI model whose internal reasoning is opaque; it gives answers but cannot explain them.

Deep learning: A type of machine learning using multi-layered neural networks. Powers most medical imaging AI.

Hallucination: When an AI model generates confident but factually incorrect information. The LLM equivalent of a confabulating patient.

Large language model (LLM): AI trained on vast text data to understand and generate human language (e.g., ChatGPT, Claude).

Machine learning (ML): AI that learns patterns from data rather than following explicit rules. The workhorse of clinical AI.

Neural network: A computational model loosely inspired by the brain, with layers of nodes that process information.

Overfitting: When a model memorizes training data instead of learning general patterns; it performs well on old data and poorly on new data.

Reinforcement learning: Learning through trial and error, optimizing actions based on rewards and penalties.

Supervised learning: Learning from labeled data where correct answers are provided during training.

Training data: The dataset used to teach an AI model. Garbage in, garbage out.

Unsupervised learning: Learning from unlabeled data to discover hidden patterns and groupings.

Looking Ahead: What's Coming in This Series 

This article has laid the groundwork. You now understand what AI is, how machine learning works, and why it matters to your practice. But we are just getting started. 

In upcoming articles, we will dive into how AI models are evaluated (sensitivity, specificity, and the ROC curves you thought you left behind in biostatistics class), explore the specific AI tools entering endocrine practice, discuss the ethics of algorithmic medicine, and ultimately equip you to critically appraise AI studies the same way you appraise any clinical trial. 

The goal is not to turn you into a data scientist — you have enough on your plate between managing insulin pumps, decoding adrenal biochemistry, and convincing your patients that their thyroid is not the reason they are tired all the time. The goal is to make you a literate, critical, and confident consumer of AI in your clinical practice. Because the future of endocrinology is not AI replacing you. It is AI-literate endocrinologists providing better care than either could alone. 

About This Series: AI Tutorial Series is a feature of AACE Endocrine AI. This series is designed to guide practicing endocrinologists from foundational AI concepts through advanced applications in clinical endocrinology. Article 2 will cover evaluating AI model performance: the metrics every clinician should know. 

AACE Endocrine AI is published by Conexiant under a license arrangement with the American Association of Clinical Endocrinology (AACE®). The ideas and opinions expressed in AACE Endocrine AI do not necessarily reflect those of Conexiant or AACE. For more information, see Policies.
