1. Mechanism & Architecture
To master Generative AI, one must abandon the mental model of a "Database" or "Search Engine." An LLM (Large Language Model) is a massive statistical function that predicts the next token in a sequence.
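A minimal sketch of this next-token behavior, assuming the Hugging Face `transformers` and `torch` libraries and the small open GPT-2 model (chosen only because it is freely downloadable; any causal language model behaves the same way):

```python
# Minimal sketch: inspect the probability distribution over the NEXT token.
# GPT-2 is used only because it is small and openly available.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The patient was started on a beta"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

probs = logits[0, -1].softmax(dim=-1)        # distribution over the next token only
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(idx)!r:>12}  p={p:.3f}")
```

Everything the model "says" is built by repeating this single step: score every token in the vocabulary, pick one, append it, repeat.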
Tokenization & The "Cost" of Words
LLMs do not read words; they read Tokens (e.g., "ing", "the", "45"). On average, a token is roughly 0.75 of a word, so 1,000 tokens correspond to about 750 words.
Clinical Implication: Calculation Errors
Because "1024" might be split into tokens like `10` and `24`, LLMs struggle with arithmetic. They do not "calculate" a Creatinine Clearance; they predict which numbers usually follow the text "CrCl = ". This leads to confident but mathematically incorrect outputs.
The Latent Space
Imagine a vast map (in reality one with thousands of dimensions, not three) where every medical concept has a coordinate. "Myocardial Infarction" is close to "Troponin," but far from "Appendicitis."
When you query an LLM, it traverses this map. Hallucinations occur when the model takes a "shortcut" through this space—connecting two concepts that are statistically related in language (e.g., a specific drug and a specific side effect) but factually unrelated in reality.
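Closeness in this space is usually measured with cosine similarity. The sketch below uses hand-picked, hypothetical 3-dimensional vectors purely for illustration; real embeddings are produced by the model itself and have hundreds or thousands of dimensions.

```python
# Illustrative sketch only: toy 3-D vectors standing in for real embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical coordinates, chosen by hand for illustration.
mi       = np.array([0.90, 0.80, 0.10])  # "Myocardial Infarction"
troponin = np.array([0.85, 0.75, 0.20])  # "Troponin"
appendix = np.array([0.10, 0.20, 0.90])  # "Appendicitis"

print("MI vs Troponin:    ", round(cosine(mi, troponin), 3))  # high -> "close" concepts
print("MI vs Appendicitis:", round(cosine(mi, appendix), 3))  # low  -> "far" concepts
```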
2. RLHF & Model Behavior
Reinforcement Learning from Human Feedback (RLHF)
Raw, pre-trained LLMs are chaotic. To make them usable, companies use RLHF: human reviewers rank candidate AI outputs ("Answer A is better than Answer B"), and the model is then tuned to optimize for human preference.
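Under the hood, those human rankings typically train a separate reward model with a pairwise preference loss. The sketch below shows that loss in isolation (numpy only, made-up scores); it is not a full RLHF pipeline.

```python
# Minimal, illustrative sketch of the pairwise preference loss used to train a
# reward model: the preferred answer should score higher than the rejected one.
import numpy as np

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    # Bradley-Terry style loss: -log sigmoid(r_preferred - r_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-(score_preferred - score_rejected))))

print(preference_loss(2.0, -1.0))  # small loss: the reward model already agrees with the human ranking
print(preference_loss(-1.0, 2.0))  # large loss: the reward model disagrees and will be pushed to correct
```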
The Sycophancy Trap
Because the model is optimized to "please" the user, it suffers from Sycophancy. If a senior clinician asks, "This looks like Wegener's, right?", the model is statistically biased to agree with the user's premise to maximize the "helpfulness" score, even if the clinical evidence is weak.
Temperature & Determinism
Temperature (commonly 0.0 - 1.0, though some APIs allow higher values) controls output randomness; a short sketch of the mechanism follows below.
- High Temp (0.8+): Creative, diverse, high hallucination risk.
- Low Temp (0.0 - 0.2): Deterministic, repetitive, safer for extraction tasks.
Pro Tip: You often cannot control temperature in web interfaces (ChatGPT/Gemini), so the output is non-deterministic: the same prompt will not reliably produce the same answer twice.
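Mechanically, temperature divides the model's raw scores (logits) before they are converted into probabilities. The logits in this sketch are made up for illustration.

```python
# Minimal sketch (numpy) of how temperature reshapes the next-token distribution.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / max(temperature, 1e-6)  # guard against divide-by-zero at T = 0
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens

print(softmax_with_temperature(logits, 1.0))  # softer: every candidate keeps meaningful probability
print(softmax_with_temperature(logits, 0.1))  # sharp: the top token dominates (near-deterministic)
```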
3. Privacy & Enterprise Security
The "Training Data" Economy
If the product is free, you are the training data. Standard terms of service usually allow the vendor to use your inputs to retrain future models.
The BAA Requirement
In the US, entering PHI into a non-enterprise AI tool is a HIPAA violation unless a Business Associate Agreement (BAA) is in place. Enterprise environments (e.g., Azure OpenAI, Gemini Enterprise) typically offer "Zero Data Retention," meaning they process the data and immediately discard it rather than storing it or training on it.
Mosaic Re-identification
Stripping the "18 Identifiers" is no longer sufficient. AI is a pattern-matching engine.
Example: A prompt containing "Welder" + "Zip Code 10001" + "Rare Sarcoma" can be cross-referenced with public news or case reports to identify a patient, even without a name.
Local LLMs (The "Air Gap")
For maximum security, hospitals are deploying open-weights models (like Llama 3 or Mistral) on internal servers. These run entirely offline. No data leaves the hospital firewall.
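As one concrete (assumed) setup: if the Ollama runtime is installed on an internal machine and a Llama 3 model has already been pulled (`ollama pull llama3`), a local query is a single HTTP request that never leaves localhost.

```python
# Minimal sketch, assuming an Ollama server is running locally on its default
# port (11434) with a Llama 3 model already pulled. Nothing crosses the firewall.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "List the major contraindications of metformin.",
        "stream": False,  # return one complete JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```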
4. Prompting I: Few-Shot Learning
Zero-Shot Prompting (asking for a result with no examples) is a leading cause of poor formatting and vague answers.
Few-Shot Prompting involves providing 1-3 examples of the input and the desired output format. This "grounds" the model.
By seeing the examples, the model understands the "voice," the level of abbreviation expansion, and the exact structure required.
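A minimal sketch of a few-shot prompt for abbreviation expansion; the worked examples and the input note are hypothetical and exist only to show the structure.

```python
# Minimal sketch: two worked examples precede the real input so the model
# copies the structure, tone, and level of abbreviation expansion.
FEW_SHOT_PROMPT = """Expand the abbreviations and restate as one plain-language sentence.

Input: Pt c/o SOB x3d, no CP.
Output: The patient has complained of shortness of breath for three days, with no chest pain.

Input: Hx of HTN, T2DM; meds reconciled.
Output: The patient has a history of hypertension and type 2 diabetes; medications were reconciled.

Input: {note}
Output:"""

note = "Pt w/ LBP x2w, denies trauma."
prompt = FEW_SHOT_PROMPT.format(note=note)
print(prompt)  # this assembled string is what would be sent to the model
```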
5. Prompting II: Advanced Reasoning
Chain of Thought (CoT)
Standard prompts ask for the answer immediately. CoT forces the model to "show its work," which reportedly reduces hallucination rates by 40-50% in reasoning tasks.
The "Let's think step by step" Magic
Simply appending the phrase "Let's think step by step" to a prompt is often enough to elicit this step-by-step reasoning, even without worked examples.
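A minimal sketch using the OpenAI Python SDK; the model name "gpt-4o-mini" is an assumption, so substitute whatever your institution has approved. The only difference from a standard prompt is the appended trigger phrase.

```python
# Minimal sketch of a zero-shot CoT call. The client reads OPENAI_API_KEY from
# the environment; the model name below is an assumption.
from openai import OpenAI

client = OpenAI()

question = (
    "A 68 kg woman, age 74, serum creatinine 1.4 mg/dL. "
    "Estimate creatinine clearance with Cockcroft-Gault."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[{"role": "user", "content": question + "\n\nLet's think step by step."}],
)
print(response.choices[0].message.content)
```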
Chain of Verification (CoVe)
This is a self-correction loop: you instruct the model to generate an answer and then to critique and revise its own output before giving the final response.
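A minimal sketch of such a loop, reusing the same (assumed) SDK and model name as above; the `ask()` helper is hypothetical.

```python
# Minimal sketch of a verification loop: the model drafts an answer, then a
# second call critiques and corrects that draft.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute your approved model
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "What are the first-line treatments for community-acquired pneumonia in adults?"

draft = ask(question)
critique_prompt = (
    f"Question: {question}\n\nDraft answer: {draft}\n\n"
    "List any claims in the draft that may be wrong or unsupported, "
    "then produce a corrected final answer."
)
print(ask(critique_prompt))
```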
Adversarial Prompting
To combat Sycophancy, assign the AI a "Critic" persona:
"I believe this is [Diagnosis X]. Act as a skeptical Attending Physician. Review my hypothesis and tell me why I might be WRONG. What 'Do Not Miss' diagnoses am I ignoring?"
6. RAG & Knowledge Retrieval
An LLM's training data is frozen in time (e.g., "Knowledge cutoff: Jan 2024"). It does not know about a drug recalled yesterday.
Retrieval Augmented Generation (RAG)
RAG is the architecture used by most modern clinical AI tools. It works in two steps (a toy sketch follows the list):
- Retrieve: The system searches a trusted database (e.g., UpToDate, Hospital Guidelines, PubMed) for the passages most relevant to the question.
- Generate: It pastes those passages into the LLM's prompt and says: "Using ONLY the text below, answer the doctor's question."
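A toy sketch of the pattern: a keyword lookup over two hypothetical guideline snippets stands in for the retrieval step (production systems use vector search over embeddings), and the retrieved text is pasted into the prompt with an explicit grounding instruction.

```python
# Toy sketch of the RAG pattern. The guideline snippets and policy numbers are
# invented for illustration; real systems retrieve from an indexed document store.
GUIDELINES = {
    "dvt prophylaxis": "Per hospital policy 4.2, enoxaparin 40 mg SC daily unless CrCl < 30.",
    "hyperkalemia":    "Per hospital policy 7.1, give calcium gluconate first if ECG changes are present.",
}

def retrieve(query: str) -> list[str]:
    q = query.lower()
    return [text for topic, text in GUIDELINES.items() if any(word in q for word in topic.split())]

def build_rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query)) or "No relevant documents found."
    return (
        "Using ONLY the text below, answer the doctor's question. "
        "If the answer is not in the text, say 'I don't know.'\n\n"
        f"--- Retrieved documents ---\n{context}\n\n"
        f"Question: {query}"
    )

print(build_rag_prompt("What is our DVT prophylaxis protocol?"))
```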
Why RAG is Safer
It grounds the AI. If the answer isn't in the retrieved documents, a well-built RAG system is instructed to say "I don't know," whereas a standard LLM is far more likely to hallucinate an answer.
The Context Window Limit: Even with RAG, LLMs have a limit on how much text they can read. They suffer from "Lost in the Middle" syndrome—where they successfully retrieve info from the start and end of a long document, but miss details buried in the middle.
7. Failure Modes & Bias
Automation Bias
The human tendency to over-trust automated decision support systems. When an AI provides a differential diagnosis, clinicians stop brainstorming their own ideas. This narrows the cognitive field.
Rule: AI should be the second opinion, never the first.
Demographic Bias
LLMs are trained on the internet and historical medical data. Historical data contains biases (e.g., dermatology images primarily featuring lighter skin tones; pain management data showing racial disparities).
The "Default Male" Problem
Unless specified, LLMs often assume the patient is male. In cardiac presentations, this can lead to the model missing "atypical" (female) presentations of MI.
8. Final Certification Exam
The following exam consists of 10 randomized questions drawn from a master bank. You need 80% to pass.