Abstract
Introduction: Generating clinical narrative summaries is a critical use case for the Military Health System. This study evaluates the performance of detailed versus vague prompts in generating summaries that yield accurate, bias-free, and meaningful outputs.

Materials and Methods: Eight anonymized large language models (LLMs) were evaluated by volunteer clinicians for clinical narrative summary generation from October through November of 2024. Eight clinical scenarios were presented to each model with structured and vague prompt variations, and the responses were evaluated for conformance, task adequacy, hallucinations, and bias.

Results: Both approaches were comparable in conforming to instructions (64%). The structured approach was more likely to produce a clinically adequate summary (39% compared to 9%) but also more likely to introduce hallucinations and bias.

Discussion: The structured prompt, following best practices for prompt engineering, produced a superior response but was prone to hallucinations. This could be mitigated with additional tuning. None of the models tested reliably produced clinically usable summaries.

Conclusion: Efficient generation of clinical summaries is critical for the Military Health System. Using a structured prompt that employs role, task, tone, and format increases the output quality. Institutions seeking to use LLMs to summarize clinical notes may have more success with a structured approach but need to be cautious in ensuring the summary itself is true to its source documents.
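As an illustration only, the contrast between a structured prompt (role, task, tone, and format) and a vague one might be sketched as below. This is not the prompt text used in the study: the class name `StructuredPrompt`, the function `vague_prompt`, the example wording, and the added grounding instruction ("Use only information stated in the source note") are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredPrompt:
    """Hypothetical container for the four elements named in the abstract."""
    role: str
    task: str
    tone: str
    output_format: str

    def render(self, source_note: str) -> str:
        # Each element gets its own labeled line so none is left implicit.
        return "\n".join([
            f"Role: {self.role}",
            f"Task: {self.task}",
            f"Tone: {self.tone}",
            f"Format: {self.output_format}",
            # Assumption: a grounding instruction, added here because the
            # abstract warns that summaries must stay true to their sources.
            "Use only information stated in the source note below.",
            "",
            "Source note:",
            source_note.strip(),
        ])


def vague_prompt(source_note: str) -> str:
    # Deliberately underspecified prompt, for comparison.
    return f"Summarize this:\n{source_note.strip()}"


if __name__ == "__main__":
    # Placeholder text only; not real patient data and not from the study.
    note = "Placeholder clinical note text."
    structured = StructuredPrompt(
        role="You are a clinician writing for another clinician.",
        task="Summarize the clinical note into a narrative summary.",
        tone="Neutral and factual.",
        output_format="One short paragraph.",
    )
    print(structured.render(note))
    print("---")
    print(vague_prompt(note))
```

Keeping the four elements as separate fields makes it easy to vary one at a time when comparing outputs, but any such prompt would still need clinician review of the generated summary against its source documents.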
| Original language | English |
|---|---|
| Pages (from-to) | e445-e448 |
| Journal | Military Medicine |
| Volume | 191 |
| Issue number | 1-2 |
| DOIs | |
| State | Published - 1 Jan 2026 |
| Title | Prompting Pro Tips! Best Practices for Generating Clinical Narrative Summaries |