Automated Classification of Pneumonia in Medical Radiography
Model by: Siri Suwannatee | BDATA 497: Computer Vision Techniques
Model Description
This model is a chest X-ray (CXR) image classifier that distinguishes between three classes: Normal, Bacterial Pneumonia, and Viral Pneumonia. It was developed as an AI-powered screening tool to prioritize high-risk cases for specialist review, helping reduce the time-to-decision in clinical workflows.
The model is intended to act as a triage assistant: it flags high-risk (pneumonia) cases for comprehensive expert review while routing low-risk (normal) cases to standard review queues. It is not intended for standalone clinical diagnosis.
Training approach: Fine-tuned from YOLOv26n (Ultralytics classification head) on a balanced, 3-class chest X-ray dataset.
Intended use cases:
- Hospital or clinic screening pipelines to prioritize radiologist workload
- Academic/research exploration of CNN-based CXR classification
- Educational demonstrations of automated medical image triage
Training Data
Dataset Source
Chest X-Ray Images (Pneumonia), published on Mendeley Data:
Kermany, D., Zhang, K., & Goldbaum, M. (2018). Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2
Number of Images and Classes
The original dataset contains 5,856 chest X-ray images in two classes: Normal and Pneumonia.
Annotation Process (Value Added)
This project used Annotation Option B: Pre-annotated single source with modifications. The original binary Pneumonia label was refined into two sub-categories, Bacterial and Viral, using publisher-provided file metadata. Image filenames in the source dataset encode the pneumonia type (e.g., person112_bacteria_539.jpeg, person1613_virus_2799.jpeg), which allowed programmatic re-labeling without manual annotation.
This refinement adds clinically meaningful granularity: bacterial and viral pneumonias have different treatment pathways, so distinguishing them is a valuable model capability.
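The filename-based re-labeling described above can be sketched in a few lines. This is a minimal sketch: the keyword-matching rule follows the filename examples quoted from the source dataset, while the directory-scanning helper and its layout are assumptions.

```python
from pathlib import Path

def label_from_filename(name: str) -> str:
    """Map a source-dataset filename to one of the three classes.

    Pneumonia filenames encode the subtype, e.g.
    'person112_bacteria_539.jpeg' or 'person1613_virus_2799.jpeg';
    filenames without either keyword are treated as Normal.
    """
    lowered = name.lower()
    if "bacteria" in lowered:
        return "Bacterial Pneumonia"
    if "virus" in lowered:
        return "Viral Pneumonia"
    return "Normal"

def relabel_directory(src_dir: str) -> dict:
    """Build a filename -> class map for every JPEG under src_dir
    (the directory layout here is an assumption)."""
    return {p.name: label_from_filename(p.name)
            for p in Path(src_dir).rglob("*.jpeg")}
```

Because the rule only inspects filenames, any source image whose name mis-encodes its subtype is silently mislabeled, which is exactly the metadata-annotation risk noted in the limitations below.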
Class Distribution
| Class | Count | % of Total |
|---|---|---|
| Normal | 1,583 | 27.0% |
| Bacterial Pneumonia | 2,780 | 47.5% |
| Viral Pneumonia | 1,493 | 25.5% |
| Total | 5,856 | 100% |
The dataset is imbalanced: Bacterial Pneumonia is overrepresented. To address this, the training experiments were run on both the imbalanced dataset and a balanced dataset (downsampled to 1,493 images per class, 4,479 images in total).
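The downsampling step can be sketched as follows, assuming per-class lists of file paths (the function name and placeholder filenames are illustrative, not from the project code):

```python
import random

def downsample(class_files: dict, seed: int = 42) -> dict:
    """Downsample every class to the size of the smallest one.

    class_files maps class name -> list of image paths; here the
    smallest class (Viral, 1,493 images) sets the target size.
    """
    rng = random.Random(seed)
    target = min(len(files) for files in class_files.values())
    return {cls: rng.sample(files, target)
            for cls, files in class_files.items()}

balanced = downsample({
    "bacterial": [f"b_{i}.jpeg" for i in range(2780)],
    "normal":    [f"n_{i}.jpeg" for i in range(1583)],
    "viral":     [f"v_{i}.jpeg" for i in range(1493)],
})
# Each class now holds 1,493 images, 4,479 in total.
```

Fixing the random seed keeps the discarded images reproducible across the imbalanced-vs-balanced comparison runs.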
Train / Validation / Test Split
| Split | Ratio |
|---|---|
| Train | 70% |
| Validation | 20% |
| Test | 10% |
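A minimal per-class 70/20/10 split under these ratios might look like the following sketch (standard library only; filenames are illustrative). Note this is an image-level split, so images from one patient can still land in different splits:

```python
import random

def split_files(files, seed=0):
    """Shuffle one class's files and split them 70/20/10
    into train/validation/test."""
    rng = random.Random(seed)
    files = sorted(files)
    rng.shuffle(files)
    n = len(files)
    n_train = int(n * 0.70)
    n_val = int(n * 0.20)
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }

# One balanced class of 1,493 images -> 1045 / 298 / 150.
splits = split_files([f"img_{i}.jpeg" for i in range(1493)])
```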
Data Augmentation
No additional augmentation beyond resizing was applied. Images were resized to 640×640 pixels as required by the YOLO architecture.
Known Biases and Limitations in Training Data
- Pediatric bias: The source dataset was collected primarily from pediatric patients at Guangzhou Women and Children's Medical Center. Performance on adult populations may differ.
- Geographic/demographic bias: Single-institution data from China limits generalizability to other populations, imaging equipment, or acquisition protocols.
- Metadata-based annotation: The Bacterial/Viral split was derived from filename metadata rather than independent clinical re-annotation. Any labeling errors in the source dataset propagate into this model.
- Class imbalance: The raw dataset has ~1.86× more bacterial than viral pneumonia samples, which can bias model predictions toward the more common class if not corrected.
- No patient-level split: Images from the same patient may appear across train/validation/test sets, potentially inflating reported metrics.
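A patient-grouped split would avoid the leakage described in the last bullet. The pneumonia filenames already encode a patient identifier (e.g. person112_bacteria_539.jpeg), so a standard-library sketch is possible; the Normal filenames' format is not documented here, so the fallback in `patient_id` is an assumption:

```python
import random
import re

def patient_id(filename: str) -> str:
    """Extract the patient identifier from a pneumonia filename,
    e.g. 'person112_bacteria_539.jpeg' -> 'person112'.
    Normal images lack a person prefix, so fall back to the
    filename itself (an assumption about the source naming)."""
    m = re.match(r"(person\d+)_", filename)
    return m.group(1) if m else filename

def grouped_split(filenames, test_frac=0.10, seed=0):
    """Assign whole patients to the test set so no patient
    appears in more than one split."""
    rng = random.Random(seed)
    patients = sorted({patient_id(f) for f in filenames})
    rng.shuffle(patients)
    n_test = int(len(patients) * test_frac)
    test_patients = set(patients[:n_test])
    test = [f for f in filenames if patient_id(f) in test_patients]
    rest = [f for f in filenames if patient_id(f) not in test_patients]
    return rest, test
```

Grouping by patient makes the reported test metrics a fairer estimate of performance on genuinely unseen patients.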
Training Procedure
Training Framework
- Framework: PyTorch + Ultralytics
- Hardware: NVIDIA L4 GPU, 24 GB VRAM
- Training schedule: up to 70 epochs per model (with early stopping)
Preprocessing
- Resize all images to 640×640 pixels
- No additional normalization or augmentation beyond framework defaults
Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 0.0001 (1e-4) |
| Optimizer | Adam |
| Loss Function | CrossEntropy |
| Batch Size | 16 |
| Epochs | 70 |
| Early Stopping Patience | 12 (monitored on validation loss) |
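The hyperparameters above map directly onto Ultralytics `train()` arguments; a sketch of the invocation follows (the checkpoint name and dataset path are placeholders, and the loss function is not passed explicitly because the Ultralytics classification trainer applies cross-entropy internally):

```python
# Hyperparameters from the table, expressed as Ultralytics train() arguments.
hyp = {
    "lr0": 1e-4,        # initial learning rate
    "optimizer": "Adam",
    "batch": 16,
    "epochs": 70,
    "patience": 12,     # early-stopping patience on the validation metric
    "imgsz": 640,       # resize inputs to 640x640
}

# Usage sketch (requires the ultralytics package; not executed here):
#   from ultralytics import YOLO
#   model = YOLO("yolo26n-cls.pt")                 # assumed checkpoint name
#   model.train(data="path/to/balanced_dataset", **hyp)
```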
Model Architectures Compared
Four lightweight architectures were trained and compared:
- MobileNet-V3
- EfficientNet-V2
- YOLOv11n (classification)
- YOLOv26n (classification): selected as the final model
Evaluation Results
Overall Performance Summary (Balanced Dataset)
| Model | Accuracy | Recall | Macro F1 | Precision |
|---|---|---|---|---|
| MobileNet-V3 | 0.74 | 0.76 | 0.73 | 0.79 |
| EfficientNet-V2 | 0.70 | 0.73 | 0.68 | 0.78 |
| YOLOv11n | 0.83 | 0.84 | 0.82 | 0.84 |
| YOLOv26n | 0.89 | 0.88 | 0.88 | 0.88 |
Min target = 0.80 on all metrics. Both YOLO models meet this threshold; MobileNet-V3 and EfficientNet-V2 fall short.
Detailed Per-Class Performance: YOLOv26n (Final Model)
| Class | Precision | Recall | F1 | Test Set Count |
|---|---|---|---|---|
| Bacterial | 0.91 | 0.94 | 0.93 | 242 |
| Normal | 0.97 | 0.85 | 0.91 | 234 |
| Viral | 0.74 | 0.84 | 0.79 | 148 |
| Macro Avg | 0.88 | 0.88 | 0.88 | 624 |
Confusion Matrix: YOLOv26n
| | Pred: Bacterial | Pred: Normal | Pred: Viral |
|---|---|---|---|
| True: Bacterial | 228 | 4 | 10 |
| True: Normal | 1 | 200 | 33 |
| True: Viral | 21 | 2 | 125 |
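The per-class figures in the table above can be recomputed directly from this confusion matrix (rows are true classes, columns are predictions), which is a useful sanity check in plain Python:

```python
# Confusion matrix from the table above: rows = true class,
# columns = predicted class, ordered Bacterial, Normal, Viral.
cm = [
    [228, 4, 10],
    [1, 200, 33],
    [21, 2, 125],
]
classes = ["Bacterial", "Normal", "Viral"]

metrics = {}
for i, name in enumerate(classes):
    tp = cm[i][i]
    precision = tp / sum(cm[r][i] for r in range(3))  # column sum
    recall = tp / sum(cm[i])                          # row sum
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (round(precision, 2), round(recall, 2), round(f1, 2))

# metrics["Viral"] -> (0.74, 0.84, 0.79), matching the per-class table.
```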
Inference Latency
All models ran well below the 100 ms target latency on the test hardware:
| Model | Inference Latency |
|---|---|
| MobileNet-V3 | 13.92 ms |
| EfficientNet-V2 | 14.83 ms |
| YOLOv11n | 15.88 ms |
| YOLOv26n | 14.00 ms |
Performance Analysis
YOLOv26n was selected as the final model based on the highest accuracy (0.89), Macro F1 (0.88), and recall (0.88) on the balanced test set, all exceeding the 0.80 minimum target.
What the model does well:
- Bacterial Pneumonia is classified with high confidence (F1 = 0.93, Recall = 0.94). This is likely because bacterial pneumonia produces more visually distinct consolidation patterns in CXRs.
- Normal lungs are detected with very high precision (0.97), meaning when the model says "Normal," it is almost always correct. This is critical for a triage tool, since false negatives (missed pneumonia) are more dangerous than false positives.
Where the model struggles:
- Viral Pneumonia is the weakest class (Precision = 0.74, F1 = 0.79, below the 0.80 target). The confusion matrix reveals that 21 Viral cases were misclassified as Bacterial. This is clinically plausible: early viral pneumonia produces subtle, diffuse interstitial patterns that are harder to distinguish from bacterial consolidation, even for human radiologists.
- Normal → Viral confusion: 33 Normal cases were predicted as Viral. This false-positive rate could cause unnecessary specialist reviews, but it is safer than missing pneumonia.
- Imbalanced data degrades performance: the balanced dataset consistently improved results across all four models, confirming that class imbalance was a meaningful problem.
Limitations and Biases
Known Failure Cases
- Viral Pneumonia misclassified as Bacterial: The model misclassifies 21 of 148 viral test cases (14%) as bacterial. In practice, both are pneumonia, so this is a severity-2 error (wrong subtype, correct disease category), not a severity-1 error (missed disease entirely).
- Normal X-rays with subtle findings: 33 Normal images were predicted as Viral Pneumonia. Images near the decision boundary (for example, mild atelectasis or pleural effusion in otherwise healthy patients) may trigger false positives.
Poor Performing Classes
Viral Pneumonia has below-target precision (0.74), meaning the model over-predicts this class. The likely cause is the visual similarity between early viral pneumonia (bilateral ground-glass opacity) and normal lung parenchyma with mild variation, as well as overlap with bacterial consolidation in more advanced cases.
Data Biases
- Pediatric population: Sourced exclusively from a children's hospital. Lung anatomy, disease presentation, and imaging protocols differ between pediatric and adult patients. Do not use this model on adult CXRs without further validation.
- Single institution / single scanner: Scanner brand, kVp settings, and image processing pipeline all affect CXR appearance. Out-of-distribution images may degrade performance significantly.
- Metadata-derived labels: The Bacterial/Viral annotation comes from filename metadata, not re-reviewed clinical records. Mislabeled source images directly impact model quality and evaluation metrics.
Environmental / Contextual Limitations
- Model assumes standard PA (posterior-anterior) chest X-ray orientation. Portable/AP views or rotated images may produce unreliable predictions.
- Performance on low-resolution or heavily compressed images has not been evaluated.
- Presence of medical devices (pacemakers, central lines, NG tubes) may confuse the classifier.
Inappropriate Use Cases
This model should NOT be used for:
- Standalone clinical diagnosis or as a replacement for radiologist review
- Adult patient populations (not validated)
- Emergency or acute care settings where false negatives carry life-threatening consequences
- Differentiating COVID-19 from other viral pneumonias (not trained on COVID data)
- Any deployment without physician oversight and institutional validation
Ethical Considerations
Medical AI tools carry significant ethical risk. This model is a research/educational prototype trained on a limited, non-diverse dataset. Deploying it in clinical settings without rigorous prospective validation, diverse population testing, and regulatory approval (e.g., FDA 510(k) clearance) would be inappropriate and potentially harmful. The model should never be used as the sole basis for a treatment decision.
Sample Size Limitations
- The Viral Pneumonia test set contains only 148 images, making precision/recall estimates for this class statistically noisier than for Bacterial (242) or Normal (234).
- Further evaluation on an external, adult, multi-institution dataset is needed before any clinical consideration.
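To put a number on "statistically noisier," a 95% Wilson confidence interval for the Viral recall (125 of 148 correct, per the confusion matrix) is a quick check; this is an illustrative calculation, not part of the reported evaluation:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(125, 148)  # Viral recall, point estimate ~0.84
# The interval spans roughly 0.78 to 0.89, about ±0.06 around the
# point estimate, which is why the Viral figures should be read cautiously.
```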
How to Use
The model was deployed as a Streamlit web application. Users can upload a JPG/PNG chest X-ray image and receive a predicted class (Normal, Bacterial, or Viral) along with a probability distribution across all three classes.
```python
from ultralytics import YOLO

# Load the fine-tuned classification checkpoint
model = YOLO("path/to/yolo26n_cxr.pt")

# Run inference on a single chest X-ray image
results = model.predict("chest_xray.jpg")
print(results[0].probs)  # class probabilities for Normal / Bacterial / Viral
```
Citation
If you use this model, please cite the original dataset:
Kermany, D., Zhang, K., & Goldbaum, M. (2018). Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2