Automated Classification of Pneumonia in Medical Radiography

Model by: Siri Suwannatee | BDATA 497: Computer Vision Techniques

Model Description

This model is a chest X-ray (CXR) image classifier that distinguishes between three classes: Normal, Bacterial Pneumonia, and Viral Pneumonia. It was developed as an AI-powered screening tool to prioritize high-risk cases for specialist review, helping reduce the time-to-decision in clinical workflows.

The model is intended to act as a triage assistant: it flags high-risk (pneumonia) cases for comprehensive expert review while routing low-risk (normal) cases to standard review queues. It is not intended for standalone clinical diagnosis.

Training approach: Fine-tuned from YOLOv26n (Ultralytics classification head) on a balanced, 3-class chest X-ray dataset.

Intended use cases:

  • Hospital or clinic screening pipelines to prioritize radiologist workload
  • Academic/research exploration of CNN-based CXR classification
  • Educational demonstrations of automated medical image triage

Training Data

Dataset Source

Chest X-Ray Images (Pneumonia), published on Mendeley Data:

Kermany, D., Zhang, K., & Goldbaum, M. (2018). Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2

Number of Images and Classes

The original dataset contains 5,856 chest X-ray images in two classes: Normal and Pneumonia.

Annotation Process (Value Added)

This project used Annotation Option B: pre-annotated single source with modifications. The original binary Pneumonia label was refined into two sub-categories, Bacterial and Viral, using publisher-provided file metadata. Image filenames in the source dataset encode the pneumonia type (e.g., person112_bacteria_539.jpeg, person1613_virus_2799.jpeg), which allowed programmatic re-labeling without manual annotation.
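The filename-based re-labeling described above can be sketched as follows. This is an illustrative reconstruction, not the project's actual script; the output class names are assumptions.

```python
from pathlib import Path

def relabel(filename: str) -> str:
    """Map a source-dataset filename to one of the three refined classes.

    Source filenames embed the pneumonia type, e.g.
    "person112_bacteria_539.jpeg" or "person1613_virus_2799.jpeg";
    NORMAL images carry no pneumonia token.
    """
    name = Path(filename).name.lower()
    if "bacteria" in name:
        return "bacterial_pneumonia"
    if "virus" in name:
        return "viral_pneumonia"
    return "normal"

print(relabel("person112_bacteria_539.jpeg"))   # bacterial_pneumonia
print(relabel("person1613_virus_2799.jpeg"))    # viral_pneumonia
```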

This refinement adds clinically meaningful granularity: bacterial and viral pneumonias have different treatment pathways, so distinguishing them is a valuable model capability.

Class Distribution

| Class | Count | % of Total |
| --- | --- | --- |
| Normal | 1,583 | 27.0% |
| Bacterial Pneumonia | 2,780 | 47.5% |
| Viral Pneumonia | 1,493 | 25.5% |
| Total | 5,856 | 100% |

The dataset is imbalanced: Bacterial Pneumonia is overrepresented. To address this, training experiments were run on both the imbalanced dataset and a balanced dataset (downsampled to 1,493 images per class, 4,479 images in total).
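The balancing step downsamples every class to the size of the smallest class (Viral Pneumonia, 1,493 images). A minimal sketch, with hypothetical function and label names:

```python
import random
from collections import defaultdict

def balance(samples, seed=42):
    """samples: list of (path, label) pairs.
    Returns a subset with every class downsampled to the smallest class size."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    n = min(len(paths) for paths in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    balanced = []
    for label, paths in by_class.items():
        balanced += [(p, label) for p in rng.sample(paths, n)]
    return balanced

# Toy demo: 5 bacterial + 3 viral -> 3 per class after balancing
toy = [(f"img{i}", "bacterial") for i in range(5)] + \
      [(f"img{i}", "viral") for i in range(3)]
print(len(balance(toy)))  # 6
```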

Train / Validation / Test Split

| Split | Ratio |
| --- | --- |
| Train | 70% |
| Validation | 20% |
| Test | 10% |

Data Augmentation

No augmentation beyond resizing was applied. Images were resized to 640×640 pixels, the input size used for the YOLO models.

Known Biases and Limitations in Training Data

  • Pediatric bias: The source dataset was collected primarily from pediatric patients at Guangzhou Women and Children's Medical Center. Performance on adult populations may differ.
  • Geographic/demographic bias: Single-institution data from China limits generalizability to other populations, imaging equipment, or acquisition protocols.
  • Metadata-based annotation: The Bacterial/Viral split was derived from filename metadata rather than independent clinical re-annotation. Any labeling errors in the source dataset propagate into this model.
  • Class imbalance: The raw dataset has ~1.86× more bacterial than viral pneumonia samples, which can bias model predictions toward the more common class if not corrected.
  • No patient-level split: Images from the same patient may appear across train/validation/test sets, potentially inflating reported metrics.
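The patient-level leakage noted above could be avoided by grouping images by the person ID encoded in the filenames before splitting. The sketch below is a hedged illustration of such a grouped split, not part of the original pipeline:

```python
import random
import re
from collections import defaultdict

def patient_split(filenames, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign whole patients (not individual images) to train/val/test."""
    groups = defaultdict(list)
    for f in filenames:
        m = re.match(r"(person\d+)", f)
        groups[m.group(1) if m else f].append(f)  # NORMAL files lack an ID

    patients = sorted(groups)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    cut1 = int(n * ratios[0])
    cut2 = int(n * (ratios[0] + ratios[1]))

    splits = {"train": [], "val": [], "test": []}
    for i, p in enumerate(patients):
        key = "train" if i < cut1 else "val" if i < cut2 else "test"
        splits[key].extend(groups[p])
    return splits
```

Because assignment happens per patient, two images of the same person can never land in different splits.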

Training Procedure

Training Framework

  • Framework: PyTorch + Ultralytics
  • Hardware: NVIDIA L4 GPU, 24 GB VRAM
  • Training schedule: up to 70 epochs per model, with early stopping

Preprocessing

  • Resize all images to 640×640 pixels
  • No additional normalization or augmentation beyond framework defaults

Hyperparameters

| Parameter | Value |
| --- | --- |
| Learning rate | 0.0001 (1e-4) |
| Optimizer | Adam |
| Loss function | Cross-entropy |
| Batch size | 16 |
| Max epochs | 70 |
| Early stopping patience | 12 (monitored on validation loss) |
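Assuming the standard Ultralytics training API, a run with these hyperparameters could be reproduced roughly as follows. The checkpoint name and dataset path are assumptions, not details confirmed by the project.

```python
from ultralytics import YOLO

# Hypothetical checkpoint name; substitute the actual pretrained weights used.
model = YOLO("yolo26n-cls.pt")

model.train(
    data="path/to/cxr_dataset",  # folder with train/ and val/ class subdirs
    epochs=70,                   # max epochs
    imgsz=640,                   # resize to 640x640
    batch=16,
    lr0=1e-4,                    # initial learning rate
    optimizer="Adam",
    patience=12,                 # early stopping patience
)
```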

Model Architectures Compared

Four lightweight architectures were trained and compared:

  1. MobileNet-V3
  2. EfficientNet-V2
  3. YOLOv11n (classification)
  4. YOLOv26n (classification), selected as the final model

Evaluation Results

Overall Performance Summary (Balanced Dataset)

| Model | Accuracy | Recall | Macro F1 | Precision |
| --- | --- | --- | --- | --- |
| MobileNet-V3 | 0.74 | 0.76 | 0.73 | 0.79 |
| EfficientNet-V2 | 0.70 | 0.73 | 0.68 | 0.78 |
| YOLOv11n | 0.83 | 0.84 | 0.82 | 0.84 |
| YOLOv26n | 0.89 | 0.88 | 0.88 | 0.88 |

The minimum target was 0.80 on all metrics. Both YOLO models meet this threshold; MobileNet-V3 and EfficientNet-V2 fall short.

Detailed Per-Class Performance: YOLOv26n (Final Model)

| Class | Precision | Recall | F1 | Test Set Count |
| --- | --- | --- | --- | --- |
| Bacterial | 0.91 | 0.94 | 0.93 | ~242 |
| Normal | 0.97 | 0.85 | 0.91 | ~234 |
| Viral | 0.74 | 0.84 | 0.79 | ~148 |
| Macro Avg | 0.88 | 0.88 | 0.88 | ~624 |

Confusion Matrix: YOLOv26n

Rows are true classes; columns are predicted classes.

| | Pred: Bacterial | Pred: Normal | Pred: Viral |
| --- | --- | --- | --- |
| True: Bacterial | 228 | 4 | 10 |
| True: Normal | 1 | 200 | 33 |
| True: Viral | 21 | 2 | 125 |
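As a consistency check, the per-class metrics and overall accuracy can be recomputed directly from this confusion matrix:

```python
# Rows = true class, columns = predicted class.
labels = ["Bacterial", "Normal", "Viral"]
cm = [
    [228, 4, 10],   # true Bacterial
    [1, 200, 33],   # true Normal
    [21, 2, 125],   # true Viral
]

for i, name in enumerate(labels):
    tp = cm[i][i]
    precision = tp / sum(cm[r][i] for r in range(3))  # column sum
    recall = tp / sum(cm[i])                          # row sum
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")

accuracy = sum(cm[i][i] for i in range(3)) / sum(map(sum, cm))
print(f"Accuracy: {accuracy:.2f}")  # 0.89, matching the summary table
```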

Inference Latency

All models ran well below the 100 ms target latency on the test hardware:

| Model | Inference Latency |
| --- | --- |
| MobileNet-V3 | 13.92 ms |
| EfficientNet-V2 | 14.83 ms |
| YOLOv11n | 15.88 ms |
| YOLOv26n | 14.00 ms |

Performance Analysis

YOLOv26n was selected as the final model based on the highest accuracy (0.89), Macro F1 (0.88), and recall (0.88) on the balanced test set, all exceeding the 0.80 minimum target.

What the model does well:

  • Bacterial Pneumonia is classified with high confidence (F1 = 0.93, Recall = 0.94). This is likely because bacterial pneumonia produces more visually distinct consolidation patterns in CXRs.
  • Normal lungs are detected with very high precision (0.97), meaning when the model says "Normal," it is almost always correct. This matters for a triage tool, since false negatives (missed pneumonia) are more dangerous than false positives.

Where the model struggles:

  • Viral Pneumonia is the weakest class (Precision = 0.74, F1 = 0.79, below the 0.80 target). The confusion matrix shows that 21 Viral cases were misclassified as Bacterial. This is clinically plausible: early viral pneumonia produces subtle, diffuse interstitial patterns that are hard to distinguish from bacterial consolidation, even for human radiologists.
  • Normal → Viral confusion: 33 Normal cases were predicted as Viral. This false-positive rate could trigger unnecessary specialist reviews, but it is safer than missed pneumonia.
  • Class imbalance degrades both YOLO models on average: the balanced dataset consistently improved performance across all four models, confirming that imbalance was a meaningful problem.

Limitations and Biases

Known Failure Cases

  • Viral Pneumonia misclassified as Bacterial: The model confuses 21 of 148 viral test cases (14%) as bacterial. In practice, both are pneumonia, so this is a severity-2 error (wrong subtype, correct disease category), not a severity-1 error (missed disease entirely).
  • Normal X-rays with subtle findings: 33 Normal images were predicted as Viral Pneumonia. Images near the decision boundary (for example, mild atelectasis or pleural effusion in otherwise healthy patients) may trigger false positives.

Poor Performing Classes

Viral Pneumonia has below-target precision (0.74), meaning the model over-predicts this class. The likely cause is the visual similarity between early viral pneumonia (bilateral ground-glass opacity) and normal lung parenchyma with mild variation, as well as overlap with bacterial consolidation in more advanced cases.

Data Biases

  • Pediatric population: Sourced exclusively from a children's hospital. Lung anatomy, disease presentation, and imaging protocols differ between pediatric and adult patients. Do not use this model on adult CXRs without further validation.
  • Single institution / single scanner: Scanner brand, kVp settings, and image processing pipeline all affect CXR appearance. Out-of-distribution images may degrade performance significantly.
  • Metadata-derived labels: The Bacterial/Viral annotation comes from filename metadata, not re-reviewed clinical records. Mislabeled source images directly impact model quality and evaluation metrics.

Environmental / Contextual Limitations

  • Model assumes standard PA (posterior-anterior) chest X-ray orientation. Portable/AP views or rotated images may produce unreliable predictions.
  • Performance on low-resolution or heavily compressed images has not been evaluated.
  • Presence of medical devices (pacemakers, central lines, NG tubes) may confuse the classifier.

Inappropriate Use Cases

This model should NOT be used for:

  • Standalone clinical diagnosis or as a replacement for radiologist review
  • Adult patient populations (not validated)
  • Emergency or acute care settings where false negatives carry life-threatening consequences
  • Differentiating COVID-19 from other viral pneumonias (not trained on COVID data)
  • Any deployment without physician oversight and institutional validation

Ethical Considerations

Medical AI tools carry significant ethical risk. This model is a research/educational prototype trained on a limited, non-diverse dataset. Deploying it in clinical settings without rigorous prospective validation, diverse population testing, and regulatory approval (e.g., FDA 510(k) clearance) would be inappropriate and potentially harmful. The model should never be used as the sole basis for a treatment decision.

Sample Size Limitations

  • The Viral Pneumonia test set contains only ~148 images, making precision/recall estimates for this class statistically noisier than for Bacterial (242) or Normal (234).
  • Further evaluation on an external, adult, multi-institution dataset is needed before any clinical consideration.

How to Use

The model was deployed as a Streamlit web application. Users can upload a JPG/PNG chest X-ray image and receive a predicted class (Normal, Bacterial, or Viral) along with a probability distribution across all three classes.

```python
from ultralytics import YOLO

model = YOLO("path/to/yolo26n_cxr.pt")     # fine-tuned classification checkpoint
results = model.predict("chest_xray.jpg")  # single-image inference

probs = results[0].probs   # class-probability object
print(probs.top1)          # index of the most likely class
print(probs.data)          # probability distribution over the three classes
```

Citation

If you use this model, please cite the original dataset:

Kermany, D., Zhang, K., & Goldbaum, M. (2018). Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification. Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2
