Sammamish Spawning Salmon Spotter V1.0
Scientific Context
Kokanee salmon are a culturally and ecologically significant ecotype of sockeye salmon present in the Lake Sammamish and Lake Washington watersheds. Their nonanadromous life cycle and later spawning than other Pacific salmon species in the area allow them to fill a unique ecological niche, providing food and nutrients to the ecosystem in winter. Their permanent residence in freshwater also makes them susceptible to degraded water quality and therefore an excellent benchmark of ecosystem health. The freshwater environment provides fewer nutrients than the ocean, which leads to a smaller size, and small size and vivid pigmentation are the main ways kokanee are identified. The late-season runs of kokanee were a critical food source for the Snoqualmie Tribe, who were able to reside in their ancestral homelands year-round thanks to late-season access to kokanee, known in Snoqualmie Lushootseed as ʔilaʔł, meaning "little red fish."
However, rapid urbanization of the Puget Sound watersheds that kokanee call home has caused severe declines in their population. Over the last two decades, annual run counts have frequently fallen below 1,000, and in the 2017-2018 season the count dropped as low as 19. In recent years, though, counts have been as high as 8,300 kokanee in the Lake Sammamish watershed. After their near-extinction, there was public outcry, and riparian restoration efforts have been conducted throughout the watershed; there is a real possibility that this restoration of kokanee habitat is having a positive impact on the population. This YOLOv11 computer vision model aims to produce count estimates of sockeye and kokanee salmon from overhead footage taken during drone surveys of creeks in the Lake Sammamish watershed.
Dataset Description
The data are frames sampled from six videos provided by Dr. Jeffrey Jensen from a drone survey he conducted of the Sammamish River near the mouth of Little Bear Creek (47°45'18.04"N 122°10'08.54"W) during the 2023 spawning season. All videos were in .mov or .mp4 format and ranged from 0:24 to 1:22 (min:sec) in length. The .mov videos were in a raw image format with adjustments for higher contrast and saturation, while the .mp4 videos were closer to what the naked eye would see. Each .mov video was paired with an .mp4 video, so in essence there were only three unique videos of salmon spawning.
This imagery presented challenges, including imaging through the reflective surface of the water, which introduces ripples and distortion. Additionally, many of the sampled frames contained few salmon and had poor image quality. The shortest video had the highest-quality imagery, so I sampled it at double the rate of the other videos. These frames average over 150 annotations each and were very time-consuming to annotate.
The initial dataset contained 190 images. I annotated 113 of them manually, producing 2,110 annotations, and then used a YOLOv11 model trained on the hand-annotated imagery to run inference on the remaining 77 images, which were the high-quality, salmon-dense images I had not been able to finish annotating. This produced 18,252 annotations on those 77 images. I was not able to clean up the inferred annotations; doing so would be the next step before releasing v2.0 of this model. After augmentation, the image count was 299.
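The sampling scheme described above (one video sampled at double the rate of the others) can be sketched as follows. This is only an illustration, not the procedure actually used: the function names, the stride values, and the use of OpenCV are all assumptions, since the original does not say how frames were extracted.

```python
def sample_indices(total_frames: int, stride: int) -> list[int]:
    """Return every `stride`-th frame index, starting at frame 0."""
    if total_frames < 0 or stride < 1:
        raise ValueError("total_frames must be >= 0 and stride >= 1")
    return list(range(0, total_frames, stride))


def extract_frames(video_path: str, out_dir: str, stride: int) -> int:
    """Write every `stride`-th frame of a video to `out_dir` as JPEG files.

    Assumes OpenCV is installed (pip install opencv-python). It is imported
    lazily so that `sample_indices` works without it.
    """
    import os
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or unreadable file
            break
        if index % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


# Hypothetical strides: the values are assumptions chosen for illustration.
BASE_STRIDE = 30                        # e.g. one frame per second at 30 fps
HIGH_QUALITY_STRIDE = BASE_STRIDE // 2  # "double the rate" for the best clip
```

Halving the stride doubles the number of frames drawn from a clip of the same length, which is one way to weight the dataset toward the highest-quality footage.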
Original Video Resolution: 1080p
Preprocessing: resize imagery to 640 x 640
Augmentations: 90° Rotate (Clockwise, Counter-Clockwise); Shear (±14° Horizontal, ±8° Vertical); Saturation (between -15% and +15%); Exposure (between -15% and +15%); Blur (up to 0.7px); Noise (up to 0.58% of pixels)
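Geometric augmentations such as the 90° rotations listed above also have to move the bounding boxes. Annotation tools normally handle this automatically; the sketch below only illustrates the arithmetic for boxes in normalized YOLO format (class, x-center, y-center, width, height, with the origin at the top-left corner). The `YoloBox` type and `rotate90` helper are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    cls: int
    x: float  # box centre, as a fraction of image width
    y: float  # box centre, as a fraction of image height
    w: float  # box width, as a fraction of image width
    h: float  # box height, as a fraction of image height


def rotate90(box: YoloBox, clockwise: bool = True) -> YoloBox:
    """Rotate a normalized YOLO box along with its image by 90 degrees.

    Width and height swap in both directions; the centre moves to
    (1 - y, x) for a clockwise turn and to (y, 1 - x) for a
    counter-clockwise turn.
    """
    if clockwise:
        return YoloBox(box.cls, 1.0 - box.y, box.x, box.h, box.w)
    return YoloBox(box.cls, box.y, 1.0 - box.x, box.h, box.w)


# A box near the top-left of the frame ends up near the top-right
# after a clockwise rotation.
original = YoloBox(cls=0, x=0.25, y=0.10, w=0.20, h=0.05)
rotated = rotate90(original, clockwise=True)
```

Because the coordinates are normalized, the same formulas apply whether the rotation happens before or after the resize to 640 x 640.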
| Class | Annotation Count |
| --- | --- |
| Oncorhynchus nerka | 2,110 |
| Needs Review | 18,252 |
An example of high-quality imagery hand-annotated with the O. nerka class
An example of low-quality imagery hand-annotated with the O. nerka class, showcasing many of the challenges of the dataset
An example of low-accuracy inferences made by the YOLOv11 model on high-quality imagery with the Needs Review class
An example of raw-format .mov imagery annotated using YOLOv11
Model Selection
An object detection model was chosen for this use case because of its simpler implementation, lower GPU burden, and utility as a proof of concept for this project. The primary objective of the model was salmon identification. The first batch of images, annotated as O. nerka, was used to train a YOLOv11 model, which then made inferences on the remaining unanalyzed imagery. To do this, I zipped the files into my personal drive, unzipped them in Colab, and used code snippets from OceanCV 3.15 and 3.17.
Next steps for this project include cleaning up the YOLO-generated annotations in the "Needs Review" class and then retraining the model. If that does not significantly improve model metrics, I would start from scratch, relying more heavily on good-quality imagery for the first round of training. Once accuracy has improved, further steps include adding a tracking-in-frame or built-in counting feature, testing an instance segmentation version of this model, and comparing it with other salmon tracking models that use side-view and underwater cameras.
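The train-then-infer workflow described above is a form of pseudo-labelling. A minimal sketch of how it might look with the Ultralytics Python package is given below. This is not the OceanCV code the project actually used: the `yolo11n.pt` model size, the epoch count, the confidence threshold, the class id for "Needs Review", and all function names are assumptions.

```python
def to_yolo_line(class_id: int, xywhn: list[float]) -> str:
    """Format one detection as a YOLO label line (normalized xywh)."""
    x, y, w, h = xywhn
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


def train_first_round(data_yaml: str, epochs: int = 100):
    """Train on the hand-annotated images (assumed settings)."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO("yolo11n.pt")  # assumed model size; the original does not say
    model.train(data=data_yaml, epochs=epochs, imgsz=640)
    return model


def pseudo_label(weights_path: str, image_dir: str, label_dir: str,
                 class_id: int = 1, conf: float = 0.25) -> int:
    """Run inference on unlabelled images and save boxes as YOLO label files.

    Every saved box is written with `class_id` (hypothetically, the
    "Needs Review" class) so it can be checked by hand later.
    """
    from pathlib import Path
    from ultralytics import YOLO

    model = YOLO(weights_path)
    out = Path(label_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for result in model.predict(source=image_dir, conf=conf, imgsz=640, stream=True):
        lines = [to_yolo_line(class_id, box) for box in result.boxes.xywhn.tolist()]
        (out / (Path(result.path).stem + ".txt")).write_text("\n".join(lines))
        written += len(lines)
    return written
```

Writing the inferred boxes under a separate class keeps them distinguishable from hand annotations, which makes the later clean-up pass easier to scope.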
Model Assessment
F1 Curve
The F1 curve peaks at 0.49, which indicates that the model is sometimes able to correctly identify salmon but struggles with precision, recall, or both. In other words, the model is missing some positive instances and/or classifying negative instances as positives. The F1 score could be worse, but it could also be improved through further training.
P Curve
This curve shows the model can correctly identify salmon in the imagery, but the low area under the curve indicates that the model produces many false positives. Given how similar a sockeye salmon looks to a water ripple from overhead, the presence of false positives does not surprise me. The dip around 0.8 before the spike up to 1.0 suggests that the model may be overfitting at higher confidence thresholds.
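To make the F1 and P curves above more concrete, the sketch below shows how precision, recall, and F1 change as the confidence threshold is swept. The detections are toy values invented for illustration, not output from this model, and the helper names are hypothetical.

```python
def metrics_at(dets: list[tuple[float, bool]], n_truth: int,
               threshold: float) -> tuple[float, float, float]:
    """Precision, recall, and F1 for detections kept at `threshold`.

    `dets` holds (confidence, is_true_positive) pairs and `n_truth` is the
    number of annotated salmon. Precision is reported as 1.0 when nothing
    is predicted, a common plotting convention.
    """
    kept = [is_tp for conf, is_tp in dets if conf >= threshold]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_truth if n_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_f1(dets: list[tuple[float, bool]], n_truth: int) -> tuple[float, float]:
    """Return (threshold, F1) at the peak of the F1 curve."""
    thresholds = [i / 100 for i in range(101)]
    return max(((t, metrics_at(dets, n_truth, t)[2]) for t in thresholds),
               key=lambda pair: pair[1])


# Toy detections (made up): confidence and whether the box matched a salmon.
toy = [(0.95, True), (0.90, True), (0.80, False), (0.60, True),
       (0.40, False), (0.30, True), (0.10, False)]
peak_threshold, peak_f1 = best_f1(toy, n_truth=6)
```

Raising the threshold discards low-confidence boxes, which usually raises precision and lowers recall; the F1 peak marks the threshold where the two are best balanced.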
R Curve
The low initial value of the R curve shows the model can correctly identify some salmon, but it struggles with edge cases and misses approximately a third of positive salmon instances even at low confidence thresholds. The small area under the curve indicates that false negatives are a substantial problem in the model output and that, as I suspected, the model likely struggles to differentiate between sockeye salmon and water ripples. The prevalence of false negatives surprised me: when I looked at the imagery while annotating, I assumed false positives would be the bigger issue.
P-R Curve
The area under the P-R curve indicates mediocre model performance.
Confusion Matrix
The confusion matrix shows that more O. nerka instances were classified as background than were correctly identified as O. nerka, which matches the high rate of false negatives seen in the R curve. There is also a significant number of Type I errors (false positives), though fewer than the correctly identified instances and fewer than the Type II errors (false negatives). False negatives are a serious issue for model performance, and further training is needed to reduce this form of error before the model can be deployed in the field.
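For a single-class detector, the confusion matrix reduces to three counts, and the summary metrics follow directly from them. The sketch below uses made-up counts that only mimic the ordering described above (more false negatives than true positives, and fewer false positives than either); they are not this model's actual numbers, and `DetectionCounts` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    tp: int  # salmon detected as salmon
    fn: int  # salmon predicted as background (Type II errors)
    fp: int  # background predicted as salmon (Type I errors)

    @property
    def precision(self) -> float:
        detected = self.tp + self.fp
        return self.tp / detected if detected else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    @property
    def miss_rate(self) -> float:
        """Share of real salmon that the model failed to detect."""
        actual = self.tp + self.fn
        return self.fn / actual if actual else 0.0

    @property
    def f1(self) -> float:
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical counts for illustration only.
example = DetectionCounts(tp=40, fn=50, fp=30)
```

With counts in this ordering, the miss rate exceeds one half, which is why false negatives dominate the error budget even when precision looks acceptable.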
Model Use Case
When ready, this model could be used to help monitor the presence of kokanee salmon in local watersheds from aerial footage taken by drone. It could also potentially help differentiate between kokanee and sockeye salmon through size estimation, and if counting and track-in-frame features are added to later model versions, it could also be used as a salmon counter.
A potential study that could be conducted with this model would test the efficacy of restoration efforts around the Lake Sammamish watershed through species counts. A potential null hypothesis for such a study could be, "H₀: There is no difference in instances of O. nerka detected by a CV counting model at local sites with kokanee salmon restoration work vs. sites without similar work."
If this computer vision model becomes more accurate with further development, it could alleviate burdens on local citizen scientists and researchers, provide evidence for the success of local kokanee restoration projects in the Lake Sammamish watershed, inspire the acceleration of similar projects around Lake Washington, and help identify areas of success and areas of concern for kokanee restoration, leading to more effective restoration ecology work.
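One way the null hypothesis proposed above could be examined is with a two-sided permutation test on per-site counts. The sketch below is only an illustration of that idea: the choice of test, the function name, and the site counts are all assumptions, and the numbers are invented rather than taken from any survey.

```python
import random
from statistics import mean


def permutation_test(restored: list[float], unrestored: list[float],
                     n_resamples: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in mean counts.

    Site labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute difference in means is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(restored) - mean(unrestored))
    pooled = list(restored) + list(unrestored)
    n = len(restored)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Hypothetical model counts per site (made up for illustration).
restored_sites = [50, 60, 55, 58, 62]
unrestored_sites = [5, 8, 3, 6, 4]
p_value = permutation_test(restored_sites, unrestored_sites)
```

A small p-value would be evidence against H₀, although a real study would also need to account for detection errors in the model's counts and for differences between sites unrelated to restoration.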
Attributions
- Permission to use this footage was granted by Dr. Jeffrey Jensen, affiliated with UW Bothell and Salmon Watchers. For further inquiries regarding imagery and local kokanee restoration efforts, he can be contacted at jsjensen@uw.edu.