The region classifier used a ResNet-18 convolutional neural network pretrained on ImageNet and fine-tuned on a new dataset designed for anatomical region classification within the manikin. The final linear layer of the ResNet-18 was replaced with a new linear layer with six outputs, one for each of the six anatomical regions. Models were trained using a hyperparameter sweep over the number of frozen layers, learning rate, image transforms, and batch size.
Anatomical Structure Detector
The anatomical structure object detector identifies key anatomical structures in the manikin and places a bounding box around each one. The labeled structures were the inferior turbinate, middle turbinate, uvula, vallecula, epiglottis, and vocal folds. In addition, the desired path for the user to take through the nasal cavity was labeled as two separate classes: one for the path leading up to the middle turbinate (Path 1) and one for the path after passing the proximal end of the inferior turbinate (Path 2). Images were labeled with bounding boxes by trained graduate students using Label Studio, a data annotation tool. Some structures lacked clear, well-delineated boundaries, which led to noisy bounding box labels. A confusion matrix was generated for the anatomical region classifier.
The anatomical structure detector used a YOLOv7 model fine-tuned on a dataset of 11,337 images drawn from a subset of 16 videos and evaluated against 3,096 images from four videos. No validation set was used because of the time-intensive nature of labeling the data. Models were trained using a hyperparameter sweep over image size, learning rate, image transforms, and batch size. Because multiple structures could be identified within a single frame, selecting only the single most likely class was not viable when integrating the anatomical structure detector into the AI Copilot. Instead, each class had its own confidence threshold, hand-tuned based on human judgment after interacting with the complete system.
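The per-class thresholding described above can be sketched as a small post-processing step over the detector's output. The class names follow the labeling scheme in this section, but the threshold values and the `filter_detections` helper are hypothetical placeholders, not the tuned values or code from the study.

```python
# Hand-tuned, class-specific confidence thresholds (placeholder values).
# Because several structures can appear in one frame, detections are kept
# per class rather than reduced to a single most-likely class.
CLASS_THRESHOLDS = {
    "inferior_turbinate": 0.40,
    "middle_turbinate": 0.45,
    "uvula": 0.50,
    "vallecula": 0.35,
    "epiglottis": 0.50,
    "vocal_folds": 0.55,
    "path_1": 0.30,
    "path_2": 0.30,
}

def filter_detections(detections):
    """Keep every detection whose confidence clears its class threshold.

    `detections` is a list of (class_name, confidence, bbox) tuples, e.g.
    as parsed from YOLOv7 output; unknown classes are dropped.
    """
    return [
        (name, conf, bbox)
        for name, conf, bbox in detections
        if conf >= CLASS_THRESHOLDS.get(name, 1.0)
    ]
```

Keeping one threshold per class lets rarer or noisier structures (such as the path classes, whose labels were less crisp) use a more permissive cutoff without flooding the display with low-confidence boxes for the others.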
The performance of the model was measured using mean average precision (mAP), a standard object detection metric that balances precision and recall. An intersection-over-union threshold of 0.5 was used to calculate the mAP because the ground truth bounding box labels for some classes were noisy.
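The intersection-over-union criterion underlying this evaluation can be sketched directly. Under the 0.5 threshold above, a predicted box of a given class counts as a true positive when its IoU with a same-class ground truth box is at least 0.5; mAP then averages the resulting per-class average precision. The box format below is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Return intersection-over-union of two axis-aligned boxes.

    Boxes are given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; width/height clamp to zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0
```

A looser threshold such as 0.5 is forgiving of small localization offsets, which is why it suits classes whose ground truth boundaries were themselves noisy.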
The AI Copilot was pilot tested prospectively by having 64 medical students naïve to FFL use it to perform FFL on the AirSim Combo Bronchi X manikin (TruCorp Ltd, United Kingdom) (Fig. 1). Anonymous surveys were handed out to the medical students after they had performed FFL, asking them to rate the ease of using the AI Copilot during FFL and to self-rate their FFL skills with and without the Copilot, both on a 5-point Likert scale. Descriptive statistics were used to analyze the students' responses on ease of use and on their self-rated skills with and without the tool. This was a proof-of-concept study to test the feasibility of the AI Copilot; the authors plan a formal study evaluating its impact on novice learners.
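A descriptive summary of 5-point Likert responses of the kind described above might look like the sketch below. The response values in the usage example are invented for illustration; they are not the study's data, and the `summarize_likert` helper is hypothetical.

```python
from statistics import mean, median

def summarize_likert(responses):
    """Summarize 5-point Likert responses with descriptive statistics.

    Returns the count, mean, median, and per-rating frequencies (1-5).
    """
    freq = {rating: responses.count(rating) for rating in range(1, 6)}
    return {
        "n": len(responses),
        "mean": mean(responses),
        "median": median(responses),
        "freq": freq,
    }

# Illustrative (made-up) ease-of-use ratings from five respondents.
example = summarize_likert([4, 5, 3, 4, 4])
```

For ordinal Likert data, the median and frequency table are usually the more defensible summaries; the mean is reported here only as a common descriptive convention.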