How To: Automated Real-Time Otologic Drill Motion Analysis

by Obinna I. Nwosu, MD, Mitsuki Ota, BS, Lucy J. Xu, MD, MHQS, and Matthew G. Crowson, MD • April 3, 2025

INTRODUCTION

Existing assessment instruments for evaluating performance in mastoidectomy are tedious and require extensive expert input, which may not always be available. Computer-based approaches have been employed to analyze instrument motion and, in turn, use motion as a proxy for skill. When coupled with artificial intelligence (AI) techniques, these approaches can be automated, making assessment more efficient, objective, and scalable.

Recent efforts have employed computer vision (CV) to automatically track instruments during mastoidectomy; however, these approaches have been applied only in post-hoc video analysis. Automated post-hoc analysis remains time-consuming, is temporally separated from the surgical experience, and requires a large, asynchronous data transfer.

Implementing motion analysis in real time could afford users immediate information on their surgical motion. Realizing the full potential of AI and CV in otolaryngology will require real-time integration of these tools. Although the body of CV applications in otolaryngology has grown recently, there have been no published descriptions of how to implement these tools for real-time use. To that end, we developed a CV approach for real-time drill motion analysis during mastoidectomy on a 3D-printed temporal bone (TB) and detailed the technical setup required for real-time implementation.

METHODS

Dataset Preparation

An image dataset was extracted from videos of mastoidectomies performed by the lead author on 3D-printed TBs. Videos were obtained using a high-definition camera (Karl Storz SE & Co KG, Tuttlingen, Germany) mounted to a Zeiss microscope (Carl Zeiss, Oberkochen, Baden-Württemberg, Germany). Frames were randomly extracted from the video dataset at 720 × 480 resolution using a custom Python script. Images in which a drill was not visualized were excluded. The drill burr was manually annotated with a bounding box in each image to create ground truth annotations. Ground truth annotations represent the true location of the object of interest. Object detection models are assessed by comparing model detections to ground truth annotations. Diamond and cutting drill burrs of variable sizes were included in the dataset.
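
The extraction script itself is not published. A minimal sketch of this step using OpenCV follows, in which the file names and per-video sample count are illustrative assumptions:

import os
import random

import cv2  # OpenCV for video input/output

VIDEO_PATH = "mastoidectomy_01.mp4"  # hypothetical recording filename
OUT_DIR = "frames"                   # hypothetical output directory
N_SAMPLES = 1500                     # illustrative number of frames per video

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# Sample frame indices at random, without replacement
for idx in sorted(random.sample(range(total), min(N_SAMPLES, total))):
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
    ok, frame = cap.read()
    if not ok:
        continue
    frame = cv2.resize(frame, (720, 480))  # match the 720 x 480 dataset resolution
    cv2.imwrite(os.path.join(OUT_DIR, f"frame_{idx:06d}.png"), frame)

cap.release()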

Figure 1: Illustration of the setup for real-time tracking. The live video data from the endoscope, which is mounted to the microscope, is transferred via the capture card into a laptop running the motion analysis model in real time (Illustration by Lucy Xu, MD). Laryngoscope. https://doi.org/10.1002/lary.31795

To avoid bias, the dataset was split between two authors, who each annotated half. The senior author performed an independent review of a random subset of annotations from each author to ensure no discrepancies in annotation patterns. The dataset was divided into training and testing subsets of 80% and 20%, respectively. Our approach focused exclusively on drill detection, as the suction is often occluded by resin dust and poorly visualized in 3D-printed resin TB drilling.

Model Development and Evaluation

The training subset was used to train an object detection model with YOLOv8, an open-source CV framework (Ultralytics, https://github.com/ultralytics/ultralytics). The trained model was then tasked with detecting and localizing the drill burr in the test data.
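
The published description stops at the framework level, so the following is only a sketch of the standard Ultralytics training workflow; the base weights, dataset YAML, and epoch count are assumptions rather than the authors’ configuration:

from ultralytics import YOLO

# Start from pretrained weights; the model variant and epoch count here are
# illustrative, not the authors' published configuration.
model = YOLO("yolov8n.pt")

model.train(
    data="drill.yaml",  # hypothetical dataset config: image paths plus a single "drill" class
    epochs=100,
    imgsz=640,
)

# Detect and localize the drill burr in the held-out test images
results = model.predict("datasets/drill/test/images", conf=0.5)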

Model performance was assessed on the test image subset, which was not encountered during training. An object detection model aims to localize and correctly classify the object of interest within an image, so performance is assessed on both classification accuracy and precision of localization. We used the common CV performance metrics of recall, intersection over union (IoU), average precision (AP), and mean average precision (mAP). Recall describes the model’s classification sensitivity: how likely the model is to predict a drill in an image that truly contains one. IoU represents the degree of overlap between the ground truth and model-predicted bounding boxes and serves as the threshold for counting a detection as correct. AP summarizes detection precision across recall levels at a specified IoU threshold. mAP is obtained by averaging AP values across a series of IoU thresholds. We report our model’s mAP (0.5:0.95), a common CV metric denoting the average of AP values across 10 IoU thresholds from 0.5 to 0.95 at 0.05 intervals.
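
For concreteness, IoU for two boxes in (x1, y1, x2, y2) corner format can be computed as below; mAP (0.5:0.95) then averages AP evaluated at thresholds of 0.50, 0.55, and so on up to 0.95:

def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as correct at a given threshold when IoU >= threshold
print(iou((100, 100, 200, 200), (150, 120, 240, 210)))  # ~0.28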

The model was trained and evaluated in a private code notebook on Google Colaboratory, an online hosted computing service. Training and evaluation were powered by an NVIDIA Tesla T4 GPU.

Drill Tracking and Motion Analysis

In contrast to object detection, in which an object is located in independent frames, object tracking maintains an object’s identity across a sequence of frames (i.e., a video). To track the motion of the drill, the detection model was coupled to a multi-object tracking algorithm, Boosted Simple Online and Realtime Tracking (BoT-SORT, https://github.com/NirAharon/BoT-SORT). BoT-SORT uses appearance matching between detections and motion filtering to maintain an object’s identity throughout a video stream.
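
Ultralytics exposes BoT-SORT as a built-in tracker, so a trained detector can be coupled to it in a single call. In this sketch, the weights and video filenames are hypothetical:

from ultralytics import YOLO

model = YOLO("drill_best.pt")  # hypothetical path to the trained detector weights

# stream=True yields results frame by frame rather than buffering the whole video;
# tracker="botsort.yaml" selects the built-in BoT-SORT configuration.
for result in model.track("mastoidectomy_01.mp4", tracker="botsort.yaml", stream=True):
    for box in result.boxes:
        if box.id is None:  # the tracker has not yet assigned an identity
            continue
        cx, cy, w, h = box.xywh[0].tolist()  # bounding-box center, width, height
        print(f"track {int(box.id)}: center=({cx:.0f}, {cy:.0f})")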

Temporary drill occlusions, in which the drill burr is partially obscured (e.g., by the suction), are overcome by tracking algorithms within BoT-SORT such that the drill can still be localized even when it is only partially in view. In each video frame, the model plots a bounding box around the predicted location of the drill burr and extracts the bounding box’s center coordinate. To analyze motion, these center coordinates are entered into kinematic equations. Motion metrics computed by our model include instrument velocity, distance traveled, smoothness of motion, and total task duration. These metrics were selected for their relationship to efficiency of motion, specifically within the context of mastoidectomy. Metrics are recorded cumulatively and individually for each drill burr type used.
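
The exact kinematic formulation is not given in the paper. The sketch below derives comparable metrics from the per-frame center coordinates at 30 frames per second, using mean jerk as an assumed proxy for smoothness of motion:

import numpy as np

def motion_metrics(centers, fps=30):
    """Summary metrics from per-frame (x, y) bounding-box centers.

    The jerk-based smoothness proxy (lower mean jerk = smoother motion)
    is an assumption; the authors' formulation is not specified.
    """
    pts = np.asarray(centers, dtype=float)
    dt = 1.0 / fps

    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # per-frame displacement (pixels)
    velocity = step / dt                                 # pixels per second
    accel = np.diff(velocity) / dt
    jerk = np.diff(accel) / dt

    return {
        "duration_s": len(pts) * dt,
        "distance_px": step.sum(),
        "mean_velocity_px_s": velocity.mean(),
        "mean_abs_jerk": np.abs(jerk).mean(),
    }

print(motion_metrics([(100, 100), (102, 101), (105, 103), (109, 106)]))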

Technical Setup for Real-Time Motion Analysis

The endoscope is mounted to the operating microscope and visualizes the microscopic view. The endoscope’s video feed, captured at 30 frames per second, is transferred via a DVI-HDMI cable to a video capture card, Elgato HD60X (Corsair Gaming, Milpitas, Calif.), which connects to a laptop running the motion analysis model (Fig. 1). The capture card effectively converts the microscopic view into a live video stream, which can be input, frame by frame, into the model.

The model runs locally on an M3 MacBook Pro (Apple Inc., Cupertino, Calif.), with inferencing powered by GPU via Apple Silicon Metal Performance Shaders.
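
Putting the pieces together, the live loop can be sketched as below: OpenCV reads frames from the capture card and each frame is passed to the tracker, with inference on the Metal Performance Shaders (MPS) backend. The capture device index and weights filename are machine-specific assumptions:

import cv2
from ultralytics import YOLO

model = YOLO("drill_best.pt")  # hypothetical trained detector weights
cap = cv2.VideoCapture(1)      # capture card index; device numbering is machine-specific

centers = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # persist=True carries track identities across successive frames;
    # device="mps" runs inference on the Apple Silicon GPU
    results = model.track(frame, tracker="botsort.yaml", persist=True, device="mps")
    for box in results[0].boxes:
        cx, cy, _, _ = box.xywh[0].tolist()
        centers.append((cx, cy))  # accumulate centers for the kinematic analysis
    cv2.imshow("drill tracking", results[0].plot())  # annotated live view
    if cv2.waitKey(1) & 0xFF == ord("q"):  # user terminates to trigger the summary report
        break

cap.release()
cv2.destroyAllWindows()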

This study protocol was deemed exempt by the Mass General Brigham Institutional Review Board (Protocol: 2020P003439).

RESULTS

A total of 18,483 frames were extracted from 15 mastoidectomy recordings. After removing frames without a drill, the final image dataset included 7,986 images. The training and testing subsets contained 6,387 and 1,597 images and associated annotations, respectively.

When evaluated on the test subset, the detection model achieved a recall of 98.4% and a mAP (0.5:0.95) of 81.5%.

In a mastoidectomy performed on a 3D-printed TB, the model reliably analyzed instrument motion in real time throughout the entire procedure. In the described computing environment, the model took, on average, 30 milliseconds to localize the drill in each frame of the live video, producing minimal perceptible delay. When terminated by the user, the motion analysis model provided a summative feedback report detailing motion metrics and task duration.
