
How To: Using New iPhone Application for Voice Quality Assessment Based on the GRBAS Scale

by Tsuyoshi Kojima, MD, PhD, Koki Hasebe, MD, Shintaro Fujimura, MD, Yusuke Okanoue, MD, Hiroki Kagoshima, MD, Atsushi Taguchi, MD, Hirotaka Yamamoto, MD, Kazuhiko Shoji, MD, PhD, and Ryusuke Hori, MD, PhD • March 16, 2021

Introduction

Auditory-perceptual voice analysis reliably quantifies overall voice quality both in clinical practice and in research settings. Perceptual evaluation is rapid, noninvasive, and readily performed, and it does not require specialized equipment. It is often used to assess hoarseness in new patients and in those undergoing follow-up. The grade, roughness, breathiness, asthenia, and strain (GRBAS) scale is widely used for perceptual evaluation of voice quality (Hirano M. Psycho-acoustic evaluation of voice. Disorders of Human Communication, 5, Clinical Examination of Voice. New York, NY: Springer-Verlag; 1981:81–84). Perceptual ratings are nevertheless subjective: their accuracy depends on the skill, expertise, and psychophysical status of the evaluator (J Speech Hear Res. 1990;33:103–115), as well as on the evaluation time, all of which can bias the results. We previously suggested that artificial intelligence (AI) could objectively and effectively classify pathological voice data using this scale (J Voice [published online ahead of print March 13, 2020]. doi:10.1016/j.jvoice.2020.02.009). The deep-learning architecture for voice quality classification is based on TensorFlow, a software library developed and released by Google for use in deep learning. An AI-based rater is not subject to these evaluator-dependent sources of variability and so enables more objective assessment.

Fig. 1. The iPhone application “GRBASZero.” White arrow: the five voice evaluations displayed in real time. Black arrow: tap the waveform to review the assessments.

The purpose of this research is to facilitate the use of a deep-learning architecture with the GRBAS scale in clinical practice. To this end, we created an iPhone application named “GRBASZero” that evaluates the voice easily and in real time. The deep-learning model was prepared with Create ML and ported to the iPhone using Core ML; Apple provides Create ML for training machine-learning models and Core ML for running them within applications. We initially explored whether Create ML was the most appropriate software for the present research context.

Method

Subjective assessments of voice quality used the GRBAS scale, which grades each parameter on a four-point scale (0 = normal to 3 = severe). To create a simple machine-learning model that assesses slight temporal changes in voice quality in real time, we constructed a dataset of 1,377 samples of the sustained vowel /a/ recorded during acoustic analysis of patients’ voices. The voice disorders in the dataset were caused by vocal-cord polyps, nodules, cysts, atrophy, paralysis, and cancer, as well as by laryngitis. Voices were recorded in a soundproof room as 16-bit/48-kHz WAV files. Three experts rated each sample using the GRBAS scale, and the median value of the three ratings was calculated.
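As an illustration of the labeling step, the sketch below takes three expert ratings for one GRBAS parameter and derives a class label such as G2 from their median. The helper name `consensus_label` and the example ratings are hypothetical and are not taken from the study; this is only one plausible way to turn the median ratings into class labels.

```python
from statistics import median

VALID_GRADES = (0, 1, 2, 3)            # four-point scale: 0 = normal, 3 = severe
PARAMETERS = ("G", "R", "B", "A", "S")


def consensus_label(parameter, ratings):
    """Return a class label such as 'G2' from three raters' scores.

    Hypothetical helper: the median of three integer ratings is always
    one of the ratings, so the label stays on the four-point scale.
    """
    if parameter not in PARAMETERS:
        raise ValueError(f"unknown GRBAS parameter: {parameter!r}")
    if len(ratings) != 3 or any(r not in VALID_GRADES for r in ratings):
        raise ValueError("expected three ratings, each 0, 1, 2, or 3")
    return f"{parameter}{int(median(ratings))}"


# Made-up ratings for a single voice sample, for illustration only.
sample_ratings = {
    "G": [2, 1, 2],
    "R": [1, 1, 0],
    "B": [3, 2, 2],
    "A": [0, 0, 1],
    "S": [0, 1, 2],
}
labels = {p: consensus_label(p, r) for p, r in sample_ratings.items()}
print(labels)
```

Using the median rather than the mean keeps every label on the original integer scale and limits the influence of a single outlying rater.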

We used Create ML to devise and train a machine-learning model from the labeled sounds, with MLSoundClassifier as the classifier for the voice data. The training dataset was sorted into four classes, labeled G0, G1, G2, and G3, according to the G ratings. Similar datasets were created for R, B, A, and S. The training voices ranged in duration from 0.998 to 8.742 s. During preprocessing, the data were resampled at 16,000 samples/s, taken in 0.975-s segments, and divided into several overlapping windows. A Hamming window was applied to each, and power spectra from 125 to 7,500 Hz were calculated via fast Fourier transforms. The spectra were filtered using a mel-frequency filter bank, and natural logarithms were calculated. The pre-trained convolutional neural network Google VGGish was then used for feature extraction (https://research.google/pubs/pub45611/). VGGish features 17 convolution/activation layers. The top three layers were removed and replaced with a custom neural network based on the input data. After training, the model was saved as a trained Core ML file, and SoundAnalysis was then used to analyze and classify streamed or file-based voices.
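The preprocessing chain described above (overlapping windows, Hamming window, power spectrum, mel filter bank, natural logarithm) can be sketched in plain Python as follows. Only the sample rate, the 0.975-s segment length, and the 125 to 7,500 Hz band come from the text; the 25-ms window, 10-ms hop, 512-point FFT, 64 mel bands, and the small offset inside the logarithm are assumptions chosen for illustration, and Create ML performs this step internally, so this is not the authors' code.

```python
import cmath
import math

SAMPLE_RATE = 16_000            # from the text
SEGMENT_S = 0.975               # from the text
F_MIN, F_MAX = 125.0, 7500.0    # from the text
WIN, HOP = 400, 160             # assumption: 25-ms window, 10-ms hop
N_FFT = 512                     # assumption: next power of two above WIN
N_MELS = 64                     # assumption: number of mel bands
LOG_OFFSET = 0.01               # assumption: avoids log(0) on silence


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hz_to_mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_filter_bank():
    """Triangular filters evenly spaced on the mel scale from F_MIN to F_MAX."""
    bin_mels = [hz_to_mel(k * SAMPLE_RATE / N_FFT) for k in range(N_FFT // 2 + 1)]
    lo, hi = hz_to_mel(F_MIN), hz_to_mel(F_MAX)
    edges = [lo + (hi - lo) * i / (N_MELS + 1) for i in range(N_MELS + 2)]
    bank = []
    for m in range(N_MELS):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        bank.append([
            max(0.0, min((b - left) / (center - left), (right - b) / (right - center)))
            for b in bin_mels
        ])
    return bank


def log_mel(samples):
    """Return one list of N_MELS log-mel energies per overlapping frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (WIN - 1)) for i in range(WIN)]
    bank = mel_filter_bank()
    frames = []
    for start in range(0, len(samples) - WIN + 1, HOP):
        frame = [s * w for s, w in zip(samples[start:start + WIN], window)]
        frame += [0.0] * (N_FFT - WIN)                  # zero-pad to the FFT size
        power = [abs(c) ** 2 for c in fft(frame)[: N_FFT // 2 + 1]]
        frames.append([
            math.log(sum(p * w for p, w in zip(power, row)) + LOG_OFFSET)
            for row in bank
        ])
    return frames


# Example: one 0.975-s segment of a synthetic 440-Hz tone.
segment_len = round(SAMPLE_RATE * SEGMENT_S)
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(segment_len)]
features = log_mel(tone)
print(len(features), "frames x", len(features[0]), "mel bands")
```

Filters whose range lies below 125 Hz or above 7,500 Hz never appear, so the band limit is enforced by the filter bank rather than by a separate band-pass step.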

The trained Core ML model was integrated into an iPhone application for voice evaluation in real time (Fig. 1). The application contains trained models for G, R, B, A, and S, displays these five voice evaluations in real time, and retains the assessments for later viewing. The system evaluates only sounds above a certain sound pressure; silence is ignored. Our “GRBASZero” application is available at no cost in Apple’s App Store.
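The text does not say how the application decides that a sound is loud enough to evaluate. One simple way to gate out silence, shown here purely as a sketch, is to compute the root-mean-square level of each segment and skip segments below a threshold. The function names and the threshold of -40 dB relative to full scale are hypothetical.

```python
import math

SILENCE_THRESHOLD_DBFS = -40.0   # hypothetical threshold, not from the article


def level_dbfs(samples):
    """RMS level of samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def should_evaluate(samples, threshold_dbfs=SILENCE_THRESHOLD_DBFS):
    """True if the segment is loud enough to pass to the classifier."""
    return level_dbfs(samples) >= threshold_dbfs


# Synthetic check: a clear tone passes, digital silence does not.
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16_000) for n in range(15_600)]
print(should_evaluate(tone), should_evaluate([0.0] * 15_600))
```

A level relative to digital full scale is only a stand-in for sound pressure; a calibrated microphone would be needed to express the threshold in dB SPL.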

Results

During training, Create ML randomly splits the data into training and validation sets. The model learns iteratively from the training set, and during each iteration it uses the validation set to check its accuracy. We averaged the training and validation scores of five training sessions; the training datasets were randomly chosen and differed for each session. The metrics for the G scale showed high accuracy for the training data (0.806, SD 0.013). The model also had relatively high accuracy for the R scale (0.812, SD 0.008). Among the five categories, accuracy was lowest for the B scale (0.722, SD 0.016) and highest for the S scale (0.914, SD 0.005). The model had acceptable accuracy for the A scale (0.777, SD 0.010). The application was easy to use, requiring only an iPhone. Each score was displayed for 0.975 s. Phonations shorter than 0.975 s were difficult to evaluate correctly, but the evaluations were stable when phonation was stable. However, any noise in the examination room destabilized the assessment.
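The evaluation procedure (a fresh random split for each session, then a mean and SD over five sessions) can be sketched as below. The validation fraction, the seed, the choice of the sample rather than the population standard deviation, and the per-session accuracies are all placeholders for illustration; they are not values from the study, and Create ML handles the split itself.

```python
import random
from statistics import mean, stdev


def random_split(items, validation_fraction=0.2, seed=None):
    """Shuffle items and return (training, validation) lists.

    The 0.2 fraction is a hypothetical default, not Create ML's setting.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def summarize(accuracies):
    """Mean and sample standard deviation across training sessions."""
    return mean(accuracies), stdev(accuracies)


# A different random split of the 1,377 sample indices for each session.
train_ids, val_ids = random_split(range(1377), seed=1)

# Placeholder accuracies for five sessions (made up, not study results).
placeholder_accuracies = [0.80, 0.82, 0.79, 0.81, 0.80]
avg, sd = summarize(placeholder_accuracies)
print(f"{avg:.3f} SD {sd:.3f}")
```

Reporting the spread across sessions alongside the mean shows how sensitive the accuracy is to which recordings happen to land in the training set.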

Filed Under: How I Do It, Laryngology
Tagged With: clinical best practices
Issue: March 2021

ENTtoday is a publication of The Triological Society.

Wiley

Copyright © 2025 by John Wiley & Sons, Inc. All rights reserved, including rights for text and data mining and training of artificial technologies or similar technologies. ISSN 1559-4939