Abstract

Purpose
Delineating the swallowing and chewing structures in head and neck (H&N) CT scans is necessary for radiotherapy (RT) treatment planning to reduce the incidence of radiation-induced dysphagia, trismus, and speech dysfunction. Automating this process would decrease the manual input required and yield reproducible segmentations, but generating accurate segmentations is challenging due to the complex morphology of the swallowing and chewing structures and the limited soft-tissue contrast in CT images.

Methods
We trained deep learning models on 194 H&N CT scans from our institution, using DeepLabV3+ with a ResNet-101 backbone, to segment the masseters (left and right), medial pterygoids (left and right), larynx, and pharyngeal constrictor muscle. Models were trained sequentially, so that prior segmentations guided the localization of each subsequent structure group. Additionally, an ensemble of models was developed using contextual information from three different views (axial, coronal, and sagittal) for robustness to occasional failures of the individual models. Output probability maps were averaged, and each voxel was assigned the label of the class with the highest combined probability.

Results
The median Dice similarity coefficients (DSC) computed on a hold-out set of 24 CT scans were 0.87±0.02 for the masseters, 0.80±0.03 for the medial pterygoids, 0.81±0.04 for the larynx, and 0.69±0.07 for the constrictor muscle. The corresponding 95th percentile Hausdorff distances were 0.32±0.08 cm (masseters), 0.42±0.2 cm (medial pterygoids), 0.53±0.3 cm (larynx), and 0.36±0.15 cm (constrictor muscle). To assess clinical utility, dose-volume histogram (DVH) metrics previously found to correlate with each toxicity were extracted from the manual and auto-generated contours and compared. Differences in DVH metrics were not statistically significant (p>0.05) for any of the structures.
Further, inter-observer variability in contouring was studied in 10 CT scans. Measured in terms of DSC, the automated segmentations agreed better with each of the observers than the observers agreed with one another.

Conclusions
We developed deep learning-based auto-segmentation models for swallowing and chewing structures in CT. The resulting segmentations can be included in treatment planning to limit complications following RT for H&N cancer. The segmentation models developed in this work are distributed for research use through the open-source platform CERR, accessible at https://github.com/cerr/CERR.
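
The multi-view fusion step and the DSC metric described in the Methods and Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation distributed with CERR: the function names `fuse_views` and `dice` are hypothetical, the per-view probability maps are assumed to have already been resampled onto a common voxel grid with shape (classes, depth, height, width), and a comparison of two empty masks is assumed to score 1.0.

```python
import numpy as np


def fuse_views(prob_maps):
    """Average per-view class probability maps and label each voxel.

    prob_maps: sequence of arrays, one per view (e.g. axial, coronal,
    sagittal), each shaped (n_classes, D, H, W) on a common voxel grid
    (assumption). Returns an integer label volume shaped (D, H, W).
    """
    # Stack along a new leading "view" axis, then average over views.
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    # Assign each voxel the class with the highest combined probability.
    return np.argmax(mean_prob, axis=0)


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


if __name__ == "__main__":
    # Toy example: 2 classes, a 1 x 1 x 2 volume, three views.
    axial = np.array([[[[0.9, 0.2]]], [[[0.1, 0.8]]]])
    coronal = np.array([[[[0.6, 0.6]]], [[[0.4, 0.4]]]])
    sagittal = np.array([[[[0.3, 0.1]]], [[[0.7, 0.9]]]])
    labels = fuse_views([axial, coronal, sagittal])
    print(labels.shape, dice(labels == 1, [[[0, 1]]]))
```

Averaging probabilities before taking the argmax lets a confident prediction from two views outweigh an occasional failure in the third, which is the robustness argument made in the Methods.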