Comprehensive Quantitative Evaluation of Inter-observer Delineation Performance of MR-guided Delineation of Oropharyngeal Gross Tumor Volumes and High-risk Clinical Target Volumes: An R-IDEAL Stage 0 Prospective Study

Purpose: Manual delineation of tumor and target volumes remains a challenging task in head and neck cancer radiotherapy. The purpose of this study is to conduct a multi-institutional evaluation of manual delineations of gross tumor volume (GTV), high-risk clinical target volume (CTV), parotids, and submandibular glands on treatment simulation MR scans of oropharyngeal cancer (OPC) patients.
Methods: Pre-treatment T1-weighted (T1w), T1-weighted with gadolinium contrast (T1w+C), and T2-weighted (T2w) MRI scans were retrospectively collected for four OPC patients under an IRB-approved protocol. The scans were provided to twenty-six radiation oncologists from seven international cancer centers who participated in this delineation study. Clinical history and physical examination findings, along with a medical photographic image and radiological results, were also provided. The contours were compared with overlap and distance metrics, using both STAPLE and pair-wise comparisons. Lastly, participants completed a brief questionnaire to assess personal experience and institutional CTV delineation practices.
Results: Large variability was measured between observer delineations for both GTVs and CTVs. The mean Dice Similarity Coefficient values across all case delineations for GTVp, GTVn, CTVp, and CTVn were 0.77, 0.67, 0.77, and 0.69, respectively, for STAPLE comparison and 0.67, 0.60, 0.67, and 0.58, respectively, for pair-wise analysis. Normal tissue contours were defined more consistently in terms of both overlap and distance metrics. The median radiation oncology clinical experience was 7 years and the median experience delineating on MRI was 3.5 years. The GTV-to-CTV margin used was 10 mm for six of seven participant institutions. One institution used 8 mm and three delineators (from three different institutions) used a margin of 5 mm.
Conclusion: The data from this study suggest that appropriate guidelines, contouring quality assurance sessions, and training are still needed for the adoption of MR-based treatment planning for head and neck cancers. Such efforts should play a critical role in reducing inter-observer delineation variation and ensuring standardization of target design across clinical practices.
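For readers unfamiliar with the overlap metric reported in the Results, the Dice Similarity Coefficient between two contours A and B is 2|A∩B| / (|A| + |B|), and a pair-wise analysis averages it over all observer pairs. The following is a minimal sketch of that calculation, not the study's actual implementation: contours are assumed to be given as sets of voxel indices, the function names `dice` and `mean_pairwise_dice` are hypothetical, and the three toy contours are made-up values for illustration only. The STAPLE-based comparison is not shown.

```python
from itertools import combinations


def dice(a, b):
    """Dice Similarity Coefficient between two contours given as sets of voxel indices."""
    if not a and not b:
        # Assumed convention: two empty contours are treated as perfect agreement.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(contours):
    """Average Dice over every unordered pair of observer contours."""
    pairs = list(combinations(contours, 2))
    if not pairs:
        raise ValueError("need at least two contours for a pair-wise comparison")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Toy contours from three hypothetical observers (illustrative values only).
obs1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
obs2 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
obs3 = {(0, 1, 1), (1, 1, 1)}

print(dice(obs1, obs2))                        # 3 shared voxels of 4 + 4
print(mean_pairwise_dice([obs1, obs2, obs3]))  # mean over the 3 observer pairs
```

In practice the voxel sets would come from binary masks rasterized from each observer's contour on a common image grid, so that voxel indices are comparable across observers.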
