18F-FDG PET as biomarker in aggressive lymphoma; technical and clinical validation

Chapter 5
interobserver variability for MTV assessment in Workflow A. TLG showed similar ICCs and CoVs for these two methods.
Ease of Use
Mean analysis time in Workflow A was 28.7 min per patient (range 5–63; Table 2). The most preferred method differed per patient and between observers (Table 3). A50%P and SUV≥4.0 were most often chosen as the “preferred segmentation” at the patient level, with success rates (segmentations of visible tumor rated as acceptable or good) ranging from 33 to 87 % and from 35 to 76 %, respectively. The mean success rate for the 41%MAX method ranged from 31 to 86 % between observers. The success rates for the MV2 and MV3 methods, as scored by one observer, were 84 % and 87 %, respectively. Although SUV≥2.5 showed the highest observer reliability, only one observer chose it as the most preferred method, and only in 2 patients. The mean success rate for this method ranged from 27 to 39 % between observers, and it tended to overestimate tumor volumes (Supplemental Figs. 1–2). Therefore, we decided to focus on the SUV≥4.0 method as the preselection criterion.
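As an illustration of how a per-observer success rate of this kind could be tallied, the following is a minimal sketch. It assumes each observer assigns one quality label per patient for a given method and that "acceptable" and "good" count as success, as stated above. The label "poor", the observer names, and the ratings are hypothetical and are not data from this study.

```python
# Labels counted as success (from the text: "acceptable or good").
ACCEPTED = {"acceptable", "good"}


def success_rate(ratings):
    """Return the percentage of ratings that are acceptable or good."""
    if not ratings:
        raise ValueError("no ratings supplied")
    hits = sum(1 for r in ratings if r in ACCEPTED)
    return 100.0 * hits / len(ratings)


# Hypothetical ratings for one segmentation method, one label per patient.
ratings_by_observer = {
    "observer_1": ["good", "acceptable", "poor", "good"],
    "observer_2": ["poor", "poor", "acceptable", "good"],
}

rates = {obs: success_rate(r) for obs, r in ratings_by_observer.items()}

# The range between observers is the min and max of the per-observer rates.
print(min(rates.values()), max(rates.values()))
```

Reporting the minimum and maximum across observers mirrors how the ranges above are expressed; whether the study computed them in exactly this way is an assumption.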
Workflow B; Preselection Strategy
Lesion Selection
The total number of selected tumor regions for observer 1, 2, and 3 was 76, 76, and 77, respectively. Seventy-two identical tumor regions were selected by all three observers.
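The overlap between observers can be expressed as a set intersection. The sketch below assumes each selected tumor region carries an identifier that is comparable across observers; the identifiers and selections shown are hypothetical and far smaller than the study data.

```python
# Hypothetical region identifiers selected by each observer.
selections = {
    "observer_1": {"P01-R1", "P01-R2", "P02-R1"},
    "observer_2": {"P01-R1", "P01-R2", "P02-R2"},
    "observer_3": {"P01-R1", "P01-R2", "P02-R1", "P02-R2"},
}

# Number of regions selected per observer.
counts = {obs: len(regions) for obs, regions in selections.items()}

# Regions selected identically by all observers.
common = set.intersection(*selections.values())

print(counts)
print(sorted(common))
```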
Interobserver Reliability
Workflow B is based on the SUV≥4.0 threshold and correlated well with the SUV≥4.0 threshold of Workflow A, with a Pearson correlation of 0.812 (0.995 after removing 4 outlying volumes in 2 patients; Fig. 1). The outliers were caused by one patient with many lesions, in whom the SUV≥4.0 threshold failed (large parts of the liver and spleen were included in this segmentation), and by another patient with a large abdominal lesion that one observer interpreted as non-lymphoma. Complete agreement of the preselected volumes at the patient level between all observers was found in six patients. The ICC value for the MTVs generated in this workflow was excellent (1.00, 95%CI 1.00–1.00) and the mean CoV was 2.3 % (range 0–10.4 %, Table 1), with similar results for TLG.
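For readers who want to reproduce agreement measures of this type, the following is a minimal sketch of a per-patient coefficient of variation across observers and a Pearson correlation between two sets of volumes. It assumes the CoV is the sample standard deviation divided by the mean, expressed as a percentage; the text does not state which SD definition was used. All volumes and patient labels are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cov_percent(values):
    """Coefficient of variation in percent (assumed: sample SD / mean)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return stdev(values) / m * 100.0


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical MTVs (mL) per patient, one value per observer.
mtv_by_patient = {
    "P01": [120.0, 120.0, 120.0],  # complete agreement gives CoV = 0
    "P02": [310.0, 322.0, 305.0],
    "P03": [45.0, 47.5, 44.0],
}

per_patient_cov = {pid: cov_percent(v) for pid, v in mtv_by_patient.items()}
mean_cov = mean(per_patient_cov.values())

# Hypothetical paired volumes from two workflows for the same patients.
workflow_a = [118.0, 315.0, 46.0]
workflow_b = [120.0, 312.0, 45.5]
r = pearson_r(workflow_a, workflow_b)

print(per_patient_cov, mean_cov, r)
```

Because the Pearson coefficient is sensitive to a few extreme pairs, rerunning `pearson_r` with suspected outliers excluded shows how much they drive the result, which is the comparison reported above for the 4 outlying volumes.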