Strategies for non-invasive management of high-grade cervical intraepithelial neoplasia - prognostic biomarkers and immunotherapy - Margot Maria Koeneman

Chapter 5
multiple imputation strategy to avoid a potentially considerable loss of precision and to minimize the risk of bias in the estimated coefficients [22]. The number of imputations was set to five. A sensitivity analysis compared the results obtained from the imputed datasets with those from the original dataset to determine whether imputation led to significantly different results.
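The chapter does not state which imputation algorithm was used. As a minimal sketch, assuming one fully observed predictor `x` and one incomplete continuous predictor `z` (both hypothetical and simulated below), five completed datasets can be produced by stochastic regression imputation. The complete cases are bootstrapped before each imputation, so both uncertainty in the regression parameters and residual noise carry over into the imputed values:

```python
import random
import statistics

def impute_once(x, z, rng):
    """Fill missing z values (None) by stochastic regression of z on x.

    Complete cases are bootstrapped first, so each imputation uses slightly
    different regression parameters; random residual noise is then added
    to every imputed value.
    """
    complete = [(xi, zi) for xi, zi in zip(x, z) if zi is not None]
    boot = [rng.choice(complete) for _ in complete]
    mx = statistics.fmean(a for a, _ in boot)
    mz = statistics.fmean(b for _, b in boot)
    slope = (sum((a - mx) * (b - mz) for a, b in boot)
             / sum((a - mx) ** 2 for a, _ in boot))
    intercept = mz - slope * mx
    resid_sd = statistics.stdev(b - (intercept + slope * a) for a, b in boot)
    return [zi if zi is not None else intercept + slope * xi + rng.gauss(0.0, resid_sd)
            for xi, zi in zip(x, z)]

def multiply_impute(x, z, m=5, seed=1):
    """Return m completed copies of z (m = 5, as in the chapter)."""
    rng = random.Random(seed)
    return [impute_once(x, z, rng) for _ in range(m)]

# Synthetic illustration (not study data): z depends on x; every 5th z is missing.
data_rng = random.Random(7)
x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
z = [2.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]
z = [None if i % 5 == 0 else zi for i, zi in enumerate(z)]
datasets = multiply_impute(x, z)
```

Observed values are identical in all five datasets; only the imputed values differ. Each completed dataset is then analyzed separately, which is why model development is carried out per imputed dataset. Fully conditional specification (chained equations) extends the same idea to several incomplete variables.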
Model development
For each of the five imputed datasets, all potential predictors were entered into a multivariable logistic regression model, with disease regression within 24 months as the outcome variable. Backward stepwise deletion based on the Wald test was applied to reduce the number of predictors, using a p-value threshold of 0.20 as recommended by prediction modeling guidelines [12]. Predictors that remained in at least three of the five imputed datasets were included in the final model. The coefficients of these selected predictors were then re-estimated in each imputed dataset separately, and the five resulting models were combined into one prediction model.
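The chapter does not specify how the five models were combined. Rubin's rules, which average the coefficients and add within- and between-imputation variance, are the usual choice and are assumed in this sketch, which also shows the "at least three of five" selection rule. Backward deletion itself is not implemented; the function names, the predictor names `x1` to `x4`, and all numbers are hypothetical toy values, not results from this study:

```python
import math
import statistics
from collections import Counter

def majority_selection(selected_per_dataset, min_count=3):
    """Predictors retained by backward deletion in at least min_count imputed datasets."""
    counts = Counter(name for selected in selected_per_dataset for name in set(selected))
    return sorted(name for name, c in counts.items() if c >= min_count)

def pool_rubin(estimates, std_errors):
    """Pool one coefficient over m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)                     # average of the m estimates
    within = statistics.fmean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = statistics.variance(estimates)                 # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical toy input: predictors kept after backward deletion (Wald p < 0.20)
# in each of the five imputed datasets.
selections = [{"x1", "x2", "x3"}, {"x1", "x3"}, {"x1", "x2"},
              {"x1", "x3", "x4"}, {"x1", "x2", "x3"}]
final = majority_selection(selections)
print(final)  # ['x1', 'x2', 'x3']

# Hypothetical coefficient for x1, re-estimated in each imputed dataset.
print(pool_rubin([0.80, 0.90, 0.70, 0.85, 0.75], [0.30] * 5))
```

The factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations.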
Internal validation of the model
Internal validation was performed using bootstrapping. Bootstrapping corrects for over-fitting, which occurs when a model performs well on the data from which it was developed but gives too extreme predictions for future patients. Backward stepwise deletion can contribute to over-fitting by introducing selection bias: predictors whose effects are overestimated by chance are more likely to be retained than predictors whose effects are underestimated by chance. B bootstrap samples of the same size as the original sample were drawn with replacement from the original data, which mimics drawing samples from the underlying population; B was set to 1000. In this way, the likely performance of the model in future patients could be estimated, and the model was adjusted to make future predictions less extreme.
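A minimal sketch of bootstrap internal validation, assuming a logistic model with a single synthetic predictor fitted by Newton-Raphson (all function names are hypothetical). For each of B bootstrap samples the model is refitted; its optimism is estimated as its AUC in the bootstrap sample minus its AUC in the original data, and its calibration slope in the original data is recorded. The mean slope then serves as a uniform shrinkage factor. The chapter does not say exactly how its adjustment was made, so this is one common approach, not necessarily the one used:

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, max_iter=50):
    """Fit logit P(y = 1) = a + b * x by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def c_statistic(scores, y):
    """AUC: chance that a random event scores higher than a random non-event."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_validate(x, y, n_boot=1000, seed=1):
    """Optimism-corrected AUC and bootstrap shrinkage factor (n_boot plays the role of B)."""
    rng = random.Random(seed)
    n = len(y)
    a, b = fit_logistic(x, y)
    apparent = c_statistic([a + b * xi for xi in x], y)
    optimism, slopes = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        ab, bb = fit_logistic(bx, by)
        lp_boot = [ab + bb * xi for xi in bx]
        lp_orig = [ab + bb * xi for xi in x]
        # Optimism: bootstrap-sample performance minus performance of the
        # same bootstrap model in the original sample.
        optimism.append(c_statistic(lp_boot, by) - c_statistic(lp_orig, y))
        # Calibration slope of the bootstrap model in the original sample.
        slopes.append(fit_logistic(lp_orig, y)[1])
    shrinkage = sum(slopes) / n_boot
    return {
        "apparent_auc": apparent,
        "corrected_auc": apparent - sum(optimism) / n_boot,
        "shrinkage": shrinkage,
        "shrunken_slope": shrinkage * b,
    }

# Synthetic data (not from the study): one predictor with true slope 1.0.
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
y = [1 if data_rng.random() < sigmoid(-0.2 + xi) else 0 for xi in x]
res = bootstrap_validate(x, y, n_boot=200)  # the chapter used B = 1000
print(res)
```

For the correction to capture the selection bias of backward deletion, the deletion step would have to be repeated inside every bootstrap sample; the sketch omits it for brevity, so its optimism will be small. After the slope is shrunk, the intercept is usually re-estimated so that the mean predicted risk still matches the observed event rate.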
Performance of the model
The overall performance of the internally validated model was assessed using Nagelkerke’s R2 and the Brier score. Nagelkerke’s R2 is a pseudo-R2, analogous to the R2 of linear regression, that quantifies the predictive strength of a model: the higher the Nagelkerke’s R2, the stronger the model. The Brier score is the mean squared difference between the observed binary outcomes and the predicted probabilities; the closer it is to zero, the better. The ability of the model to discriminate between patients whose CIN 2 lesion will regress spontaneously and those whose lesion will not was quantified as the area under the receiver operating characteristic curve (AUC). An AUC of 50% indicates no discriminative ability and an AUC of 100% indicates perfect discrimination. The agreement between predicted probabilities and observed outcome frequencies (calibration) was assessed by visually inspecting the calibration plot. Finally, the Hosmer and Lemeshow (H-L) goodness-of-fit statistic was computed as a quantitative measure of calibration; a high H-L statistic corresponds to a low p-value, which indicates poor fit. All statistical analyses were performed using SPSS 22.0 (IBM; Armonk, NY, USA; released 2013) and R 3.2.1 (http://www.r-project.org).
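The chapter does not specify which SPSS or R functions produced these measures. A minimal sketch computing each of them from predicted probabilities `p` and observed 0/1 outcomes `y` (hypothetical toy inputs, and hypothetical function names). The H-L statistic assumes the common convention of ten groups of sorted predicted risk with a chi-square reference on groups - 2 = 8 degrees of freedom:

```python
import math

def brier_score(p, y):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

def nagelkerke_r2(p, y):
    """Cox-Snell R^2 divided by its maximum attainable value."""
    n, events = len(y), sum(y)
    ll_model = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for pi, yi in zip(p, y))
    prev = events / n
    # Log-likelihood of the intercept-only model (predicts the prevalence for everyone).
    ll_null = events * math.log(prev) + (n - events) * math.log(1.0 - prev)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

def auc(p, y):
    """Probability that a random event has a higher prediction than a random non-event."""
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(p, y, groups=10):
    """H-L statistic over equal-size groups of sorted predicted risk; returns (stat, p-value)."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n, stat = len(p), 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / len(idx)))
    return stat, chi2_sf_even_df(stat, groups - 2)

p = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
print(brier_score(p, y), nagelkerke_r2(p, y), auc(p, y))
```

The closed-form p-value is valid only for an even number of degrees of freedom, which the default of ten groups satisfies; with an odd number of groups a general chi-square distribution function would be needed.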