Validation of models to diagnose ovarian cancer in patients managed surgically or conservatively: multicentre cohort study
==========================================================================================================================

* Ben Van Calster
* Lil Valentin
* Wouter Froyman
* Chiara Landolfo
* Jolien Ceusters
* Antonia C Testa
* Laure Wynants
* Povilas Sladkevicius
* Caroline Van Holsbeke
* Ekaterini Domali
* Robert Fruscio
* Elisabeth Epstein
* Dorella Franchi
* Marek J Kudla
* Valentina Chiappa
* Juan L Alcazar
* Francesco P G Leone
* Francesca Buonomo
* Maria Elisabetta Coccia
* Stefano Guerriero
* Nandita Deo
* Ligita Jokubkiene
* Luca Savelli
* Daniela Fischerová
* Artur Czekierdowski
* Jeroen Kaijser
* An Coosemans
* Giovanni Scambia
* Ignace Vergote
* Tom Bourne
* Dirk Timmerman

## Abstract

**Objective** To evaluate the performance of diagnostic prediction models for ovarian malignancy in all patients with an ovarian mass managed surgically or conservatively.

**Design** Multicentre cohort study.

**Setting** 36 oncology referral centres (tertiary centres with a specific gynaecological oncology unit) or other types of centre.

**Participants** Consecutive adult patients presenting with an adnexal mass between January 2012 and March 2015 and managed by surgery or follow-up.

**Main outcome measures** Overall and centre specific discrimination, calibration, and clinical utility of six prediction models for ovarian malignancy (risk of malignancy index (RMI), logistic regression model 2 (LR2), simple rules, simple rules risk model (SRRisk), assessment of different neoplasias in the adnexa (ADNEX) with or without CA125). ADNEX allows the risk of malignancy to be subdivided into risks of a borderline, stage I primary, stage II-IV primary, or secondary metastatic malignancy. The outcome was based on histology if patients underwent surgery, or on results of clinical and ultrasound follow-up at 12 (±2) months.
Multiple imputation was used when outcome based on follow-up was uncertain.

**Results** The primary analysis included 17 centres that met strict quality criteria for surgical and follow-up data (5717 of all 8519 patients). 812 patients (14%) had a mass that was already in follow-up at study recruitment; therefore, 4905 patients were included in the statistical analysis. The outcome was benign in 3441 (70%) patients and malignant in 978 (20%). Uncertain outcomes (486, 10%) were most often explained by limited follow-up information. The overall area under the receiver operating characteristic curve was highest for ADNEX with CA125 (0.94, 95% confidence interval 0.92 to 0.96), ADNEX without CA125 (0.94, 0.91 to 0.95) and SRRisk (0.94, 0.91 to 0.95), and lowest for RMI (0.89, 0.85 to 0.92). Calibration varied among centres for all models; however, the ADNEX models and SRRisk were the best calibrated. Calibration of the estimated risks for the tumour subtypes was good for ADNEX irrespective of whether or not CA125 was included as a predictor. Overall clinical utility (net benefit) was highest for the ADNEX models and SRRisk, and lowest for RMI. For patients who received at least one follow-up scan (n=1958), overall area under the receiver operating characteristic curve ranged from 0.76 (95% confidence interval 0.66 to 0.84) for RMI to 0.89 (0.81 to 0.94) for ADNEX with CA125.

**Conclusions** Our study found that the ADNEX models and SRRisk are the best models to distinguish between benign and malignant masses in all patients presenting with an adnexal mass, including those managed conservatively.

**Trial registration** ClinicalTrials.gov [NCT01698632](https://www.bmj.com/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT01698632&atom=%2Fbmj%2F370%2Fbmj.m2614.atom).

## Introduction

Ovarian cancer is a gynaecological malignancy with a high mortality rate.
In 2018, an estimated 295 400 women developed ovarian cancer worldwide, and 184 800 deaths were reported from the disease.1 The prognosis for women with ovarian cancer treated in oncology centres is better than for those managed in other settings.2345 Methods such as risk prediction models are needed to reliably estimate the likelihood that a mass is malignant so that patients can receive the optimal treatment. Risk prediction models can be used to individualise patient management, such as setting priorities on waiting lists for further investigations and specialist consultations, and deciding whether patients need surgery performed by surgeons who specialise in oncological surgery or whether surgery is not required. Adnexal masses judged to be benign can be safely managed with follow-up.6 If a benign mass causes symptoms it can be surgically removed in a local centre; however, if malignancy is suspected the mass should be managed in an oncological referral centre.

Ultrasound based diagnostic models can be used to predict malignancy in adnexal masses.
A commonly used model is the risk of malignancy index (RMI), which was developed in 1990.7 Newer models are the International Ovarian Tumour Analysis (IOTA) models: logistic regression model 1 (LR1), logistic regression model 2 (LR2), simple rules, simple rules risk model (SRRisk), and assessment of different neoplasias in the adnexa (ADNEX).891011 The performance of the IOTA models has been externally validated and compared on thousands of patients.121314 However, only three validation studies reported on model calibration, that is, the agreement between predicted malignancy risk and observed proportion of malignancy; only one compared the clinical utility of different prediction models for referring patients with an adnexal mass to an oncology centre.15161718 More importantly, all model development and validation studies, except for two small single centre validation studies,1920 recruited patients who underwent surgery with histology as the reference standard. Therefore, current evidence is limited to patients for whom the decision to operate had already been made. To select the optimal treatment, models should perform well on patients who undergo surgery and on those who are managed conservatively.

The primary aim of this study was to evaluate the performance of the IOTA models and RMI when applied on all adnexal masses irrespective of management, that is, conservative or surgical. The secondary aim was to evaluate model performance in clinically relevant subgroups.

## Methods

### Study design and participants

This study was conducted by using interim data from the IOTA phase 5 study (IOTA5), an international multicentre prospective cohort study.6 IOTA5 recruited consecutive patients with an adnexal mass examined by using transvaginal ultrasound, irrespective of whether patients subsequently underwent surgery or were managed conservatively with follow-up visits. Appendix 1 presents the IOTA5 protocol.
IOTA5 recruitment took place from January 2012 to October 2016, but follow-up will continue until all patients receiving conservative management have been followed up for at least five years. The current interim analysis includes patients recruited until 1 March 2015 and follow-up data until 30 June 2017. Thirty six centres in 14 countries recruited patients to the study. The contributing centres were either oncology referral centres (tertiary centres with a specific gynaecological oncology unit) or other types of centre. We obtained approval from the ethics committee of the University Hospitals Leuven as the coordinating centre (B32220095331/S51375) and the local ethics committee of each contributing centre. We report the study according to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) guidelines.21

Patients were eligible if they were aged 18 or older at recruitment and presented with at least one adnexal mass (ovarian, para-ovarian, or tubal) on ultrasound examination. Informed consent was obtained and then local clinicians examined patients following a standardised research protocol. Exclusion criteria were lesions presumed to be physiological if the largest diameter was less than 3 cm, refusal to provide informed consent, or withdrawal of informed consent. Pregnancy was not an exclusion criterion. We excluded patients if they had an adnexal mass that was already being followed up in the recruitment centre before the start of the study.

### Procedures

The ultrasound examiners who recruited participants followed the standardised research protocol. They collected clinical information and performed a transvaginal ultrasound examination, and an abdominal scan if necessary. The examination consisted of scanning the uterus, both adnexa, and the whole pelvis outside these organs.
Grey scale and colour or power Doppler ultrasound was used to characterise the morphology and vascularisation of the adnexal mass. Examiners collected information on several predefined ultrasound variables, including those used in the prediction models, and ultrasound results were described by using IOTA terminology.22 We had no requirements about the level of experience of the ultrasound examiners, but all examiners were IOTA trainers or had passed the IOTA certification test ([https://www.iotagroup.org/certified-members](https://www.iotagroup.org/certified-members)). Ultrasound examiners used subjective assessment of the ultrasound images to classify lesions as benign, borderline, or malignant, and specified the degree of certainty with which the classification was made (certain, probable, or uncertain). The presumed histology was registered according to a list of 18 predefined diagnoses. These diagnoses were based on knowledge of the typical ultrasound appearance of benign, borderline, and malignant lesions, and of different types of specific adnexal pathology.23 When examiners detected multiple masses, the dominant mass was defined as the mass with the most complex ultrasound morphology. If multiple masses had similar morphology, the largest mass or the mass that was most accessible with ultrasound was denoted dominant. We used the dominant mass in our statistical analyses. The ultrasound examiner suggested surgery or conservative management based on the ultrasound diagnosis and the patient’s symptoms. Ultimately, however, the treating clinician decided upon the management strategy together with the patient. Therefore, the suggested management and actual management might be different. We encouraged centres to measure the level of serum CA125 in all patients, but this was not a requirement for inclusion in the study. Measurement of CA125 was left to clinical judgment and local protocols. 
Conservative management included ultrasound and clinical follow-up at intervals of three months, six months, and then every 12 months thereafter. At follow-up visits clinical information including symptoms was collected and an ultrasound examination was performed in the same manner as at the inclusion scan. Examiners collected data on several predefined ultrasound variables and suggested a diagnosis by subjectively assessing the ultrasound images. After one or more follow-up visits, some patients underwent surgery for a variety of reasons (eg, suspicion of malignancy or patient anxiety).6 For some patients, the mass resolved spontaneously during follow-up.

Each centre performed surgery and histological examination of surgically removed masses by following local protocols. We did not carry out a central pathology review because in a previous study we did not observe important differences in reported outcomes between local and central pathology reports.8 We classified malignant tumours according to the criteria recommended by the International Federation of Gynaecology and Obstetrics.24

### Data cleaning

We collected patient level data by using a secure electronic platform developed for the study (IOTA5 Study Screen; astraia software, Munich, Germany). Patients automatically received a unique identifier upon enrolment. We encrypted all data communication to ensure data security. A team of biostatisticians and ultrasound examiners performed data cleaning. Data cleaning included sending queries to participating centres to retrieve missing information or to correct inconsistencies. Local centres used a standardised questionnaire (appendix 2) to accrue missing information by telephoning patients and managing clinicians.
For the primary analysis, we excluded centres that recruited fewer than 50 patients, those that recruited non-consecutively (focused only on patients who underwent surgery without follow-up, or only on patients managed conservatively), and those that provided poor follow-up information for more than 30% of patients (supplementary table 1, appendix 3).6 Poor follow-up information was defined as the absence of a study outcome (spontaneous resolution or histology based on surgery at any point during follow-up) and last follow-up visit less than ten months after inclusion.

### Prediction models

We evaluated several ultrasound based prediction models: RMI, LR2, simple rules, SRRisk, ADNEX without CA125, and ADNEX with CA125 (table 1). Model predictions are based on information obtained at the inclusion scan and so are blinded to the outcome. RMI does not give an estimated risk, but a non-negative integer (0 or higher), with higher scores suggesting a higher likelihood of malignancy. LR2 and SRRisk calculate the risk that the tumour is malignant. ADNEX calculates the probability of five outcome categories: benign, borderline, stage I primary invasive ovarian malignancy, stage II-IV primary invasive ovarian malignancy, and metastasis in the adnexa from another primary tumour (eg, breast cancer). For ADNEX, one minus the probability of a benign tumour equals the estimated risk of malignancy. The simple rules classify tumours as benign, inconclusive, or malignant based on the presence of five typical ultrasound features of benign tumours and five typical ultrasound features of malignant tumours. The prediction is inconclusive when none of the 10 features is present, or when a mixture of benign and malignant features is present. Here, we add inconclusive tumours to those predicted to be malignant, resulting in a binary classifier. Appendix 4 gives details of predictors and model formulas.
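
The classification logic described for the simple rules and for the ADNEX risk of malignancy can be sketched in a few lines of Python. This is a minimal illustration based only on the wording of this section, not the IOTA implementation (appendix 4 holds the actual predictors and formulas). The function names, the feature-count arguments, and the dictionary keys are hypothetical, and the reading that a mass showing only benign features is classified as benign (and only malignant features as malignant) is inferred from the definition of an inconclusive result.

```python
def simple_rules(n_benign_features, n_malignant_features):
    """Three-way simple rules result from counts of the benign and
    malignant ultrasound features that are present (hypothetical inputs)."""
    if n_benign_features > 0 and n_malignant_features == 0:
        return "benign"
    if n_malignant_features > 0 and n_benign_features == 0:
        return "malignant"
    # None of the 10 features present, or a mixture of both kinds
    return "inconclusive"


def simple_rules_binary(n_benign_features, n_malignant_features):
    """Binary version used in this study: inconclusive counts as malignant."""
    result = simple_rules(n_benign_features, n_malignant_features)
    return "benign" if result == "benign" else "malignant"


def adnex_malignancy_risk(probabilities):
    """Risk of malignancy = 1 - P(benign) over the five ADNEX categories."""
    return 1.0 - probabilities["benign"]


# Illustrative (made-up) ADNEX output for one mass
example = {
    "benign": 0.70,
    "borderline": 0.10,
    "stage I": 0.08,
    "stage II-IV": 0.07,
    "metastatic": 0.05,
}
print(simple_rules_binary(0, 0), adnex_malignancy_risk(example))
```

Treating inconclusive results as malignant trades specificity for sensitivity, which matches the referral setting in which a missed malignancy is the costlier error.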
[Table 1](https://www.bmj.com/content/370/bmj.m2614/T1) Summary of diagnostic prediction models for ovarian malignancy

### Outcomes

The reference standard describes the nature of the adnexal mass. The primary outcome was classification of tumours as benign or malignant. This classification was based on histology when patients had surgery or on subjective assessment at inclusion and during follow-up until 12 (±2) months when surgery was not performed. We considered the outcome as uncertain when not enough information was available to make a reasonable classification of the mass as benign or malignant at inclusion. Table 2 shows detailed classification criteria. Pathologists were blinded to ultrasound predictor variables and model predictions, but might have received information on the subjective assessment by the ultrasound examiner when clinically relevant. Borderline tumours were classified as malignant.

[Table 2](https://www.bmj.com/content/370/bmj.m2614/T2) Definition of tumour outcome based on histology or clinical information

For a full evaluation of ADNEX, we used a multinomial reference standard describing the adnexal mass at inclusion as benign, borderline malignant, stage I primary ovarian malignancy, stage II-IV primary ovarian malignancy, or secondary metastatic malignancy (secondary outcome).

### Statistical analysis

We followed a prespecified statistical analysis plan for this study. Appendix 5 presents the sample size determination for the IOTA5 study. We had missing values for CA125 and some outcomes were labelled uncertain and therefore missing. We used multiple imputation to address these missing values (appendix 5). The imputations were based on variables used as predictors in the models, and variables that are associated with CA125 or with outcome, or with their missingness. Our primary analysis included patients after multiple imputation of missing values.
We evaluated discrimination between benign and malignant tumours with the area under the receiver operating characteristic curve (AUC) for the risk prediction models and RMI. To account for variability in performance between centres (heterogeneity), we used meta-analysis of centre specific AUCs to obtain the overall AUC for each model. Heterogeneity was quantified using 95% prediction intervals, which indicate which AUC values can be expected when evaluating the model in a new centre.25 We used the DeLong method to calculate 95% confidence intervals for the difference in AUC between two models.26 For ADNEX, we calculated the AUC for each pair of tumour types.27 We calculated sensitivity and specificity for prespecified thresholds for RMI, LR2, SRRisk, and ADNEX. For any threshold, patients with a result at or above the threshold were classified at high risk of malignancy. We compared RMI with other models by calculating sensitivity when fixing specificity at 90%, and specificity when fixing sensitivity at 90%. Overall sensitivity and specificity values were obtained by using a meta-analysis of centre specific results. We assessed calibration of LR2, SRRisk, and ADNEX by calculating calibration intercept and slope, and used these values to generate centre specific and overall calibration curves. The calibration intercept assesses whether risks are generally overestimated (intercept <0) or underestimated (intercept >0). The calibration slope assesses whether risks are too extreme (slope <1) or too moderate (slope >1).28 When too extreme, low estimated risks are underestimated and high risks are overestimated. When too moderate, low risks are overestimated and high risks are underestimated. For RMI, we performed an analogous analysis to estimate the prevalence of malignancy conditional on the RMI value, and constructed centre specific and overall curves. 
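
The measures defined so far can be illustrated with a short, dependency-free Python sketch. It is a simplified illustration under stated assumptions, not the authors' analysis: the study used R with meta-analysis of centre specific estimates and multiple imputation, whereas this sketch pools all patients, ignores clustering by centre, and gives no confidence intervals. All function names are hypothetical, and the logistic fits use plain Newton-Raphson iterations, which is an implementation choice made here for self-containment.

```python
import math


def auc(scores, outcomes):
    """Share of malignant-benign pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, outcomes, threshold):
    """Results at or above the threshold are classified as high risk."""
    tp = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s < threshold)
    n_pos = sum(outcomes)
    return tp / n_pos, tn / (len(outcomes) - n_pos)


def _expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p):
    # Estimated risks must lie strictly between 0 and 1
    return math.log(p / (1.0 - p))


def calibration_intercept(risks, outcomes, iterations=25):
    """Fit logit P(y=1) = a + logit(risk) with the slope fixed at 1.
    a > 0 suggests underestimated risks, a < 0 overestimated risks."""
    lp = [_logit(r) for r in risks]
    a = 0.0
    for _ in range(iterations):
        p = [_expit(a + x) for x in lp]
        gradient = sum(y - pi for y, pi in zip(outcomes, p))
        information = sum(pi * (1.0 - pi) for pi in p)
        a += gradient / information
    return a


def calibration_slope(risks, outcomes, iterations=25):
    """Fit logit P(y=1) = a + b * logit(risk) and return b.
    b < 1 suggests risks that are too extreme, b > 1 too moderate."""
    lp = [_logit(r) for r in risks]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [_expit(a + b * x) for x in lp]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(y - pi for y, pi in zip(outcomes, p))
        g1 = sum((y - pi) * x for y, pi, x in zip(outcomes, p, lp))
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Made-up toy data: 10 patients at 20% estimated risk, 10 at 80%
risks = [0.2] * 10 + [0.8] * 10
well_calibrated = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
underestimated = [1] * 4 + [0] * 6 + [1] * 9 + [0] * 1
print(calibration_intercept(risks, well_calibrated))  # close to 0
print(calibration_intercept(risks, underestimated))   # positive
```

In the toy data, the second outcome vector has more malignancies than the estimated risks imply, so the intercept is positive, which is the direction the text labels as underestimation.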
For ADNEX, we assessed calibration for all five predicted outcomes.29

We assessed clinical utility by using decision curve analysis for risk thresholds between 5% and 50% to decide which patients should be referred to specialised oncological care. We report overall decision curves based on a meta-analysis of centre specific curves.30

We obtained overall AUCs and calibration curves for several prespecified subgroups: actual management (surgery within 120 days without any follow-up scan *v* at least one follow-up scan), management suggested by ultrasound examiner (surgery *v* conservative management with follow-up visits), menopausal status, and type of centre. Overall AUCs and calibration curves were computed for several prespecified sensitivity analyses: an analysis that excludes masses with uncertain outcome (U1-U4 in table 2); an analysis in which the definition of an uncertain outcome is expanded to include groups B2 and M2-M3 in table 2 (all groups in which subjective assessment of ultrasound images was used to classify outcomes as benign or malignant); and an analysis from all 36 centres of patients who underwent surgery within 120 days without any follow-up scan (not restricted to centres with high quality follow-up data). Appendix 5 presents details of the statistical analysis. The analysis was performed by using R version 3.5.1.

### Patient and public involvement

Patients were not involved in the study design, definition of outcome measures, recruiting plans of the study, or interpretation of study results. We discussed the study with KanActief, a cancer rehabilitation patient group at the University Hospitals Leuven ([https://www.uzleuven.be/nl/kanactief](https://www.uzleuven.be/nl/kanactief)).

## Results

In total, 98 ultrasound examiners at 36 centres recruited 8519 patients into the interim dataset of IOTA5 (supplementary table 1).
After we applied the exclusion criteria (appendix 3) and data cleaning, our primary analysis consisted of 4905 patients recruited by 58 ultrasound examiners at 17 centres (fig 1, table 3).

[Fig 1](https://www.bmj.com/content/370/bmj.m2614/F1) Study flowchart. Criteria for excluding centres were fewer than 50 patients recruited, non-consecutive recruitment, or insufficient quality of follow-up data (appendix 3). Eleven of 20 oncology centres and 8 of 16 non-oncology centres were excluded. Supplementary table 1 gives details of excluded centres. IOTA5=International Ovarian Tumour Analysis phase 5 study

[Table 3](https://www.bmj.com/content/370/bmj.m2614/T3) Overview of 17 centres included in primary analysis. Data are number or number (row percentage)

The median age of the 4905 patients was 48 years (interquartile range 36-62, range 18-98), and 2151 patients (44%) were postmenopausal (table 4). Information on CA125 was missing in 2620 of the 4905 (53%) patients: 835 of 2579 (32%) missing values when surgery was suggested and 1785 of 2326 (77%) missing values when conservative management was suggested. The outcome was benign for 3441 (70%) patients, malignant for 978 (20%), and uncertain for 486 (10%) patients. The tumours in the current cohort manifested more benign ultrasound features compared with the development datasets of the different models, which were limited to patients who underwent surgery (supplementary table 2).

[Table 4](https://www.bmj.com/content/370/bmj.m2614/T4) Descriptive statistics for patients in primary analysis (n=4905)

The overall AUC was highest for ADNEX with CA125 (0.94, 95% confidence interval 0.92 to 0.96), ADNEX without CA125 (0.94, 0.91 to 0.95) and SRRisk (0.94, 0.91 to 0.95), and lowest for RMI (0.89, 0.85 to 0.92; fig 2).
Differences in AUC between centres (heterogeneity) were largest for RMI, with a 95% prediction interval from 0.74 to 0.96 (supplementary figs 1-5). Supplementary table 3 provides 95% confidence intervals for the difference in AUC between models. At a risk threshold of 10%, ADNEX with CA125 had an overall sensitivity of 91% (95% confidence interval 85% to 95%) and specificity of 85% (81% to 89%). At a threshold of 200, RMI had an overall sensitivity of 60% (54% to 67%) and specificity of 95% (93% to 97%; supplementary tables 4-5). When overall specificity was fixed at 90%, SRRisk had the highest sensitivity (89%) and RMI the lowest (70%), while the sensitivity for ADNEX with CA125 was 87% (table 5). When overall sensitivity was fixed at 90%, ADNEX with CA125 had the highest specificity (87%) and RMI the lowest (69%; table 5). The simple rules model had an overall sensitivity of 90% (86% to 94%) and a specificity of 87% (83% to 91%).

[Fig 2](https://www.bmj.com/content/370/bmj.m2614/F2) Summary forest plot with overall area under the receiver operating characteristic curve (AUC) for each model. ADNEX=assessment of different neoplasias in the adnexa; LR2=logistic regression model 2; PI=prediction interval; RMI=risk of malignancy index; SRRisk=simple rules risk model

[Table 5](https://www.bmj.com/content/370/bmj.m2614/T5) Sensitivity (at 90% specificity) and specificity (at 90% sensitivity) for all prediction models

ADNEX with CA125 had an overall calibration intercept of 0.19 (95% confidence interval −0.01 to 0.40) and a slope of 1.11 (0.98 to 1.25). Risk estimates were slightly underestimated (fig 3). Calibration of SRRisk was marginally better and that of LR2 was poorer. We observed heterogeneity between centres for calibration for all models, with least heterogeneity for ADNEX with CA125 (supplementary figs 6-10).
The overall calibration curve for RMI (supplementary fig 11) indicated that the commonly used threshold of 200 corresponded to a risk of malignancy of 45-50% on average. Supplementary figs 12-16 present histograms of the predictions of RMI and the risk prediction models.

[Fig 3](https://www.bmj.com/content/370/bmj.m2614/F3) Summary figure with overall calibration curves for risk prediction models. ADNEX=assessment of different neoplasias in the adnexa; intercept=calibration intercept; LR2=logistic regression model 2; RMI=risk of malignancy index; slope=calibration slope; SRRisk=simple rules risk model

SRRisk and ADNEX with CA125 had the best overall utility to select patients for referral to a gynaecological oncology centre (fig 4). RMI at a threshold of 200 had the lowest clinical utility of all the models.

[Fig 4](https://www.bmj.com/content/370/bmj.m2614/F4) Overall decision curves for risk prediction models and RMI. Higher net benefit implies higher clinical utility (the higher the curve, the better the clinical utility at the chosen risk threshold).1830 ADNEX=assessment of different neoplasias in the adnexa; LR2=logistic regression model 2; RMI=risk of malignancy index; SRRisk=simple rules risk model

For the ADNEX model with CA125, distinguishing between borderline and stage I primary ovarian malignancy (AUC 0.77), between stage I primary ovarian malignancy and secondary metastatic cancer (AUC 0.75), and between stage II-IV primary ovarian malignancy and secondary metastatic cancer (AUC 0.78) was the most difficult (supplementary table 6). AUCs ranged from 0.90 to 0.98 when distinguishing between benign tumours and malignant subtypes.
ADNEX without CA125 mainly affected discrimination between stage II-IV and stage I primary ovarian malignancy (AUC was 0.81 when CA125 was included *v* 0.72 when CA125 was not included), and between stage II-IV primary ovarian malignancy and secondary metastatic malignancy (AUC 0.78 *v* 0.66). Calibration of the estimated risks for the five tumour subtypes was good for ADNEX irrespective of whether or not CA125 was included as a predictor (supplementary figs 17-18).

In every subgroup, RMI had the lowest overall AUC and ADNEX with CA125 the highest overall AUC (fig 5, supplementary table 7). Among the 1958 patients with at least one follow-up scan, the overall AUC was 0.76 for RMI and ranged from 0.87 to 0.89 for the IOTA models. Because of the low malignancy rate (2%) in this subgroup, the confidence interval around the AUC was wide. Calibration analysis indicated that the risk of malignancy was overestimated in this subgroup (supplementary figs 19-34). The results obtained in the sensitivity analyses were similar to those in the primary analysis (supplementary figs 35-43).

[Fig 5](https://www.bmj.com/content/370/bmj.m2614/F5) Summary forest plots of overall area under the receiver operating characteristic curve (AUC) for prespecified subgroups. Prediction intervals could not be calculated for two subgroups because the number of malignant outcomes for each centre was too small for meta-analysis to be possible. ADNEX=assessment of different neoplasias in the adnexa; LR2=logistic regression model 2; PI=prediction interval; RMI=risk of malignancy index; SRRisk=simple rules risk model

## Discussion

### Principal findings

This study is a comprehensive evaluation of RMI and IOTA models when applied to all patients presenting with an adnexal mass, irrespective of whether they received surgical or conservative management.
ADNEX with CA125, ADNEX without CA125, and SRRisk were the best models to distinguish between benign and malignant adnexal masses. These models were reasonably well calibrated overall, and had the highest clinical utility. RMI had the lowest AUC and clinical utility. Performance varied between centres for all models, but it varied most for RMI.

In every prespecified subgroup, ADNEX with CA125 had the highest AUC and RMI the lowest AUC. All models, in particular RMI, had poorer discriminative ability in patients who were managed conservatively than in those who underwent surgery. This difference could be because masses managed conservatively are more homogeneous than those removed surgically. Most masses managed conservatively probably manifested clearly benign ultrasound signs, few were malignant, and most malignancies managed conservatively were borderline tumours, with ultrasound features that could be confused with those of benign tumours.3132 All models overestimated the risk of malignancy in patients who were managed conservatively, which was expected because none of the models was developed for this population.

### Strengths and limitations of the study

Our study has several strengths. We included patients irrespective of whether they were managed surgically or conservatively, and the large sample size allowed us to evaluate models in clinically relevant subgroups. Additionally, we recruited from many centres in different countries, used a large number of ultrasound examiners, and implemented a rigorous prospective protocol with agreed ultrasound terms, definitions, and measurement technique.22 Finally, we evaluated the calibration and clinical utility of all the models.

The first limitation is that several centres had to be excluded because of non-consecutive recruitment or insufficient quality of follow-up data. However, a similar proportion of oncology centres (11/20) and non-oncology centres (8/16) were excluded.
The second, inevitable limitation is that our reference standard is based on two different methods: histology or results of clinical and ultrasound follow-up.33 Follow-up information is probably less accurate than histology when assigning the outcome; for some patients the outcome was partly based on subjective assessment at inclusion. We limited the risk of bias in our primary analysis by using multiple imputation to assign an outcome when clinical and ultrasound information was insufficient or inconsistent (n=486, 10% of all cases). Excluding uncertain outcomes might have induced bias because the assumption could be made that patients who do not undergo surgery without delay are more likely to have a benign tumour. Our two sensitivity analyses, one excluding all uncertain cases (U1-U4 in table 2) and one using imputation to assign an outcome when a broader definition of uncertain outcome was applied (B2, M2-M3, and U1-U4 in table 2; n=1419, 29% of all cases), showed similar results to those in our primary analysis.

The third limitation is that CA125 values were missing in a substantial number of patients. We addressed the missing values using multiple imputation.34 Imputing missing values multiple times acknowledges that we are uncertain about the true value. The observation that the ranking of models in terms of AUC was the same in women who underwent surgery (low proportion of missing CA125 values) and in women who received conservative management (high proportion of missing CA125 values) suggests that our results are robust.

### Comparison with other studies

Two small single centre studies have evaluated IOTA models on all patients irrespective of management. Nunes and colleagues (n=489) evaluated sensitivity and specificity of LR2 using a single risk cut-off point.19 Pereira and colleagues (n=170) evaluated the clinical utility of simple rules and SRRisk.20 Other published validation studies were limited to patients who underwent surgery.
These studies showed that IOTA models distinguished better between benign and malignant adnexal masses than RMI, and that ADNEX might be the best performing model.1213141635363738 The studies also showed that SRRisk and ADNEX had good overall calibration (the authors did not report centre specific results) and better clinical utility than other models, including RMI, to refer patients to an oncology centre.10111718 To distinguish between benign and malignant masses in patients who underwent surgery, the following AUCs have been reported for ADNEX with CA125: 0.94 in the original study, between 0.91 and 0.97 in external validation studies, and 0.92 in the subgroup that underwent surgery in the current study.1017373839404142

### Implications for practice

In our study, the ADNEX models and SRRisk were the best performing models and were similar in terms of discrimination, calibration, and clinical utility. However, we would recommend ADNEX rather than SRRisk for several reasons. Firstly, ADNEX uses only simple and robust ultrasound variables, and so less experience should be needed for correct use of this model compared with SRRisk. Secondly, ADNEX estimates the likelihood of five different tumour types. This information could help when deciding which investigations to perform (investigations would differ if a metastasis is suspected rather than primary ovarian cancer), which surgical strategy to choose (fertility sparing surgery could be considered if a borderline tumour is suspected), and how long the waiting time should be for the operation. The likelihood of different tumour types can also help when deciding on the appropriate skills of the surgeon and estimating the duration of surgery. Other models do not provide this information.

Because CA125 results are often not available when the patient is examined with ultrasound, ADNEX without CA125 can be used as a first step to distinguish between benign and malignant tumours during scanning.
If ADNEX without CA125 yields a high risk of malignancy, blood sampling with analysis of CA125 can be arranged so that the most likely type of malignancy can be estimated by using ADNEX with CA125. Including CA125 in ADNEX improves discrimination between the malignant tumour types, but has little effect on the discrimination between benign and malignant masses.

For application in clinical practice, ADNEX is available as an app for iOS or Android and as a web calculator ([https://iotagroup.org/iota-models-software/adnex-risk-model](https://iotagroup.org/iota-models-software/adnex-risk-model)). Some ultrasound machines have built-in functionalities that allow ADNEX to be used while the patient is being scanned. A prerequisite for safe use of the IOTA models is basic ultrasound skills. Additionally, ultrasound examiners must have obtained the IOTA certificate ([https://www.iotagroup.org/certified-members](https://www.iotagroup.org/certified-members)) to show that they are able to correctly use the IOTA terminology and measurement technique ([http://www.iota.education](http://www.iota.education/)).22 43 For safe use of ADNEX, examiners need to use the IOTA definitions for solid component, papillary projection, acoustic shadowing, and ascites.

Because of the consecutive recruitment of patients into our study, the substantial sample size, and the large number of ultrasound examiners and participating centres in different countries, our results could be generalisable to other patient populations. However, for all models heterogeneity existed between centres in terms of performance, in particular calibration. The heterogeneity can probably be explained by differences in tumour characteristics that are not captured by the predictors, and in tumour mix (that is, the relative proportion of benign, borderline, primary invasive malignancies, and metastases). Further research should focus on explaining and reducing this heterogeneity.
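
Between-centre heterogeneity of the kind described above is commonly quantified with a random-effects meta-analysis of centre specific estimates. The following is a minimal sketch of that idea and not the analysis code of this study: it assumes a DerSimonian-Laird estimator applied to logit transformed AUCs, the function name `pool_logit_auc` is hypothetical, and the three AUCs and standard errors in the usage example are made-up illustrative values rather than study results.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_auc(aucs, ses):
    """DerSimonian-Laird random-effects pooling of centre specific AUCs.

    aucs: centre specific AUC estimates (each strictly between 0 and 1)
    ses:  their standard errors on the AUC scale
    Returns (pooled AUC, between-centre variance tau^2 on the logit scale).
    """
    # Move to the logit scale; delta method: se(logit(a)) = se(a) / (a * (1 - a)).
    y = [logit(a) for a in aucs]
    v = [(s / (a * (1.0 - a))) ** 2 for a, s in zip(aucs, ses)]

    # Fixed-effect (inverse variance) estimate and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of tau^2, truncated at zero.
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-centre variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return inv_logit(y_re), tau2


if __name__ == "__main__":
    # Illustrative, made-up centre level values (not data from the study).
    pooled, tau2 = pool_logit_auc([0.95, 0.90, 0.93], [0.010, 0.020, 0.015])
    print(f"pooled AUC = {pooled:.3f}, tau^2 = {tau2:.3f}")
```

A value of tau^2 close to zero would suggest that centres differ little beyond sampling error, whereas a larger value indicates heterogeneity of the kind noted for calibration. The same pooling step could in principle be applied to calibration intercepts or slopes, which would not need the logit transformation.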
The performance of the ADNEX model could be improved by updating it with data from patients who are conservatively managed as well as from those who have had surgery.

### Conclusions and policy implications

Our study has shown that ADNEX with or without CA125 and SRRisk are the best models for distinguishing between benign and malignant tumours in patients presenting with an adnexal mass. Because ADNEX is preferable to SRRisk for practical reasons, the model should be recommended for characterising ovarian tumours. The next step is to achieve consensus on the risk thresholds to be used when deciding whether patients with an adnexal tumour should receive conservative management, surgery in a local centre, or be referred to a gynaecological oncology centre for further evaluation.

#### What is already known on this topic

* Methods are needed to predict malignancy in ovarian tumours so that optimal management can be selected, such as watch and wait, surgery in a local hospital, or treatment in a gynaecological oncology centre
* Existing prediction models are based only on data from patients who have undergone surgery
* Model validation has mostly used data from patients who have had surgery and has rarely included assessment of calibration or clinical utility

#### What this study adds

* Assessment of different neoplasias in the adnexa (ADNEX) with or without CA125 and simple rules risk model (SRRisk) are the best models for distinguishing between benign and malignant adnexal tumours
* ADNEX with CA125 performed best when distinguishing between benign and malignant tumours in all subgroups (including patients managed conservatively)
* ADNEX has practical advantages over SRRisk, including the ability to estimate risk of malignant subtypes

## Acknowledgments

We thank everyone who provided assistance for monitoring the study and completing the database.
In particular, we thank Bavo De Cock, Willem Mestdagh, Mona Aboulghar, Ulrike Metzger, Anna Knafel, Chiara Lanzani, Fatima Alves, Thierry Van den Bosch, Samir Helmy, Alberto Rossi, and Maria Angela Pascual. We also thank librarian Maria Björklund, Lund University, for helping with literature search.

## Footnotes

* Contributors: BVC, LV, ACT, TB, and DT conceived and designed the study. LV, WF, CL, ACT, PS, CVH, ED, RF, EE, DF, MJK, VC, JLA, FPGL, FB, MEC, SG, ND, LJ, LS, DF, ACz, JK, ACo, GS, IV, TB, and DT enrolled patients and acquired data. BVC, WF, CL, JC, and DT were involved in data cleaning. BVC, LV, WF, CL, JC, LW, and DT wrote the Statistical Analysis Plan. BVC and JC analysed the data, with support from LW. BVC, LV, WF, CL, JC, LW, TB, and DT were involved in data interpretation. BVC, LV, WF, CL, JC, and DT wrote the first draft of the manuscript. All authors critically reviewed and revised the draft. All authors approved the final version of the manuscript for submission. BVC and LV contributed equally. BVC, LV, and DT are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
* Funding: The IOTA5 study is supported by the Research Foundation – Flanders (FWO) projects G049312N/G0B4716N/12F3114N, and Internal Funds KU Leuven (project C24/15/037). DT is a senior clinical investigator of FWO, LW is a postdoctoral fellow of FWO, and WF was a clinical fellow of FWO. CL was supported by Linbury Trust Grant LIN 260. TB is supported by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Imperial College Healthcare National Health Service (NHS) Trust and Imperial College London. The views expressed are those of the authors and not necessarily those of the NHS, NIHR, or Department of Health. LV is supported by the Swedish Research Council (grant K2014-99X-22475-01-3, Dnr 2013-02282), funds administered by Malmö University Hospital and Skåne University Hospital, Allmänna Sjukhusets i Malmö Stiftelse för bekämpande av cancer (the Malmö General Hospital Foundation for fighting against cancer), and two Swedish governmental grants (Avtal om läkarutbildning och forskning (ALF)-medel and Landstingsfinansierad Regional Forskning). The funders played no role in study design, data collection, data analysis, data interpretation, or reporting. The guarantors had full access to all the data in the study, take responsibility for the integrity of the data and the accuracy of the data analysis, and had final responsibility for the decision to submit for publication.
* Competing interests: All authors have completed the ICMJE uniform disclosure form at [www.icmje.org/coi\_disclosure.pdf](http://www.icmje.org/coi_disclosure.pdf) and declare: grants from Research Foundation – Flanders (FWO), Internal Funds KU Leuven, Linbury Trust, NIHR Biomedical Research Centre, and Swedish Research Council for the submitted work; TB reports grants, personal fees, and travel support from Samsung Medison, travel support from Roche Diagnostics, and personal fees from GE Healthcare, all outside the submitted work; IV reports grants, personal fees and non-financial support from Roche NV, outside the submitted work; BVC and DT report consultancy work done by KU Leuven to help implementing and testing the ADNEX model in ultrasound machines by Samsung Medison and GE Healthcare, outside the submitted work; no other relationships or activities that could appear to have influenced the submitted work; no royalties or patents related to any of these models (neither for the authors nor for their institutions).
* Ethical approval: Approval was obtained from the ethics committee of the University Hospitals Leuven (Ethische Commissie Onderzoek UZ/KU Leuven, [https://www.uzleuven.be/nl/ethischecommissie/onderzoek](https://www.uzleuven.be/nl/ethischecommissie/onderzoek)) as the coordinating centre (B32220095331/S51375) and the local ethics committee of each contributing centre.
* Data sharing: Follow-up data collection for the IOTA5 study is still ongoing. When the study has closed, and results have been published, the corresponding author may share data upon reasonable request (dirk.timmerman@uzleuven.be). R code used for the statistical analysis is available on GitHub ([https://github.com/benvancalster/IOTA5modelvalidation2020](https://github.com/benvancalster/IOTA5modelvalidation2020)).
* The lead authors affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
* Dissemination to participants and related patient and public communities: We plan to disseminate and discuss the results with patient groups and societies for gynaecology and gynaecological oncology, and through both media organisations and social media. This will be highly valuable when it comes to implementation of models into clinical practice.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/).

## References

1. Ferlay J, Colombet M, Soerjomataram I, et al.
Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int J Cancer2019;144:1941-53. doi:10.1002/ijc.31937 pmid:30350310 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/ijc.31937&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=30350310&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 2. Earle CC, Schrag D, Neville BA, et al. Effect of surgeon specialty on processes of care and outcomes for ovarian cancer patients. J Natl Cancer Inst2006;98:172-80. doi:10.1093/jnci/djj019 pmid:16449677 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1093/jnci/djj019&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=16449677&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000235218700009&link_type=ISI) 3. Engelen MJA, Kos HE, Willemse PHB, et al. Surgery by consultant gynecologic oncologists improves survival in patients with ovarian carcinoma. Cancer2006;106:589-98. doi:10.1002/cncr.21616 pmid:16369985 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/cncr.21616&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=16369985&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000234822700013&link_type=ISI) 4. Woo YL, Kyrgiou M, Bryant A, Everett T, Dickinson HO. Centralisation of services for gynaecological cancers - a Cochrane systematic review. Gynecol Oncol2012;126:286-90. doi:10.1016/j.ygyno.2012.04.012 pmid:22507534 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.ygyno.2012.04.012&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=22507534&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 5. Bristow RE, Chang J, Ziogas A, Anton-Culver H. Adherence to treatment guidelines for ovarian cancer as a measure of quality care. 
Obstet Gynecol2013;121:1226-34. doi:10.1097/AOG.0b013e3182922a17 pmid:23812456 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1097/AOG.0b013e3182922a17&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=23812456&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000319436100013&link_type=ISI) 6. Froyman W, Landolfo C, De Cock B, et al. Risk of complications in patients with conservatively managed ovarian tumours (IOTA5): a 2-year interim analysis of a multicentre, prospective, cohort study. Lancet Oncol2019;20:448-58. doi:10.1016/S1470-2045(18)30837-4 pmid:30737137 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/S1470-2045(18)30837-4&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=30737137&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 7. Jacobs I, Oram D, Fairbanks J, Turner J, Frost C, Grudzinskas JG. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer. Br J Obstet Gynaecol1990;97:922-9. doi:10.1111/j.1471-0528.1990.tb02448.x pmid:2223684 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1111/j.1471-0528.1990.tb02448.x&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=2223684&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=A1990ED94000010&link_type=ISI) 8. Timmerman D, Testa AC, Bourne T, et al., International Ovarian Tumor Analysis Group. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol2005;23:8794-801. 
doi:10.1200/JCO.2005.01.7632 pmid:16314639 [Abstract/FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiamNvIjtzOjU6InJlc2lkIjtzOjEwOiIyMy8zNC84Nzk0IjtzOjQ6ImF0b20iO3M6MjM6Ii9ibWovMzcwL2Jtai5tMjYxNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 9. Timmerman D, Testa AC, Bourne T, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol2008;31:681-90. doi:10.1002/uog.5365 pmid:18504770 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.5365&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=18504770&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000257502000013&link_type=ISI) 10. Van Calster B, Van Hoorde K, Valentin L, et al., International Ovarian Tumour Analysis Group. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ2014;349:g5920. doi:10.1136/bmj.g5920 pmid:25320247 [Abstract/FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNDkvb2N0MDdfMy9nNTkyMCI7czo0OiJhdG9tIjtzOjIzOiIvYm1qLzM3MC9ibWoubTI2MTQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 11. Timmerman D, Van Calster B, Testa A, et al. Predicting the risk of malignancy in adnexal masses based on the Simple Rules from the International Ovarian Tumor Analysis group. Am J Obstet Gynecol2016;214:424-37. 
doi:10.1016/j.ajog.2016.01.007 pmid:26800772 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.ajog.2016.01.007&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=26800772&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 12. Kaijser J, Sayasneh A, Van Hoorde K, et al. Presurgical diagnosis of adnexal tumours using mathematical models and scoring systems: a systematic review and meta-analysis. Hum Reprod Update2014;20:449-62. doi:10.1093/humupd/dmt059 pmid:24327552 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1093/humupd/dmt059&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=24327552&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000334675600010&link_type=ISI) 13. Meys EMJ, Kaijser J, Kruitwagen RFPM, et al. Subjective assessment versus ultrasound models to diagnose ovarian cancer: A systematic review and meta-analysis. Eur J Cancer2016;58:17-29. doi:10.1016/j.ejca.2016.01.007 pmid:26922169 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.ejca.2016.01.007&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=26922169&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 14. Westwood M, Ramaekers B, Lang S, et al. Risk scores to guide referral decisions for people with suspected ovarian cancer in secondary care: a systematic review and cost-effectiveness analysis. Health Technol Assess2018;22:1-264. doi:10.3310/hta22440 pmid:30165935 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.3310/hta22440&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=29298736&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 15. Timmerman D, Van Calster B, Testa AC, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. 
Ultrasound Obstet Gynecol2010;36:226-34. doi:10.1002/uog.7636 pmid:20455203 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.7636&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=20455203&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000281086100017&link_type=ISI) 16. Testa A, Kaijser J, Wynants L, et al. Strategies to diagnose ovarian cancer: new evidence from phase 3 of the multicentre international IOTA study. Br J Cancer2014;111:680-8. doi:10.1038/bjc.2014.333 pmid:24937676 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1038/bjc.2014.333&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=24937676&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 17. Sayasneh A, Ferrara L, De Cock B, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model: a multicentre external validation study. Br J Cancer2016;115:542-8. doi:10.1038/bjc.2016.227 pmid:27482647 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1038/bjc.2016.227&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=27482647&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 18. Wynants L, Timmerman D, Verbakel JY, et al. Clinical utility of risk models to refer patients with adnexal masses to specialized oncology care: multicenter external validation using decision curve analysis. Clin Cancer Res2017;23:5082-90. doi:10.1158/1078-0432.CCR-16-3248 pmid:28512173 [Abstract/FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImNsaW5jYW5yZXMiO3M6NToicmVzaWQiO3M6MTA6IjIzLzE3LzUwODIiO3M6NDoiYXRvbSI7czoyMzoiL2Jtai8zNzAvYm1qLm0yNjE0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 19. Nunes N, Ambler G, Foo X, Widschwendter M, Jurkovic D. 
Prospective evaluation of IOTA logistic regression models LR1 and LR2 in comparison with subjective pattern recognition for diagnosis of ovarian cancer in an outpatient setting. Ultrasound Obstet Gynecol2018;51:829-35. doi:10.1002/uog.18918 pmid:28976616 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.18918&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=28976616&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 20. Pereira PN, Sarian LO, Yoshida A, et al. Improving the performance of IOTA simple rules: sonographic assessment of adnexal masses with resource-effective use of a magnetic resonance scoring (ADNEX MR scoring system). Abdom Radiol (NY)2019. doi:10.1007/s00261-019-02207-9. pmid:31482379 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1007/s00261-019-02207-9&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=31482379&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 21. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ2015;350:g7594. doi:10.1136/bmj.g7594 pmid:25569120 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1136/bmj.g7594&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=25569120&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 22. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I, International Ovarian Tumor Analysis (IOTA) Group. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol2000;16:500-5. 
doi:10.1046/j.1469-0705.2000.00287.x pmid:11169340 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1046/j.1469-0705.2000.00287.x&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=11169340&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000166913100022&link_type=ISI) 23. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol2004;18:71-89. doi:10.1016/j.bpobgyn.2003.10.002 pmid:15123059 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.bpobgyn.2003.10.002&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=15123059&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 24. Heintz APM, Odicino F, Maisonneuve P, et al. Carcinoma of the ovary. Int J Gynaecol Obstet2006;95:S161-92doi:10.1016/S0020-7292(06)60033-7. [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/S0020-7292(06)60033-7&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=17161157&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 25. Snell KIE, Ensor J, Debray TPA, Moons KGM, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?Stat Methods Med Res2018;27:3505-22. doi:10.1177/0962280217705678 pmid:28480827 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1177/0962280217705678&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=28480827&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 26. Demler OV, Pencina MJ, D’Agostino RB Sr.. Misuse of DeLong test to compare AUCs for nested models. Stat Med2012;31:2577-87. 
doi:10.1002/sim.5328 pmid:22415937 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/sim.5328&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=22415937&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 27. Van Calster B, Vergouwe Y, Looman CW, Van Belle V, Timmerman D, Steyerberg EW. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol2012;27:761-70. doi:10.1007/s10654-012-9733-3 pmid:23054032 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1007/s10654-012-9733-3&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=23054032&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 28. Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol2016;74:167-76. doi:10.1016/j.jclinepi.2015.12.005 pmid:26772608 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.jclinepi.2015.12.005&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=26772608&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 29. Van Hoorde K, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW, Van Calster B. Assessing calibration of multinomial risk prediction models. Stat Med2014;33:2585-96. doi:10.1002/sim.6114 pmid:24549725 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/sim.6114&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=24549725&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 30. Wynants L, Riley RD, Timmerman D, Van Calster B. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med2018;37:2034-52. 
doi:10.1002/sim.7653 pmid:29575170 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/sim.7653&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=29575170&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 31. Virgilio BA, De Blasis I, Sladkevicius P, et al. Imaging in gynecological disease (16): clinical and ultrasound characteristics of serous cystadenofibromas in adnexa. Ultrasound Obstet Gynecol2019;54:823-30. doi:10.1002/uog.20277 pmid:30937992 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.20277&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=30937992&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 32. Moro F, Zannoni GF, Arciuolo D, et al. Imaging in gynecological disease (11): clinical and ultrasound features of mucinous ovarian tumors. Ultrasound Obstet Gynecol2017;50:261-70. doi:10.1002/uog.17222 pmid:28782867 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.17222&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=28782867&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 33. de Groot JAH, Bossuyt PMM, Reitsma JB, et al. Verification problems in diagnostic accuracy studies: consequences and solutions. BMJ2011;343:d4770. doi:10.1136/bmj.d4770 pmid:21810869 [FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNDMvYXVnMDJfMy9kNDc3MCI7czo0OiJhdG9tIjtzOjIzOiIvYm1qLzM3MC9ibWoubTI2MTQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 34. Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ2009;338:b2393. 
doi:10.1136/bmj.b2393 pmid:19564179 [FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzMzgvanVuMjlfMS9iMjM5MyI7czo0OiJhdG9tIjtzOjIzOiIvYm1qLzM3MC9ibWoubTI2MTQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 35. Van Holsbeke C, Van Calster B, Bourne T, et al. External validation of diagnostic models to estimate the risk of malignancy in adnexal masses. Clin Cancer Res2012;18:815-25. doi:10.1158/1078-0432.CCR-11-0879 pmid:22114135 [Abstract/FREE Full Text](https://www.bmj.com/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImNsaW5jYW5yZXMiO3M6NToicmVzaWQiO3M6ODoiMTgvMy84MTUiO3M6NDoiYXRvbSI7czoyMzoiL2Jtai8zNzAvYm1qLm0yNjE0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 36. Sayasneh A, Wynants L, Preisler J, et al. Multicentre external validation of IOTA prediction models and RMI by operators with varied training. Br J Cancer2013;108:2448-54. doi:10.1038/bjc.2013.224 pmid:23674083 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1038/bjc.2013.224&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=23674083&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) [Web of Science](https://www.bmj.com/lookup/external-ref?access_num=000321003700006&link_type=ISI) 37. Meys EMJ, Jeelof LS, Achten NMJ, et al. Estimating risk of malignancy in adnexal masses: external validation of the ADNEX model and comparison with other frequently used ultrasound methods. Ultrasound Obstet Gynecol2017;49:784-92. doi:10.1002/uog.17225 pmid:27514486 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.17225&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=27514486&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 38. Stukan M, Badocha M, Ratajczak K. 
Development and validation of a model that includes two ultrasound parameters and the plasma D-dimer level for predicting malignancy in adnexal masses: an observational study. BMC Cancer2019;19:564. doi:10.1186/s12885-019-5629-x pmid:31185938 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1186/s12885-019-5629-x&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=31185938&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 39. Joyeux E, Miras T, Masquin I, Duglet PE, Astruc K, Douvier S. [Before surgery predictability of malignant ovarian tumors based on ADNEX model and its use in clinical practice]. Gynecol Obstet Fertil2016;44:557-64. doi:10.1016/j.gyobfe.2016.07.007 pmid:27568408 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.gyobfe.2016.07.007&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=27568408&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 40. Szubert S, Wojtowicz A, Moszynski R, et al. External validation of the IOTA ADNEX model performed by two independent gynecologic centers. Gynecol Oncol2016;142:490-5. doi:10.1016/j.ygyno.2016.06.020 pmid:27374142 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.ygyno.2016.06.020&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=27374142&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 41. Araujo KG, Jales RM, Pereira PN, et al. Performance of the IOTA ADNEX model in preoperative discrimination of adnexal masses in a gynecological oncology center. Ultrasound Obstet Gynecol2017;49:778-83. doi:10.1002/uog.15963 pmid:27194129 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.15963&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=27194129&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 42. Chen H, Qian L, Jiang M, Du Q, Yuan F, Feng W. 
IOTA ADNEX model for evaluating adnexal masses using data from a gynecologic oncology center in China. Ultrasound Obstet Gynecol2019;54:815-22. doi:10.1002/uog.20363 pmid:31152572 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1002/uog.20363&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=31152572&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom) 43. Andreotti RF, Timmerman D, Benacerraf BR, et al. Ovarian-Adnexal Reporting Lexicon for Ultrasound: A White Paper of the ACR Ovarian-Adnexal Reporting and Data System Committee [correction in: *J Am Coll Radiol* 2019;16:403-06]. J Am Coll Radiol2018;15:1415-29. doi:10.1016/j.jacr.2018.07.004 pmid:30149950 [CrossRef](https://www.bmj.com/lookup/external-ref?access_num=10.1016/j.jacr.2018.07.004&link_type=DOI) [PubMed](https://www.bmj.com/lookup/external-ref?access_num=30149950&link_type=MED&atom=%2Fbmj%2F370%2Fbmj.m2614.atom)