Volume 52, Issue 1 p. 49-60
Research Article

Presentation and analysis of data on radiographic outcome in clinical trials: Experience from the TEMPO study

Désirée van der Heijde

Corresponding Author

University Hospital, Maastricht, The Netherlands

Drs. van der Heijde, Landewé, Klareskog, Rodríguez-Valverde, and Settas are paid participating investigators for and/or consultants to Wyeth Research, compensated at less than $10,000 per year.

Rheumatology Department, University Hospital, Maastricht, The Netherlands
Robert Landewé

University Hospital, Maastricht, The Netherlands

Lars Klareskog

Karolinska Institutet, Stockholm, Sweden

Vicente Rodríguez-Valverde

Hospital M. Valdecilla, Universidad de Cantabria, Santander, Spain

Lucas Settas

AHEPA Hospital, University Thessaloniki, Thessaloniki, Greece

Ronald Pedersen

Wyeth Research, Collegeville, Pennsylvania

Saeed Fatenejad

Wyeth Research, Collegeville, Pennsylvania

First published: 07 January 2005

Abstract

Objective

To evaluate different methods of presentation and analysis of radiographic data in a rheumatoid arthritis (RA) randomized controlled trial.

Methods

A double-blind randomized controlled trial including 682 patients with active RA who were treated with methotrexate, etanercept, or a combination of the 2 drugs was used for this study. Probability plots of the change from baseline to year 1 were produced to visualize progression, and were compared with usual descriptive statistics. The primary analysis of the trial (based on annualized actual mean change from baseline in total Sharp score at 1 year, using linear imputation) was challenged using various ways of handling missing information with alternative imputation methods, and by various statistical analyses including analysis of covariance (ANCOVA) and mixed model analysis on both raw and log-transformed data.

Results

Probability plots provided detailed insight into the differentiated treatment effects between the 3 arms of this study. As adjuncts to formal hypothesis testing, these plots were more useful for presenting data than were summary descriptive statistics or use of preset cutoff points to define lack of progression. Additional analyses presented here support the results obtained with the per-protocol analysis that showed an advantage of the combination treatment compared with the monotherapy arms and for etanercept versus methotrexate alone. Various ways of handling missing information confirmed the robustness of the results. In addition, both ANCOVA and mixed model analyses on raw and on log-transformed data produced similar results.

Conclusion

We suggest a panel of alternative analysis methods and alternative ways of handling missing information to verify that the radiographic results reported in a randomized controlled trial are not influenced by technical factors, such as interpolation, handling of missing data, and choice of statistical tests.

Plain radiography has evolved over several decades and is now widely accepted as an appropriate method for assessing joint damage in patients with rheumatoid arthritis (RA) (1-4). It is generally believed that current scoring methodology allows readers to reliably assess the course of the disease and, more specifically, evaluate the effects of drug therapy on joint damage.

Recommendations on how to present radiographic results from clinical trials of RA have been published, based on an international consensus conference (5). One recommendation was to use the total Sharp score (TSS) (3, 4) as the primary end point of a trial, and separately present erosion and joint space narrowing (JSN) scores as secondary end points. Presenting the data on a group level was advised, including both means and medians with appropriate variability statistics, since the two provide complementary information. Another way of reporting radiographic data, suggested by the participants, was to calculate the percentage of patients with progression of joint damage above a certain cutoff level. Recommended cutoff levels for progression were 0 or 0.5 (if the average from 2 readers is used), as well as the smallest detectable difference (SDD) (6), in order to account for measurement error. Specific cutoff levels and percentile scores have the disadvantage that they take into consideration only 1 value from the entire data set. Recently, another format has been proposed to present all individual data from all patients: probability plots (7, 8). These are graphs that illustrate the results in a continuous manner, allowing simple visualization of the coherence of all data. Readers can easily apply any cutoff point they desire.

Despite these recommendations, so far no trial report has included all of these presentation formats. Comparison of treatment arms across trials is recognized as a risky and often invalid exercise (9, 10). One aid for comparing the likelihood of radiographic progression among treatment arms in different trials at baseline is the predicted yearly progression rate (11).

When analyzing data, one is always confronted with missing values. For some patients no radiographic data may be available, or data from 1 patient at 1 or more time points can be missing. Although radiographic damage is considered a largely irreversible process and progression is approximately linear at the group level (12, 13), a high level of variability in individual patients (14) implies that the way missing data are handled may have a major influence on the results of a trial.

Another issue, which is much more relevant to the analysis of radiographic data than to that of clinical data, is the high level of skewness of radiographic data: a large proportion of patients show no or only minimal progression over the period of the trial, and only a minority show major progression. This makes the use of parametric statistical testing inappropriate, and other techniques, such as nonparametric testing or logarithmic transformation of the data, should be sought. Moreover, classic statistical testing of radiographic progression uses only 2 time points, whereas use of all available longitudinal data (e.g., 3 radiographs instead of only the baseline and end point radiographs) may provide a better estimate of the progression rate and improve statistical power (15).

We recently published the results of a phase III trial comparing etanercept, methotrexate, and the combination of the two in the treatment of patients with RA (16). Compared with the progression rate at baseline, all 3 treatment arms were shown to be effective in slowing disease progression. However, the combination was significantly better than either of the 2 monotherapy arms (16). Moreover, the data from the combination treatment group suggested that repair of damage may have occurred in this group. This 3-arm trial provided an ideal resource for comparing different means of presenting data and for testing the effects of various ways of handling missing data and a range of statistical analyses, as well as exploring issues about radiographic repair.

PATIENTS AND METHODS

Patients.

The primary aim of this study was to investigate different ways of presenting and analyzing radiographic data in the context of a randomized controlled trial. We therefore used the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO trial), comparing etanercept alone (25 mg subcutaneously twice weekly), methotrexate alone (7.5 mg escalated to 20 mg orally once weekly within 8 weeks), or the combination of the two in patients with active RA who had been previously treated unsuccessfully with a disease-modifying drug other than methotrexate (16). A list of the TEMPO investigators is shown in Appendix A.

Radiographic scoring.

Radiographs of hands, wrists, and feet were obtained at baseline, 6 months, and 1 year (or at the final study visit, provided that this visit was more than 1 month after the visit with the previous radiograph), using high-resolution films. Radiographs were digitized at 100μ per pixel, processed, and randomly assigned for presentation to the readers of the images by a third and independent party (Bio-Imaging Technologies, Leiden, The Netherlands). Images were scored using the modified Sharp/van der Heijde method (3, 4). Scores for erosions and JSN were summed to yield the TSS. The possible range of the TSS was 0–448.

Image sets were masked with regard to treatment and sequence and scored by pairs of trained readers formed by combining readers as follows: AB, AC, and BC. Images were presented and scores were recorded using a computerized randomization system. To investigate inter- and intrareader agreement, radiographs of 5% of the patients (n = 34) were read by all readers twice during the study.

Presentation of data.

Continuous data, including erosion scores, JSN scores, and TSS, were presented as means ± SD and as medians with interquartile ranges. Data were dichotomized using different cutoff levels for progression based on TSS. The cutoff levels were arbitrarily chosen (TSS change ≤−3.0, ≤−0.5, ≤0.0, ≤0.5, ≤3.0); we also used the concept of the smallest detectable change (SDC) (SDD divided by square root of 2) to define a cutoff level at 6 months and 1 year. The SDC was calculated based on the change from baseline in TSS at each time point. The values were examined for each reader pair and found to be similar (4.1, 4.6, and 4.5); therefore, the SDC based on the average of the 3 readers was used (4.4 TSS units). Probability plots were used as new methods to present radiographic data. As described elsewhere (7), a cumulative probability plot is a frequency distribution of the observed cumulative proportion (scores ranked from the lowest to the highest values, and presented as a cumulative proportion of all scores) on the x-axis against the variable's actual values. From the cumulative probability, the proportion of observations with a value less than the value of the corresponding variable can be read on the x-axis. Probability plots were constructed separately for TSS, erosions, and JSN. Another method we used to present radiographic data was predicted yearly progression rate, which is defined as the TSS at baseline divided by disease duration, and calculated per patient.
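The presentation formats described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the trial's own software: the function names and the example change scores are hypothetical, and the plotting position i/n is one common convention that the text does not specify.

```python
import math


def cumulative_probability_points(changes):
    """Return (cumulative proportion, change score) pairs, lowest score first."""
    ranked = sorted(changes)
    n = len(ranked)
    # Assumption: the i-th smallest score is plotted at cumulative proportion i / n.
    return [((i + 1) / n, score) for i, score in enumerate(ranked)]


def proportion_at_or_below(changes, cutoff):
    """Share of patients whose change score does not exceed the chosen cutoff."""
    return sum(1 for c in changes if c <= cutoff) / len(changes)


def predicted_yearly_progression(baseline_tss, disease_duration_years):
    """Baseline TSS divided by disease duration, calculated per patient."""
    return baseline_tss / disease_duration_years


def sdc_from_sdd(sdd):
    """Smallest detectable change: SDD divided by the square root of 2."""
    return sdd / math.sqrt(2)


# Hypothetical 1-year change scores for eight patients (not trial data).
example_changes = [-3.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.5, 12.0]
points = cumulative_probability_points(example_changes)
```

Because every individual score is retained, any cutoff can be applied afterwards, for example `proportion_at_or_below(example_changes, 0.5)`.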

Analysis of missing data.

In this study, the impact of data imputation on primary trial results was investigated using different techniques. The primary analysis included only patients with on-therapy readings (the radiographic intent-to-treat [ITT] analysis). If patients had discontinued treatment early, the rate of progression was extrapolated to impute 1-year values (linear imputation). The robustness of the primary analysis outcomes was tested in the full ITT population by imputing the mean baseline value over all 3 treatment groups and the mean response over all groups to any patient not included in the radiographic ITT analysis. In an additional analysis, the value of the 95th percentile of the observed change score for each group was assigned to any patient not included in the radiographic ITT analysis.
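A conservative imputation of this kind can be illustrated as follows. This is a minimal sketch under stated assumptions: the nearest-rank percentile definition, the function names, and the example scores are all hypothetical, since the text does not say how the 95th percentile was computed.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumed definition, not taken from the trial)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def mean_with_imputation(observed_changes, n_missing, imputed_value):
    """Group mean after assigning the same imputed value to every missing patient."""
    total = sum(observed_changes) + n_missing * imputed_value
    return total / (len(observed_changes) + n_missing)


# Hypothetical observed change scores in one arm, plus one patient with no data.
observed = [0.0, 1.0, 2.0, 3.0, 10.0]
worst_case = nearest_rank_percentile(observed, 95)
observed_mean = sum(observed) / len(observed)
imputed_mean = mean_with_imputation(observed, 1, worst_case)
```

Assigning a high percentile to missing patients pulls the group mean upward, which is why this approach works against any arm that appears to show little or negative progression.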

The impact of imputing 1-year values obtained by extrapolation, in patients who had baseline (with or without 6-month followup) radiographs and had discontinued early, was evaluated by comparing this technique with imputation by the last observation carried forward (LOCF) approach. This approach treats dropouts as if they had no further disease progression.
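The difference between the two approaches can be made concrete with a short sketch. The function names and the example patient are hypothetical, and the extrapolation assumes a constant progression rate from baseline to the last available radiograph.

```python
def extrapolated_change(baseline, last_score, months_observed, horizon_months=12.0):
    """Linear imputation: extend the observed progression rate to the 1-year horizon."""
    if months_observed <= 0:
        raise ValueError("a follow-up radiograph after baseline is required")
    rate_per_month = (last_score - baseline) / months_observed
    return rate_per_month * horizon_months


def locf_change(baseline, last_score):
    """LOCF: the last observed score stands in for the 1-year score."""
    return last_score - baseline


# Hypothetical patient who discontinued after the 6-month radiograph.
linear = extrapolated_change(baseline=40.0, last_score=42.0, months_observed=6.0)
carried = locf_change(baseline=40.0, last_score=42.0)
```

For this patient, linear imputation yields a 1-year change of 4.0 TSS units, whereas LOCF yields 2.0, which illustrates why LOCF gives lower progression estimates for patients who are progressing when they drop out.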

A valid-for-efficacy (VFE) analysis excluded patients who did not qualify according to clinical VFE guidelines and did not have radiographs performed within 15 days after the baseline visit. A last analysis was performed including only those for whom baseline and 1-year radiographic data were available (“completers”).

Statistical analysis.

Statistical analysis aims at testing the hypothesis that 2 or more treatment groups show similar radiographic progression (null hypothesis). The choice of the statistical method may improve the statistical power.

The primary analysis in the TEMPO trial included comparisons of change between baseline and the 1-year end point for the combination treatment versus methotrexate and for etanercept versus methotrexate (using Hochberg's approach [17] for multiple comparisons). The primary end point, change in total Sharp score, was analyzed using an analysis of covariance (ANCOVA) model on the ranks of change (average of 2 readers) with factors for prior methotrexate use, study center, and treatment, and the rank of the baseline score as covariate. Statistical tests were 2-sided. P values less than 0.05 were considered significant.
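Hochberg's step-up procedure for the 2 primary comparisons can be sketched as below. This is a generic implementation of the published procedure, not the trial's analysis code; the function name and the P values in the usage note are hypothetical.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return one rejection decision per hypothesis (Hochberg step-up procedure)."""
    m = len(p_values)
    # Visit hypotheses from the largest P value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for step, idx in enumerate(order):
        # The largest P value is compared with alpha, the next with alpha / 2, etc.
        if p_values[idx] <= alpha / (step + 1):
            # This hypothesis and all those with smaller P values are rejected.
            for j in order[step:]:
                reject[j] = True
            break
    return reject
```

With 2 comparisons, both null hypotheses are rejected if the larger P value is at most 0.05; otherwise the smaller P value must be at most 0.025. For example, `hochberg_reject([0.06, 0.02])` rejects only the second hypothesis.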

As a second type of statistical analysis, a mixed effects model based on the TSS was performed in order to further evaluate the progression pattern. Mixed effects models allow longitudinal interpretation (development over time) of the data because they use all time points instead of only 1 predefined end point. Under certain conditions, mixed effects models provide greater statistical power. The model used here was a random coefficient regression model with subjects considered random and time measured as a continuous variable. This model fits an intercept and slope for each patient's damage score over time. These data were assumed to be multivariate-normal. Both the ANCOVA and mixed effects models for the radiographic data were fit to raw scores, as well as after logarithmic transformation.
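One way to write a random coefficient regression model of this kind is given below. The notation is an assumption introduced for illustration, and the exact set of fixed effects used in the trial is not stated in the text. Here y_{ij} is the damage score of patient i at visit j, t_{ij} is time since baseline, and g(i) is the treatment arm of patient i.

```latex
\begin{aligned}
y_{ij} &= (\beta_{0} + b_{0i}) + (\beta_{1,g(i)} + b_{1i})\, t_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

In this formulation the fixed slope \beta_{1,g} is the mean progression rate in arm g, and the random terms b_{0i} and b_{1i} give each patient an individual intercept and slope. Between-group comparisons then amount to comparing the fixed slopes.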

RESULTS

Patient demographics.

Six hundred eighty-two patients were randomly assigned to receive methotrexate (n = 228), etanercept (n = 223), or the combination (n = 231). Of these, 648 patients were included in the radiographic ITT population (Table 1), and for 34 patients, no radiographic data were available. Baseline TSS, erosion scores, JSN scores, and estimated yearly progression rate were comparable among the 3 treatment arms and showed values consistent with a population at high risk for further progression.

Table 1. Demographic and baseline disease characteristics of patients in the radiographic intent-to-treat population

| Characteristic | Methotrexate (n = 214) | Etanercept (n = 213) | Etanercept and methotrexate (n = 221) |
|---|---|---|---|
| Age, mean ± SD years | 52.8 ± 12.5 | 53.1 ± 13.8 | 52.5 ± 12.4 |
| Disease duration, mean ± SD years | 6.8 ± 5.6 | 6.2 ± 5.1 | 6.6 ± 5.2 |
| Sharp score, mean ± SD / median (25th–75th percentiles)* | | | |
| Total | 48.2 ± 57.7 / 26.8 (5.5–70.5) | 43.8 ± 60.7 / 21.8 (7.5–58.6) | 44.1 ± 52.7 / 21.8 (5.5–61.6) |
| Erosion | 24.0 ± 31.1 / 11.5 (2.5–35.5) | 21.3 ± 35.6 / 8.0 (2.8–26.0) | 21.6 ± 27.7 / 9.5 (2.5–30) |
| Joint space narrowing | 24.2 ± 28.6 / 13.3 (2.0–37.0) | 22.5 ± 27.9 / 11.5 (2.0–30.2) | 22.5 ± 27.5 / 10.3 (2.0–35.5) |
| Estimated yearly rate of progression in total Sharp score, mean ± SD | 10.3 ± 16.6 | 11.0 ± 25.2 | 8.4 ± 11.9 |

* In the methotrexate and etanercept groups, n = 212; in the combination treatment group, n = 218.

Of this radiographic ITT population, 6 patients (2 in the methotrexate group, 1 in the etanercept group, and 3 in the combination group) had baseline radiographs that were inadequate for complete scoring by the readers and were thus excluded from the analysis. As a result, the primary radiographic analysis included data from 642 patients (n = 218, 212, and 212 in the combination treatment, etanercept, and methotrexate groups, respectively). There were 37 patients in the methotrexate group, 27 patients in the etanercept group, and 18 patients in the combination group who had 1 or more missing radiographs at 1 year.

Reliability of radiographic scoring.

Interreader variability and intrareader variability, as assessed by intraclass correlation coefficient and based on status scores, ranged from 0.85 to 0.98 and from 0.90 to 0.99, respectively.

Comparison of results with different methods of analysis and presentation.

Some of the descriptive statistics on the radiographic ITT population from this trial were presented previously (16) and are integrated in Tables 2 and 3. They serve here as a benchmark to compare with the analysis of missing data and the additional statistical modeling of the data. It is obvious from both parametric and nonparametric statistics that the statistical analysis on the radiographic ITT population showed a clear contrast between groups.

Table 2. Percentage of patients with no progression at 1 year, determined using different cutoff levels*

| Cutoff | Methotrexate (n = 212) | Etanercept (n = 212) | Etanercept and methotrexate (n = 218) |
|---|---|---|---|
| TSS ≤ −SDC (4.4) | 3.8 | 5.2 | 8.7 |
| TSS ≤ −3.0 | 6.6 | 8.0 | 12.8 |
| TSS ≤ −0.5 | 29.2 | 33.0 | 49.5 |
| TSS ≤ 0.0 | 54.7 | 62.7 | 76.1 |
| TSS ≤ 0.5 | 57.1 | 67.9§ | 79.8 |
| TSS ≤ 3.0 | 77.4 | 87.3 | 94.5 |
| TSS ≤ SDC (4.4) | 82.5 | 90.6§ | 95.4 |

* TSS = total Sharp score; SDC = smallest detectable change.
P < 0.01 versus methotrexate group.
P < 0.01 versus etanercept group and versus methotrexate group.
§ P < 0.05 versus methotrexate group.

Table 3. Sensitivity analysis of change in the total Sharp score from baseline to 1 year*

| Analysis | Methotrexate | Etanercept | Etanercept and methotrexate |
|---|---|---|---|
| ITT | 2.80 (1.08, 4.51) (n = 212) | 0.52 (−0.10, 1.15) (n = 212) | −0.54 (−1.00, −0.07) (n = 218)§ |
| All patients included: mean of 3 groups | 2.67 (1.07, 4.26) (n = 228) | 0.54 (−0.05, 1.14) (n = 223) | −0.46 (−0.90, −0.01) (n = 231)§ |
| All patients included: 95th percentile of observed data | 3.13 (1.53, 4.73) (n = 228) | 0.87 (0.24, 1.49) (n = 223) | −0.08 (−0.59, 0.42) (n = 231)§ |
| LOCF | 2.06 (0.87, 3.26) (n = 212) | 0.38 (−0.17, 0.93) (n = 212) | −0.73 (−1.09, −0.38) (n = 218)§ |
| VFE population | 2.72 (0.91, 4.53) (n = 169) | 0.41 (−0.26, 1.07) (n = 174) | −0.65 (−1.03, −0.26) (n = 180)§ |
| Completers | 1.93 (0.63, 3.23) (n = 175) | 0.33 (−0.28, 0.94) (n = 185) | −0.86 (−1.23, −0.49) (n = 200)§ |

* Values are the mean (95% confidence interval). ITT = intent-to-treat; LOCF = last observation carried forward; VFE = valid-for-efficacy.
Primary end point.
P < 0.05 versus methotrexate group.
§ P < 0.01 combination versus etanercept group and versus methotrexate group.

Table 2 further shows the proportion of patients in each treatment group who had no radiographic progression, as defined by different cutoff levels. With every cutoff level, a contrast between groups (between-group difference in percentage of patients with no progression) could be demonstrated, but the magnitude of the contrast was clearly and expectedly susceptible to ceiling effects. As an example, the greatest difference in 1-year progression between the methotrexate group and the combination group was found with a cutoff level of 0.5 TSS units (22.7%). The same contrast was only 12.9% if the SDC (i.e., 4.4) was chosen as the cutoff level, because of a ceiling effect. Floor effects, which are the opposite of ceiling effects, were also observed when using negative cutoff levels. The contrast between methotrexate and the combination of methotrexate and etanercept was only 6.2% if a cutoff level of −3.0 TSS units was chosen.
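The dependence of the contrast on the cutoff can be checked directly from the percentages in Table 2. The sketch below uses the reported methotrexate and combination values; the variable names are illustrative.

```python
# Percentage of patients with no progression at 1 year, from Table 2.
# Key: cutoff in TSS units; value: (methotrexate, etanercept and methotrexate).
no_progression_pct = {
    -4.4: (3.8, 8.7),
    -3.0: (6.6, 12.8),
    -0.5: (29.2, 49.5),
    0.0: (54.7, 76.1),
    0.5: (57.1, 79.8),
    3.0: (77.4, 94.5),
    4.4: (82.5, 95.4),
}

# Between-group contrast in percentage points, rounded to 1 decimal place.
contrasts = {
    cutoff: round(combination - methotrexate, 1)
    for cutoff, (methotrexate, combination) in no_progression_pct.items()
}

# Cutoff at which the two arms differ most.
largest_contrast_cutoff = max(contrasts, key=contrasts.get)
```

The contrast peaks at the 0.5 cutoff and shrinks toward both extremes, where nearly all patients (ceiling) or very few patients (floor) fall on one side of the cutoff in every arm.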

These and other phenomena can be more clearly visualized with the use of probability plots (Figures 1a and b). The probability plots show every individual radiographic score per treatment group plotted against its cumulative probability (from the lowest to the highest observation). The plots in Figure 1a show that the majority of observations in all 3 groups had values close to 0. The plot representing methotrexate, however, clearly lies to the left as compared with both other curves, which means that there were more positive change observations (increase in radiographic abnormalities) in the methotrexate group. It is also obvious that the methotrexate group contained more positive extremes that had major impact on mean change scores (9). Because of these extreme values, the plots are presented here using the entire proportional scale but also using an interrupted y-axis (Figure 1b), allowing focus on a narrow range (from −20 to 20) containing >95% of all scores.

Figure 1. Cumulative probability distribution of total Sharp score over 1 year of treatment with methotrexate, etanercept, or the combination of methotrexate and etanercept. Data are shown without (a) and with (b) a break in the y-axis.

Probability plots also shed light on the presence of negative scores. Negative scores occurred in all 3 groups. Potentially, negative scores can result from measurement error or from repair (see below). A similar pattern of change as described above for TSS was seen for the erosion scores (Figures 2a and b) and JSN scores (Figures 3a and b).

Figure 2. Cumulative probability plot of erosions over 1 year of treatment with methotrexate, etanercept, or the combination of methotrexate and etanercept. Data are shown without (a) and with (b) a break in the y-axis.

Figure 3. Cumulative probability plot of joint space narrowing over 1 year of treatment with methotrexate, etanercept, or the combination of methotrexate and etanercept. Data are shown without (a) and with (b) a break in the y-axis.

Comparison of results with different methods of analyzing missing data.

The primary analysis of the radiographic data was based on the 642 patients included in the ITT population (see above). We here assessed the effect of missing data on the overall outcomes of the study by assigning a value to the 40 patients who were not included in the primary radiographic analysis. These values were assigned by using the mean of the 3 treatment groups, or the 95th percentile of the observed data (Table 3). As expected, the mean change in scores per group was sensitive to imputation of values. Also, the contrast between groups was, to some extent, sensitive to imputation. Generally, the contrast somewhat diminished, irrespective of the means of imputation, but this was only minor and did not affect the results of the statistical comparisons between groups.

Comparison of results with different statistical analyses.

In general, the mixed model analysis resulted in lower progression scores compared with the primary trial analysis. In comparison with the primary trial analysis, the mixed model analysis affected the mean value of radiographic progression more in the methotrexate group than in the etanercept and combination groups.

The primary trial analysis demonstrated a clear difference in between-group contrast for every possible treatment combination in the trial. We tested whether different results were obtained with logarithmic transformation of the data as well as longitudinal analysis including 6-month radiographic data (Table 4). Comparing the actual values from the primary trial analysis with the modeled data from the mixed model analysis, it can be derived from Table 4 that the observed treatment contrast decreased from 2.28 with the primary analysis to 1.73 with the mixed model for the methotrexate versus etanercept treatment comparison, and from 3.34 to 3.08 for the methotrexate versus combination treatment comparison. The 95% confidence interval for progression in the methotrexate group was, importantly, narrowed by the mixed model analysis, but the 95% confidence intervals in the etanercept group and the combination treatment group were approximately similar with both methods. As a consequence, the statistical comparison of the between-group contrasts in this trial resulted in approximately the same P values. Comparison of analyses performed with log-transformed data resulted in the same effects as comparison of analyses performed with actual data.

Table 4. Random coefficients mixed effects model: change in total Sharp score from baseline to 1 year, ITT population*

| Analysis | Methotrexate (n = 212) | Etanercept (n = 212) | Etanercept and methotrexate (n = 218) |
|---|---|---|---|
| Mixed model, raw values | 2.10 (1.44, 2.77) | 0.37 (−0.28, 1.02) | −0.98 (−1.60, −0.36) |
| Mixed model, log-transformed | 0.08 (0.05, 0.11) | 0.02 (−0.01, 0.05) | −0.04 (−0.06, −0.01) |
| Change scores, raw values | 2.80 (1.08, 4.51) | 0.52 (−0.10, 1.15)§ | −0.54 (−1.00, −0.07) |
| Change scores, log-transformed | 0.08 (0.04, 0.13) | 0.03 (−0.003, 0.06)§ | −0.03 (−0.06, −0.007) |

* Values are the mean (95% confidence interval). ITT = intent-to-treat.
P < 0.01 versus methotrexate group.
P < 0.01 versus etanercept group and versus methotrexate group.
§ P < 0.05 versus methotrexate group.
P < 0.05 versus etanercept group; P < 0.01 versus methotrexate group.

Repair.

Recently, we reported statistically significant negative radiographic progression between baseline and 1 year in the combination treatment group in this trial (16) (Figure 4). With the conditions for reading of radiographs used in the trial (blinding with regard to treatment and time sequence), this negative progression could not be explained by measurement error alone. Therefore, the possibility that repair of previously existing damage had occurred should be considered. In this study, the negative mean progression in the combination treatment group could be effaced only with the most conservative means of handling missing data, i.e., imputation of the assessment representing the 95th percentile of the entire data set. Mixed model analysis further confirmed the negative progression score in the combination methotrexate and etanercept group.

Figure 4. Total Sharp score (TSS) (a), erosion score (b), and joint space narrowing (JSN) (c) at 6 months and at 1 year, in patients treated with methotrexate, etanercept, or the combination of methotrexate and etanercept. † = P < 0.05 versus methotrexate; ‡ = P < 0.05 versus etanercept.

DISCUSSION

We report herein the radiographic results of a randomized, controlled clinical trial (TEMPO) analyzed and presented according to all of the recommendations that have been made in the literature during the last few years (5). Of the various presentation formats, probability plots provide the most comprehensive information, allowing readers to deduce from the plot whatever information interests them. The presentation of the exact numbers for means, SD, medians, interquartile ranges, and percentages of patients with progression below several cutoffs can be helpful for the power calculation and planning of future trials. The estimated yearly progression rate (used with caution) can serve as a benchmark for comparison of likelihood of radiographic progression with findings in other trials.

Presenting not only the TSS but also the erosion and JSN scores provides better insight into the effects of the drugs under study. In this trial, as well as in most others, the treatment effects on TSS, erosion score, and JSN score were similar.

Despite major efforts to obtain data as completely as possible, the reality is that there will always be missing data. The worst situation is missing all radiographs from a particular patient, but missing 1 or more followup radiographs also challenges the analysis of the results. If patients and/or radiographs are missing in a way that is completely random, there will be no significant influence on the treatment effect. But the probability of missing data is usually higher in patients with adverse events and in patients with lack of efficacy, and data from the latter play the strongest role in treatment contrast (selective missing) (9).

The issue of missing data is even more important for radiographic data than for clinical data. There are 2 reasons for this. First, only a small number of patients, in particular those with lack of clinical efficacy, show substantial progression, and these patients drive the mean progression score (9). Second, although progression in groups of patients is often linear, this is rarely the case in individual patients, and missing radiographs imply that the true pattern of progression cannot be estimated appropriately. In the TEMPO trial, only 6% of the patients had no radiographic data for analysis. Applying various ways to impute missing data for these patients did not alter the results, even including imputation with 95th percentile progression scores of the same treatment group, which was the most conservative approach we applied. Another way of analyzing the data is to define the patient groups differently, e.g., only those patients who completed the trial or (even stricter) only those who completed the trial without protocol violations (VFE). Although the VFE analysis included the fewest patients, the results most closely resembled those from the primary analysis. The completers analysis demonstrated reduced progression rates to the largest extent in all 3 groups, and this can be easily understood given that patients with the least clinical efficacy, and consequently the highest progression, will preferentially discontinue the trial.

In 560 of the 682 patients (82%), all radiographs were available. In the remaining 18% of the patients, 1 or more followup radiographs were not obtainable. In the primary analysis, missing data were replaced by linear extrapolation of the change scores. This has been compared with the LOCF approach, which by definition underestimates the progression in all groups. It can be seen that this has the greatest implications for the group with the highest progression rate (in this trial the methotrexate group), again because of preferential withdrawal from the trial by patients who do not respond clinically.

The number of patients with missing radiographs in the TEMPO trial is relatively low. None of the different ways to handle these missing data altered the results of the primary analysis, which adds to the validity of the trial results. But even with a small proportion of patients and/or radiographs missing, there is a significant effect on group means. Undoubtedly, trial results become more sensitive to missing data if a greater proportion of the radiographic data is missing. In the past, there have been published reports of trials with radiographic end points in which up to 50% of radiographs were missing. The results of such trials should be considered with skepticism (18).

The results of radiographic analyses in clinical trials are specifically prone to confounding by unwanted effects, because of the well-known relationship between disease activity and radiographic progression and the fact that drug trials primarily focus on improving disease activity. Unexpected selective (preferentially in 1 group) dropout due to lack of clinical efficacy may easily bias the trial results, and therefore it is highly recommended that several types of sensitivity analyses be performed in order to test the robustness of the trial results, as we did here.

An additional type of analysis that can add to the robustness of trial data is the longitudinal analysis. Longitudinal analysis of trial data is not often published in the primary trial report. This may be less than optimal because using all available data instead of only baseline and end point data may substantially improve the estimate of radiographic progression and narrow the 95% confidence interval of progression in 1 or more trial arms, thus improving statistical power, as shown in this trial by the mixed model analysis. Apart from power issues, longitudinal analysis may confirm—or disqualify—time trends and thus corroborate specific observations. A typical example in this study is the demonstration of repair that was first made plausible by classic statistical analysis and was subsequently confirmed by mixed model analysis.

The occurrence of repair was considered plausible here because the mean progression rate in the combination treatment group was negative, with the entire 95% confidence interval below zero. This means that the null hypothesis of zero progression in this group can be rejected, taking into account the fact that radiographs were read with masking of the time order. Before the data from this trial became available as the first ever to show this effect, some of us had already recommended using this criterion to assess repair on a group level (8). It should be noted that individual cases of repair may also have occurred in the other 2 treatment groups (as suggested by the presence of negative scores in individuals in these groups) and in other trials (19-22). The mean effects, however, were not sufficiently large to exclude measurement error as the primary source of negative scores. Inclusion of zero in the confidence interval therefore does not rule out occurrence of repair in individual patients or individual joints. For an individual patient, however, the relationship between the true negative score and measurement error is unknown. It can be hypothesized that longer followup of these patients, again showing negative scores in the following period, could shed light on these individual cases.
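The group-level criterion described above can be expressed as a short check: compute the mean change score and its 95% confidence interval, and consider repair plausible only if the upper bound lies below zero. The sketch below is a minimal illustration under stated assumptions. It uses a normal approximation for the interval (a t quantile would give a slightly wider interval in small groups), and the function name and the data are hypothetical, not taken from the trial.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def group_level_repair(change_scores, confidence=0.95):
    """Return (mean, lower, upper, repair_plausible) for one treatment group.

    Assumes a normal approximation for the confidence interval of the mean.
    """
    n = len(change_scores)
    if n < 2:
        raise ValueError("need at least two change scores")
    m = mean(change_scores)
    se = stdev(change_scores) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = m - z * se, m + z * se
    # Repair is plausible at group level only if the whole interval is < 0.
    return m, lower, upper, upper < 0


# Invented change scores for two hypothetical groups.
group_a = [-1.5, -0.5, 0.0, -1.0, -0.5, -2.0, 0.5, -1.0, -0.5, -1.0]
group_b = [0.5, -0.5, 1.0, 0.0, -1.0, 2.0, 0.0, -0.5, 1.5, 0.0]

result_a = group_level_repair(group_a)
result_b = group_level_repair(group_b)
```

In this invented example, the second group contains individual negative scores, yet its interval includes zero, so the criterion is not met at the group level. This mirrors the distinction between group-level and individual-level repair.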

In conclusion, using radiographic data from the TEMPO trial, this analysis has demonstrated that use of various methods of presenting radiographic data adds important information and allows for more complete interpretation of the data. The probability plot appears to be especially helpful in this regard. In addition, handling of missing data and use of appropriate statistical techniques are important issues in analyzing the radiographic results of a clinical trial. If several different types of analysis point to the same results, as in the TEMPO trial, this may increase the credibility of the data, especially if new phenomena such as repair are under consideration. This trial showed that the demonstration of repair on a group level is possible. Assessment of repair in an individual patient, however, remains a challenge.

Acknowledgements

The authors are grateful to the following individuals at Wyeth Research: R. Lavielle, MD (Paris, France), P. Loeschmann, MD (Munster, Germany), D. Simon, MD (Zug, Switzerland), G. de Crescenzo, MD (Aprilia, Italy), T. Nash, MD (Baulkham, Australia), J.-M. Kroodsma, MD (Hoofddorp, The Netherlands), A. Zareba, MD (Warsaw, Poland), R. Murillo, MD (Madrid, Spain), and G. Skoglund, MD (Solna, Sweden). D. Simcoe, MS, of the Wyeth Clinical Writing Group is acknowledged for her writing support.

    APPENDIX A: TEMPO INVESTIGATORS

    The TEMPO study investigators, in addition to the authors of this article, were as follows: H. D. Bolosiu, MD (Cluj-Napoca, Romania), P. Bourgeois, MD (Paris, France), J. Cunha Branco, MD (Lisbon, Portugal), J. Braun, MD (Herne, Germany), J. Broell, MD (Vienna, Austria), G. Bruyn, MD (Leeuwarden, The Netherlands), J. Brzezicki, MD (Elblag, Poland), G. Burmester, MD (Berlin, Germany), B. Canesi, MD (Milan, Italy), A. Cantagrel, MD (Toulouse, France), X. Chevalier, MD (Creteil, France), H. Chwalinska-Sadowska, MD (Warsaw, Poland), Leslie Cleland, MD (Adelaide, Australia), C. Codreanu, MD (Bucharest, Romania), L. Coster, MD (Linkoping, Sweden), M. Cutolo, MD (Genoa, Italy), R. Dahl, MD (Uppsala, Sweden), P. Dawes, MD (Stoke-on-Trent, UK), J. Dehais, MD (Bordeaux, France), D. J. de Rooij, MD (Nijmegen, The Netherlands), J. P. Devogelaer, MD (Brussels, Belgium), D. Doyle, MD (London, UK), L. Euller-Ziegler, MD (Nice, France), F. Fantini, MD (Milan, Italy), G. Ferraccioli, MD (Udine, Italy), A. Filipowicz-Sosnowska, MD (Warsaw, Poland), S. Freiseleben-Sorensen, MD (Copenhagen, Denmark), P. Geusens, MD (Diepenbeek, Belgium), J. Melo Gomes, MD (Lisbon, Portugal), J. J. Gómez Reino, MD (Santiago de Compostela, Spain), A. Gough, MD (North Yorkshire, UK), J. P. de Jager, MD (Southport, Australia), M. Janssen, MD (Arnhem, The Netherlands), H. Julkunen, MD (Vantaa, Finland), R. Juvin, MD (Grenoble, France), H. Haentzschel, MD (Leipzig, Germany), G. Herrero-Beaumont, MD (Madrid, Spain), W. Hissink-Muller, MD (Tilburg, The Netherlands), J. Kalden, MD (Erlangen, Germany), C. Kaufmann, MD (Drammen, Norway), J. Kekow, MD (Vogelsang, Germany), A. Harju, MD (Stockholm, Sweden), T. Kvien, MD (Oslo, Norway), A. Laffón, MD (Madrid, Spain), H. Lang, MD (Plauen, Germany), P. Lanting, MD (Doetinchem, The Netherlands), X. Le Loet, MD (Rouen, France), B. Lindell, MD (Kalmar, Sweden), F. Liote, MD (Paris, France), R. Luukkainen, MD (Satalinna, Finland), M. Malaise, MD (Liege, Belgium), X. 
Mariette, MD (Le Kremlin Bicetre, France), E. Martín Mola, MD (Madrid, Spain), B. Masek, MD (Venlo, The Netherlands), Z. Mencel, MD (Kalisz, Poland), O. Meyer, MD (Paris, France), Y. Molad, MD (Petah-Tikva, Israel), C. Montecucco, MD (Pavia, Italy), R. Myllykangas-Luosujarvi, MD (Kuopio, Finland), H. Nielsen, MD (Herlev, Denmark), G. Papadimitriou, MD (Athens, Greece), K. Pavelka, MD (Prague, Czech Republic), A. Perniok, MD (Cologne, Germany), F. Radulescu, MD (Bucharest, Romania), A. Rubinow, MD (Jerusalem, Israel), P. Sambrook, MD (St. Leonards, Australia), R. Sanmartí, MD (Barcelona, Spain), J. Sany, MD (Montpellier, France), A. Saraux, MD (Brest, France), M. Schattenkirchner, MD (Munich, Germany), U. Serni, MD (Florence, Italy), J. Sibilia, MD (Strasbourg, France), H. Stehlikova, MD (Ceska Lipa, Czech Republic), J. Szechinski, MD (Wroclaw, Poland), J. Szerla (Krakow, Poland), C. Tanasescu (Bucharest, Romania), H. Tony, MD (Wurzburg, Germany), J. Tornero, MD (Guadalajara, Spain), S. Transo, MD (Jonkoping, Sweden), F. Trotta, MD (Ferrara, Italy), T. Tuomiranta, MD (Tampere, Finland), E. Veys, MD (Ghent, Belgium), G. Valentini, MD (Naples, Italy), H. van den Brink, MD (Alkmaar, The Netherlands), M. van de Laar, MD (Enschede, The Netherlands), P. Vitek, MD (Zlin, Czech Republic), C. Voudouris, MD (Thessaloniki, Greece), R. Westhovens, MD (Leuven, Belgium), and R. Zahora, MD (Terezin, Czech Republic).

    The evaluators of the radiographic images were A. van Everdingen, MD (Deventer, The Netherlands), T. H. M. Schoonbrood, MD (University Hospital, Maastricht, The Netherlands), and K. Bruynesteyn, MD (University Hospital, Maastricht, The Netherlands).
