Effects of community-deliverable exercise on pain and physical function in adults with arthritis and other rheumatic diseases: A meta-analysis†
The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Abstract
Objective
To use the meta-analytic approach to determine the effects of community-deliverable exercise on pain and physical function in adults with arthritis and other rheumatic diseases (AORD).
Methods
Data sources consisted of 6 electronic databases, cross-referencing from retrieved studies and expert review. Study selection included 1) randomized controlled trials; 2) ≥1 exercise intervention group; 3) community-deliverable exercise interventions ≥4 weeks in duration; 4) control group; 5) adults ages ≥18 years with rheumatoid arthritis, osteoarthritis, fibromyalgia, lupus, gout, or ankylosing spondylitis; 6) published and unpublished studies; 7) studies published in any language between January 1, 1980, and January 1, 2008; and 8) data available for pain and/or physical function. Data abstraction included dual coding by 2 of the authors. Standardized effect sizes (g) and random-effects models were used to pool pain and physical function outcomes. Data were analyzed according to per-protocol and intent-to-treat (ITT) results. The minimally clinically important difference (MCID) and number needed to treat (NNT) were also calculated.
Results
Thirty-three studies representing 3,180 men and women (1,857 exercise, 1,323 control) with rheumatoid arthritis, osteoarthritis, and fibromyalgia were included. Statistically significant and clinically important improvements were observed for pain (per-protocol g = −0.37 [95% confidence interval (95% CI) −0.53, −0.21], MCID −18%; ITT g = −0.20 [95% CI −0.33, −0.07], MCID −9%, NNT 9) and physical function (per-protocol g = 0.37 [95% CI 0.21, 0.52], MCID 15%; ITT g = 0.34 [95% CI 0.25, 0.43], MCID 10%, NNT 5).
Conclusion
Community-deliverable exercise improves pain and physical function in adults with the types of AORD included in the analysis. Dose-response studies, as well as studies in those with other types of AORD, are needed.
INTRODUCTION
Arthritis, a term used to describe more than 100 rheumatic diseases and conditions, is a major public health problem (1). Approximately 46 million adults (22%) ages 18 years and older have self-reported doctor-diagnosed arthritis (2), with the prevalence expected to increase to more than 67 million by 2030 (3). Not surprisingly, the financial costs associated with arthritis are high. For example, the total costs attributed to arthritis and other rheumatic diseases (AORD) in the US have been estimated to be $128 billion in 2003, an increase from $86.2 billion in 1997 (4).
Arthritis is the most common cause of disability among adults (5), primarily the result of the pain and stiffness that occurs in one or more joints (1). One potential nonpharmacologic approach for reducing pain and increasing physical function in adults with AORD is exercise. Exercise programs that address joint pain, muscle weakness, and mobility limitations exist and are generally safe and appropriate for most persons with various types of AORD (6, 7). Several previous meta-analyses and one attempted meta-analysis have reported beneficial effects of exercise on pain and/or physical function in adults with specific types of AORD such as osteoarthritis, rheumatoid arthritis, and fibromyalgia (6-12). Although the previously published meta-analyses (6-12) provided valuable and encouraging information, they focused on specific types of AORD. There are at least 4 reasons why such an approach may not be optimal from a population health perspective. First, while the clinical care of AORD does vary by the specific condition, there is no convincing evidence to suggest that exercise programs delivered in the community need to be disease specific or tailored to a specific type of arthritis. In support of this concept, the Centers for Disease Control and Prevention (CDC) public health surveillance and intervention programs encompass all AORD and are not disease specific (13). Second, previous meta-analyses included studies of interventions that cannot be feasibly implemented in a community setting, a factor that is critical to promoting widespread dissemination through a public health approach. For example, interventions that require highly trained and/or educated instructors such as physical therapists or nurses to conduct the program or require membership at a fitness center are too costly, demand extensive resources, and would be difficult for community-based organizations to implement. 
Third, a prior meta-analysis that did consist of participants with different types of AORD included both randomized and nonrandomized controlled trials as well as pre-post studies limited to a single exercise group (14). Including nonrandomized trials in meta-analyses may be problematic because unknown confounders are not controlled for and designs of this type can overestimate the effects of health care interventions (15, 16). Last, it was not always clear whether results from per-protocol and/or intent-to-treat (ITT) analyses were pooled (17). Per-protocol analysis considers data only from those who comply with the intervention and therefore answers the question, “How well does the intervention work in those who actually do it?” In contrast, ITT analysis counts all participants as members of their originally assigned group regardless of whether they complied with the intervention or dropped out of the study. Therefore, ITT analysis answers the question, “Does the treatment work in the real world?” This differentiation is important when making public health decisions regarding the investment of resources for the promotion of various interventions.
Given the potential public health benefit of exercise across all types of AORD as well as the need to differentiate between per-protocol and ITT analyses, the purpose of this study was to use the meta-analytic approach to determine the effects of community-deliverable exercise (aerobic, strength, or both) on pain and physical function in adults with AORD.
MATERIALS AND METHODS
Data sources.
Studies were retrieved by 1) searching 6 different electronic databases (PubMed, EMBase, Cochrane Central Register of Controlled Clinical Trials, CINAHL, SportDiscus, and Dissertation Abstracts International); 2) cross-referencing from retrieved studies, including review articles; and 3) having an expert review the reference list for thoroughness and completeness (Nelson M: unpublished observations). All of the electronic database searches were conducted by the second author (KSK) with the assistance of the first author (GAK). While the keywords and combination of keywords varied depending upon the database being searched, terms common to all searches were “exercise,” “arthritis,” “rheumatic,” “rheumatoid arthritis,” “osteoarthritis,” “fibromyalgia,” “lupus,” “gout,” and “ankylosing spondylitis.” A list of the user queries for each database is shown in Supplementary Appendix A (available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a), whereas a flow diagram describing the search process is shown in Figure 1. All of the studies were stored electronically using Reference Manager, version 11.0.1 (18).

Figure 1. Flow diagram describing the search process.
Study selection.
The inclusion criteria for this study were: 1) randomized controlled trials with the unit of assignment at the participant level; 2) an exercise-only intervention group (aerobic, strength training, or both); 3) community-deliverable exercise interventions ≥4 weeks in duration; 4) a comparative control group (nonintervention, usual care, attention control); 5) adults ages 18 years and older with either rheumatoid arthritis, osteoarthritis, fibromyalgia, lupus, gout, or ankylosing spondylitis; 6) published and unpublished studies (master's degrees and dissertations); 7) studies published in any language between January 1, 1980, and January 1, 2008; and 8) data available for pain and/or physical function.
For this study, community-deliverable exercise interventions were considered to be those that could be performed, or had the potential to be adapted and performed, by persons in a community setting (recreation or senior centers, in the home or neighborhood, etc.), and generally met the following implementability guidelines for physical activity and self-management education interventions recently proposed by the CDC: 1) no academic degree required for a leader/implementer but leader training available, if needed; 2) no special facilities beyond a community room (except a warm pool for aquatic exercise); 3) inexpensive equipment; 4) cost to participants less than $50.00; 5) implementation guide available; and 6) supporting structures judged to be adequate to support widespread implementation (19). Pain was operationalized as the unpleasant sensation associated with the participants' AORD, while physical function was operationalized broadly as self-reported or measured performance of specific functional tasks (walking, rising from a chair, etc.). Aerobic and progressive resistance (strength) exercise were defined according to the recent Physical Activity Guidelines Advisory Committee report (20), i.e., “exercise that primarily uses the aerobic energy-producing systems and can improve the capacity and efficiency of these systems, and is effective for improving cardiovascular endurance” (aerobic exercise) and “exercise training primarily designed to increase skeletal muscle strength, power, endurance, and mass” (resistance training). While no exact cut point exists, an exercise duration of at least 4 weeks was chosen as our cut point based on an exercise physiology concept that this is generally the minimum length of time needed for chronic changes in various fitness parameters to occur.
For those studies in which primary outcome information was missing, the investigators were contacted via e-mail in an attempt to retrieve it. The selection of studies was conducted by the first two authors (GAK, KSK). Using Cohen's kappa statistic (21), the overall agreement rate (yes/no) prior to adjudication was 0.89.
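The agreement statistic used for study selection can be illustrated with a minimal sketch. The function name `cohens_kappa` and the include/exclude decisions below are hypothetical and are not the authors' data; the formula is the standard two-rater Cohen's kappa.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items that both raters coded identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical yes/no inclusion decisions for 10 candidate studies.
first_rater = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
second_rater = ["yes", "yes", "no", "no", "no", "no", "no", "no", "yes", "no"]
kappa = cohens_kappa(first_rater, second_rater)
```

Because kappa discounts agreement expected by chance, it is lower than the raw proportion of matching decisions (here 9 of 10).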
Data abstraction.
Prior to coding all of the studies, a codebook was developed in Microsoft Excel 2003 (22). The major categories that were coded for included 1) study characteristics, 2) subject characteristics, 3) exercise program characteristics, and 4) outcomes (e.g., changes in pain and physical function). All of the studies were coded by the first two authors (GAK, KSK), independent of each other, who then reviewed every item for accuracy and consistency. Disagreements were resolved by consensus. When consensus could not be reached, the other two authors (JMH, DLJ) served as arbitrators. Using Cohen's kappa statistic (21), the overall agreement rate (yes/no) prior to correcting discrepant items was 0.94.
The original study protocol included an examination of study quality using a previously developed instrument (23). However, since the time that the original study protocol was developed, the use of quality scales has been discouraged by the Cochrane Collaboration because of the lack of empirical evidence (16, 24), including validity (25), to support their use. Therefore, the study protocol was revised in favor of the risk of bias assessment tool recently recommended by the Cochrane Collaboration (26). This tool assesses bias across 6 domains: 1) sequence generation, 2) allocation concealment, 3) blinding to group assignment, 4) incomplete outcome data, 5) selective outcome reporting, and 6) other potential bias (26). Each domain is classified as having either a high, low, or unclear risk of bias (26). The decision rule for blinding was that participants, research personnel, and outcome assessors were blinded to the primary outcomes of interest, i.e., pain and physical function. While the investigative team acknowledges that it is difficult to blind participants in exercise intervention studies, the risk of potential bias still exists. The determination of selective reporting was limited to pain and physical function, the primary outcomes of the study. Risk of bias was also assessed with respect to whether the participants had been participating in a regular program of physical activity prior to enrollment. All of the assessments were conducted by the first two authors (GAK, KSK), independent of each other. Both authors then met and reviewed every item for agreement. Disagreements were resolved by consensus. Using Cohen's kappa statistic (21), the overall interrater agreement prior to correcting discrepant items was 0.55, which is considered to be moderate (27).
Statistical analysis.
Calculation of effect sizes from each study.
The primary outcomes for this meta-analysis were changes in pain and physical function. Given the different tests and metrics used to assess each of these, the standardized effect size (g), adjusted for small sample bias, was used to allow for the pooling of findings (28). Since all of the studies were parallel trials (29-62), the g for each outcome from each study was calculated as the difference in change scores between the exercise and control groups divided by the pooled SD of these change scores (28). For studies in which the change outcome SDs for the exercise and control groups were not reported, these were calculated using 95% confidence intervals (95% CIs) or estimated separately for the exercise and control groups using pre- and postintervention means and SDs according to the approach by Follmann et al (63). If only probability values were reported, the g was calculated using the exact probability value (64). If exact probability values were not reported but the result was statistically significant, the g was calculated using the cut point for reporting statistical significance from that study (0.05, 0.01, etc.). If the result was reported as nonsignificant, an alpha value of 1.0 was used to estimate the g. A conservative approach such as this may help to counteract the tendency for nonsignificant results to be reported in less detail than statistically significant results (65).
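The change-score calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' routine: the function names and the input values are hypothetical, the small-sample correction is the common approximation 1 − 3/(4df − 1), and the SD recovered from a 95% CI uses a normal critical value of 1.96 (a t value may be more appropriate for small groups).

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the SD of a mean change from its 95% CI (normal approximation, assumed)."""
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)


def hedges_g(change_ex, sd_ex, n_ex, change_ctl, sd_ctl, n_ctl):
    """Standardized difference in change scores with a small-sample correction."""
    df = n_ex + n_ctl - 2
    # Pooled SD of the change scores across the exercise and control groups.
    pooled_sd = math.sqrt(((n_ex - 1) * sd_ex ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (change_ex - change_ctl) / pooled_sd
    # Approximate correction factor for small-sample bias.
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical pain changes on a 0-100 scale: exercise -15 (SD 20), control -5 (SD 20).
g_pain = hedges_g(-15.0, 20.0, 30, -5.0, 20.0, 30)
```

A negative `g_pain` indicates a larger reduction in pain in the exercise group than in the control group, in line with the sign convention used in this report.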
After calculating g from each study, its variance was estimated using previously developed procedures (28). Reductions in pain were denoted by a negative g, whereas improvements in physical function were denoted by a positive g. For those studies that reported multiple measures to assess pain and physical function in the same group of subjects, results for these measures were pooled.
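For illustration, one widely used large-sample approximation for the variance of g is sketched below. It is an assumption that this matches the cited procedures (28); the function names and the example values are hypothetical.

```python
import math


def variance_of_g(g, n_ex, n_ctl):
    """Large-sample approximation to the sampling variance of g (assumed form)."""
    return (n_ex + n_ctl) / (n_ex * n_ctl) + g ** 2 / (2 * (n_ex + n_ctl))


def ci95(g, variance):
    """Normal-theory 95% confidence interval for a single effect size."""
    half_width = 1.96 * math.sqrt(variance)
    return g - half_width, g + half_width


# Hypothetical study: g = -0.49 for pain with 30 participants per group.
var_g = variance_of_g(-0.49, 30, 30)
lower, upper = ci95(-0.49, var_g)
```

The variance shrinks as group sizes grow, which is why larger studies receive more weight when effect sizes are pooled.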
Pooled estimates of effect sizes for pain and physical function: changes in pain and physical function.
Random-effects meta-regression, method-of-moments, intercept-only models were used to pool pain and physical function outcomes from each study (66). Separate results were generated for pain and physical function and were partitioned according to whether data were analyzed using an ITT or per-protocol approach. Multiple groups from the same study were treated independently, since they represented different types of training. In addition, results were also analyzed by collapsing multiple groups so that only one g represented each outcome (pain and physical function) from each study according to whether data were analyzed using the ITT or per-protocol approach. If the 2-tailed 95% CIs generated from the models did not cross zero, results were considered to be statistically significant. In terms of magnitude, values of g were classified as either trivial (<0.20), small (≥0.20 to <0.50), medium (≥0.50 to <0.80), or large (≥0.80) (67). A g of 0.20, for example, means that exercise would result in a 0.20 SD improvement over those who do not exercise.
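A minimal sketch of an intercept-only, method-of-moments random-effects model is given below, using the DerSimonian-Laird estimator of between-study variance. It is an assumption that this corresponds to the user-written Stata routines used here; the function names and example values are hypothetical. The magnitude labels follow the cut points stated above.

```python
import math


def pool_random_effects(effects, variances):
    """Return (pooled g, lower 95% limit, upper 95% limit, tau^2)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    # Cochran's Q around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 1e-12 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


def magnitude(g):
    """Classify |g| as trivial, small, medium, or large."""
    size = abs(g)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Hypothetical effect sizes and variances from two groups.
pooled, lower, upper, tau2 = pool_random_effects([0.2, 0.4], [0.04, 0.04])
significant = lower > 0 or upper < 0  # the 95% CI does not cross zero
```

When the observed dispersion is no larger than expected from sampling error, the estimated between-study variance is zero and the model reduces to the fixed-effect result.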
Given that the interpretation of standardized effect sizes like g can be difficult in relation to clinical (practical) relevance (68), changes were also calculated for pain and physical function by subtracting the percent change in the control group from the percent change in the exercise group, so that a relative reduction in pain is negative and a relative improvement in physical function is positive. Based on previous work, the minimally clinically important difference (MCID) for changes in pain and physical function was set at 10% (69). In addition, the number needed to treat (NNT) and 95% CIs for continuous variables, derived from reported g values in ITT analyses, were calculated (70-72).
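Both clinical-relevance measures can be sketched in a few lines. The sign convention assumed here is exercise minus control, so that a relative reduction in pain is negative. The NNT conversion shown is one commonly used approach for continuous outcomes (area under the curve, AUC = Φ(g/√2), NNT = 1/(2·AUC − 1)); it is an assumption that this matches the cited procedures (70-72), although it is consistent with the NNT values reported for the ITT analyses when rounded. Function names and the example means are hypothetical.

```python
import math
from statistics import NormalDist


def percent_change(baseline, follow_up):
    """Percent change from baseline within one group."""
    return 100.0 * (follow_up - baseline) / baseline


def relative_change_difference(ex_pre, ex_post, ctl_pre, ctl_post):
    """Exercise-group percent change minus control-group percent change."""
    return percent_change(ex_pre, ex_post) - percent_change(ctl_pre, ctl_post)


def nnt_from_g(g):
    """Number needed to treat from a standardized effect size (assumed conversion)."""
    if g == 0:
        raise ValueError("NNT is undefined when g is zero")
    auc = NormalDist().cdf(abs(g) / math.sqrt(2))
    return 1.0 / (2.0 * auc - 1.0)


# Hypothetical group means for pain: exercise 50 -> 40, control 50 -> 49.
difference = relative_change_difference(50.0, 40.0, 50.0, 49.0)
meets_mcid = abs(difference) >= 10.0

nnt_pain = nnt_from_g(-0.20)      # ITT pain estimate reported in this study
nnt_function = nnt_from_g(0.34)   # ITT physical function estimate reported in this study
```

Applying the same function to the limits of the 95% CI for g gives the corresponding limits for the NNT, with the smaller absolute g producing the larger NNT.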
Stability and validity of changes in pain and physical function.
Inconsistency of outcomes between studies was examined using an extension of the Q statistic, I² (73). Generally, I² values of 25% to <50%, 50% to <75%, and ≥75% are considered to represent small, medium, and large amounts of inconsistency, respectively (73). For this study, the decision rule for heterogeneity was an I² value greater than 50%.
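The relationship between Q and the inconsistency statistic can be sketched as follows, using the standard definition of inconsistency as 100% × (Q − df)/Q, truncated at zero. The function names and example inputs are hypothetical; the labels follow the ranges given above.

```python
def q_and_i2(effects, variances):
    """Return Cochran's Q and the inconsistency statistic as a percentage."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Share of total variation attributable to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def inconsistency_label(i2):
    """Map an inconsistency percentage onto the ranges used in this study."""
    if i2 >= 75:
        return "large"
    if i2 >= 50:
        return "medium"
    if i2 >= 25:
        return "small"
    return "below the small range"


# Hypothetical effect sizes with equal variances.
q, i2 = q_and_i2([0.0, 1.0], [0.04, 0.04])
excessive = i2 > 50.0  # decision rule for heterogeneity
```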
Publication bias was examined using the regression intercept approach by Egger et al (74). If the 95% CI for the intercept did not include zero, potential publication bias was suspected. In addition, the influence of each study on the overall results was examined by deleting each study from the model once. Cumulative meta-analysis, ranked by year, was used to examine the accumulation of evidence on pain and physical function over time (75).
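The influence and cumulative analyses can be sketched as repeated applications of a pooling function. The sketch assumes a DerSimonian-Laird random-effects model for each subset; the function names and the (year, g, variance) records are hypothetical and are not taken from the included studies. The Egger regression test is not shown.

```python
import math


def dl_pool(effects, variances):
    """Return (pooled g, lower 95% limit, upper 95% limit) for one set of studies."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 1e-12 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


def leave_one_out(effects, variances):
    """Re-pool the data with each result deleted from the model once."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(dl_pool(rest_effects, rest_variances))
    return results


def cumulative_by_year(records):
    """Pool results cumulatively after sorting (year, g, variance) records by year."""
    ordered = sorted(records, key=lambda record: record[0])
    output = []
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]
        pooled, lower, upper = dl_pool([r[1] for r in subset], [r[2] for r in subset])
        output.append((subset[-1][0], pooled, lower, upper))
    return output


# Hypothetical (year, g, variance) records for pain.
records = [(2003, -0.40, 0.02), (1995, -0.50, 0.09), (2006, -0.25, 0.03), (1999, -0.30, 0.04)]
influence = leave_one_out([r[1] for r in records], [r[2] for r in records])
cumulative = cumulative_by_year(records)
```

The first year at which the cumulative 95% CI excludes zero, and remains so thereafter, corresponds to the point from which results are described as statistically significant.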
Covariate analyses for changes in pain and physical function.
In order to examine the effects of potential covariates on changes in pain and physical function, random-effects multiple meta-regression, method-of-moments models were used (66). The inclusion of covariates was limited to those in which complete data were available and potentially related to changes in pain and physical function. Variables in the model included type of rheumatic disease (fibromyalgia, osteoarthritis, rheumatoid arthritis), type of training (aerobic, strength, both), exercise supervision (yes, no, mixed), length of training, and age. Results for individual covariates were considered statistically significant if the 95% CI did not cross zero. Overall model fit for meta-regression was assessed using a chi-square test. In addition, adjusted R² and I² statistics were generated. Statistical significance was set at P values less than or equal to 0.05. Results for risk of bias assessments were not incorporated into any of the statistical models because of the large number of items determined to be unclear and/or the small sample sizes available for comparison of studies with a low versus high risk for bias.
Software utilization.
Descriptive statistics were generated using SPSS, version 16.0 (76), while pooled g values for multiple measures on the same outcome in the same subjects were generated using Comprehensive Meta-Analysis, version 2.2 (77). All other meta-analytic procedures were conducted via user-written meta-analytic routines (66, 78-82) available in Stata, version 10.1 (83).
RESULTS
Study characteristics.
As can be seen in Figure 1, a total of 33 studies representing 77 groups (44 exercise, 33 control) and 3,180 men and women (1,857 exercise, 1,323 control) were included in the analysis (29-62). The number of reasons (n = 1,416) for excluding studies exceeded the number of excluded publications because some studies were excluded for more than one reason. The number of publications (n = 34) exceeded the number of studies (n = 33) because 2 publications included the same subjects but reported different pain and physical function outcomes (38, 59). Therefore, with the exception of the different pain and physical function outcomes from each of the 2 articles (38, 59), all other variables were counted only once, with reference made to the later publication (59).
Four (80%) of the 5 investigators who were contacted for additional information on pain and/or physical function outcomes provided the information requested. The number of subjects in each group from each study ranged from 3 to 153 in the exercise groups (mean ± SD 42 ± 42, median 24) and from 6 to 159 in the control groups (mean ± SD 40 ± 41, median 24). Although 36 publications met the original inclusion criteria, 2 were excluded from the meta-analysis post hoc because they were the only articles that included participants with either ankylosing spondylitis (84) or systemic lupus erythematosus (85). No studies that examined the effects of exercise on gout met the inclusion criteria.
With the exception of 1 study, a dissertation (33), all others were published in journals (29-32, 34-37, 39-62). All of the studies were published in the English language (29-37, 39-62). Twelve studies were conducted in the US (29, 32, 41, 43, 45, 46, 48-50, 52, 57, 61); 6 in Canada (31, 37, 44, 53, 55, 62); 2 each in Australia (35, 36), Denmark (40, 54), Finland (39, 60), Norway (34, 47), Spain (42, 59), and the UK (30, 51); and 1 each in New Zealand (33), South Korea (56), and Sweden (58). Eleven studies used some type of attention control group (29-32, 35, 44, 48, 53, 55, 56, 58), while the remaining studies appeared to use usual care or no intervention as the control (33, 34, 36, 37, 39-43, 45-47, 49-52, 54, 57, 59-62). Twelve studies used some type of stratified randomization approach (e.g., sex) when assigning participants to their selected groups (32, 37, 42, 47-53, 61, 62). None of the studies used a crossover design. A general description of each study can be found in Supplementary Appendix B (available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a), while the results for risk of bias assessment are shown in Figure 2 and Supplementary Appendix C (available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a).

Figure 2. Risk of bias assessment.
Based on our assessment procedures, the majority of studies (97%) were considered to be at a low risk of bias with regard to sequence generation. However, approximately two-thirds of the studies (67%) did not provide adequate information to permit a judgment regarding the risk of bias for allocation concealment. For blinding, the majority of studies were considered to be at a high risk of bias (88% for pain and 79% for physical function). This was primarily the result of the difficulties associated with blinding participants in exercise intervention trials. In contrast, two-thirds or more of the studies were considered to be at a low risk of bias with respect to incomplete outcome data (70% for pain and 67% for physical function). For the majority of studies (97%), inadequate information (i.e., no study protocol number) was provided to permit a judgment regarding the risk of bias for selective outcome reporting. Finally, fewer than two-thirds of the studies (58%) reported that the participants were sedentary prior to taking part in the study.
Participant characteristics.
A description of the characteristics of participants is shown in Tables 1 and 2. Not surprisingly, the participants consisted primarily, but not solely, of women ages ≥50 years. The exact number of men and women from the groups could not be determined because of missing data for some groups. For those groups in which data were available, drugs that were taken during the study included one or more of the following: nonsteroidal antiinflammatory drugs (NSAIDs), analgesics, antidepressants, anxiolytics, muscle relaxants, and pain medications.
Variable | Exercise N | Exercise mean ± SD | Exercise range | Exercise median | Control N | Control mean ± SD | Control range | Control median |
---|---|---|---|---|---|---|---|---|
Age, years | 39 | 57.4 ± 11.2 | 34–73 | 60.0 | 31 | 57.3 ± 10.7 | 34–73 | 59.0 |
Height, cm | 9 | 164.5 ± 1.8 | 161–167 | 165.0 | 8 | 163.8 ± 2.3 | 160–167 | 164.0 |
Body weight, kg | 10 | 78.2 ± 8.4 | 66–92 | 76.2 | 10 | 79.6 ± 9.0 | 69–96 | 77.2 |
BMI, kg/m² | 12 | 29.0 ± 2.4 | 25–34 | 29.6 | 12 | 29.3 ± 2.4 | 26–34 | 29.6 |
Symptoms, years | 11 | 11.1 ± 4.7 | 8–24 | 9.0 | 10 | 10.9 ± 3.5 | 7–19 | 10.3 |
Diagnosis, years | 18 | 10.0 ± 5.9 | 3–24 | 9.2 | 13 | 9.3 ± 4.9 | 4–23 | 8.8 |
- * N = number of groups reporting data; BMI = body mass index.
Variable | Exercise N | Exercise reporting, % | Exercise total, %† | Control N | Control reporting, % | Control total, %† |
---|---|---|---|---|---|---|
Sex | ||||||
Female | 15 | 34.0 | 34.0 | 12 | 34.0 | 34.0 |
Mixed | 29 | 66.0 | 66.0 | 21 | 66.0 | 66.0 |
Race/ethnicity | ||||||
Asian | 1 | 7.1 | 2.3 | 1 | 10.0 | 3.0 |
White | 1 | 7.1 | 2.3 | 1 | 10.0 | 3.0 |
Other Hispanic | 1 | 7.1 | 2.3 | 1 | 10.0 | 3.0 |
Multiple | 11 | 78.6 | 25.0 | 7 | 70.0 | 21.2 |
Menopausal status | ||||||
Premenopausal | 2 | 4.5 | 4.5 | 2 | 6.1 | 6.1 |
Postmenopausal | 18 | 40.9 | 40.9 | 14 | 42.4 | 42.4 |
Both | 24 | 54.5 | 54.5 | 17 | 51.5 | 51.5 |
Sedentary | ||||||
All | 28 | 87.5 | 63.6 | 21 | 91.3 | 63.6 |
Some | 4 | 12.5 | 9.1 | 2 | 8.7 | 6.1 |
Overweight/obese | ||||||
Some | 19 | 95.0 | 43.2 | 15 | 93.8 | 45.5 |
All | 1 | 5.0 | 2.3 | 1 | 6.2 | 3.0 |
Rheumatic disease | ||||||
OA | 18 | 40.9 | 40.9 | 15 | 45.5 | 45.5 |
RA | 11 | 25.0 | 25.0 | 5 | 15.1 | 15.1 |
Fibromyalgia | 13 | 29.5 | 29.5 | 12 | 36.4 | 36.4 |
OA or RA | 2 | 4.5 | 4.5 | 1 | 3.0 | 3.0 |
Drugs for condition | ||||||
All | 7 | 20.6 | 15.9 | 5 | 20.0 | 15.2 |
No | 1 | 2.9 | 2.3 | 1 | 4.0 | 3.0 |
Some | 26 | 76.5 | 59.0 | 19 | 76.0 | 57.6 |
- * Some percentages do not add to exactly 100% because of rounding. N = number of groups reporting data; OA = osteoarthritis; RA = rheumatoid arthritis.
- † Percentage reported in each category while including missing data in the calculation; percentages may not add to 100% because of missing data.
Training program characteristics.
A description of the training program characteristics is shown in Table 3 and Supplementary Appendix B (available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a). We were unable to calculate the average intensity for strengthening exercise as a percentage of 1 repetition maximum because of a lack of data as well as the different methods used to assess and report intensity. Data on the rest period between sets were available for only 3 (9.4%) of 32 groups that participated in strengthening exercise. For those groups, the rest period ranged from 0 to 60 seconds. Data on the location in which exercise took place were available for 40 groups (90.9%). This included facility-based exercise for 27 groups (67.5%), home-based exercise for 7 others (17.5%), and a combination of both for 6 (15%). With regard to the type of participation for the 35 groups (79.5%) in which data were available, 23 (65.7%) exercised in a group setting, 6 (17.1%) exercised alone, and another 6 (17.1%) used a combination of the two. No change in physical activity outside the exercise intervention was reported for 2 exercise groups, whereas one control group each had either no change in physical activity or increased their physical activity.
Variable | N | Mean ± SD | Range | Median (IQR) |
---|---|---|---|---|
Aerobic only | ||||
Length, weeks | 12 | 19.1 ± 19.1 | 6–78 | 13.0 (7.0) |
Frequency, times per week | 12 | 3.3 ± 1.3 | 2–7 | 3.0 (0) |
Intensity, % VO2max | 5 | 55.1 ± 4.8 | 52–60 | 51.6 (8.8) |
Duration, minutes per session | 9 | 28.0 ± 14.9 | 12–60 | 24.5 (20) |
Compliance, %† | 3 | 75.0 ± 13.0 | 67–90 | 68.0 (‡) |
Strength only | ||||
Length, weeks | 12 | 19.6 ± 19.4 | 6–78 | 14.0 (13.0) |
Frequency, times per week | 12 | 2.8 ± 0.8 | 2–5 | 3.0 (1.0) |
Intensity, % 1 RM | 0 | ‡ | ‡ | ‡ |
Sets, no. | 5 | 2.8 ± 1.9 | 1–6 | 2.0 (3.0) |
Reps, no. | 3 | 11.0 ± 9.5 | 1–20 | 12 (‡) |
Exercises, no. | 9 | 6.8 ± 2.9 | 1–11 | 7.0 (4.0) |
Compliance, %† | 6 | 80.7 ± 9.3 | 70–97 | 79.2 (14.0) |
Aerobic and strength training | ||||
Aerobic | ||||
Length, weeks | 20 | 34.7 ± 39.0 | 6–104 | 12.0 (57.0) |
Frequency, times per week | 18 | 2.8 ± 0.7 | 2–5 | 3.0 (1.0) |
Intensity, % VO2max | 2 | ‡ | ‡ | ‡ |
Duration, minutes per session | 11 | 35.5 ± 16.7 | 10–60 | 30.0 (26) |
Compliance, % | 10 | 77.1 ± 13.5 | 50–90 | 82.5 (21) |
Strengthening | ||||
Length, weeks | 20 | 34.7 ± 39.0 | 6–104 | 12.0 (57.0) |
Frequency, times per week | 14 | 2.7 ± 0.8 | 2–5 | 3.0 (1.0) |
Intensity, % 1 RM | 0 | ‡ | ‡ | ‡ |
Sets, no. | 3 | 2.3 ± 1.5 | 1–4 | 2.0 (‡) |
Reps, no. | 3 | 12.3 ± 2.5 | 10–15 | 12 (‡) |
Exercises, no. | 5 | 15.8 ± 14.7 | 2–37 | 12 (28) |
Compliance, %† | 9 | 78.1 ± 13.8 | 50–90 | 83.3 (18.0) |
- * N = number of groups reporting data; IQR = interquartile range; VO2max = maximum oxygen consumption; RM = repetition maximum.
- † Exercise sessions attended.
- ‡ Unable to calculate.
Assessment of pain and physical function.
For the 31 studies that included pain measures and met the inclusion criteria, 17 (54.8%) used a verbal rating scale (29, 30, 35, 38, 41, 43, 46, 48-51, 53, 56-59, 62), 12 (38.7%) used a visual analog scale (31, 34, 38-40, 42, 46, 51, 52, 55, 60, 61), and 2 (6.5%) used a numerical rating scale (32, 54). For the 26 studies that assessed physical function, 13 (50.0%) used both self-report and performance-based measures (29, 30, 32, 35, 36, 38, 41, 45, 46, 48, 53, 58, 59), 8 (31.0%) used self-reported measures only (39, 40, 50-52, 55-57), while the remaining 5 (19.0%) were limited to performance-based measures (37, 44, 49, 54, 61).
Pain and physical function results.
Pain.
Overall results for pain are shown in Figure 3. A total of 28 effect sizes were available for per-protocol analyses and 17 for ITT. Small but statistically significant reductions in pain occurred for both ITT and per-protocol analyses. The relative decrease in pain exceeded the MCID of 10% based on per-protocol results (−18%) and was close to the cut point based on ITT analyses (−9%). The NNT was 9 (95% CI 5, 25). Excessive heterogeneity was identified for the ITT analysis (I² = 65.4%) but not the per-protocol analysis (I² = 33.1%). No statistically significant publication bias was observed for either per-protocol or ITT analyses (per-protocol 95% CI −0.61, 2.47; ITT 95% CI −1.68, 4.02). With each group deleted from the model once, the pain results remained statistically significant for both ITT and per-protocol approaches, ranging from a g of −0.43 to −0.35 for per-protocol results and from −0.24 to −0.18 for ITT. Cumulative meta-analysis, ranked by year, demonstrated that the results have been statistically significant since 1999 when the per-protocol approach was used, and since 1997 for the ITT approach (Figure 4). Multiple meta-regression results for pain were not statistically significant.

Figure 3. Forest plot for standardized effect size changes (Hedges' g) in self-reported pain according to intent-to-treat and per-protocol analyses. 95% CI = 95% confidence interval.

Figure 4. Cumulative meta-analysis, ranked by year, for standardized effect size changes (Hedges' g) in self-reported pain according to intent-to-treat and per-protocol analyses. 95% CI = 95% confidence interval.
When overall results for pain were analyzed at the study level (one g from each study), a total of 21 outcomes were available for per-protocol analyses and 12 for ITT. Similar to analyzing multiple groups independently, the results were small but statistically significant (per-protocol g = −0.45 [95% CI −0.62, −0.28], I² = 31.4%; ITT g = −0.22 [95% CI −0.35, −0.09], I² = 44.0%).
Physical function.
Overall results for physical function are shown in Figure 5. A total of 24 effect sizes were available for per-protocol analyses and 19 for ITT. Small but statistically significant improvements in physical function occurred for both types of analyses. The relative increases in physical function met or exceeded the MCID of 10% based on per-protocol (15%) and ITT (10%) analyses. The NNT was 5 (95% CI 4, 7). No excessive heterogeneity was identified for either approach (per-protocol I² = 19.5%, ITT I² = 19.1%). No statistically significant publication bias was observed for either per-protocol or ITT outcomes (per-protocol 95% CI −1.55, 1.34; ITT 95% CI −1.74, 1.52). With each group deleted from the model once, the results remained statistically significant for both ITT and per-protocol approaches, ranging from a g of 0.34 to 0.39 for per-protocol results and from 0.31 to 0.36 for ITT. Cumulative meta-analysis, ranked by year, demonstrated that improvements in physical function have been statistically significant since 2001 when the per-protocol approach was used, and since 1997 for the ITT approach (Figure 6). Multiple meta-regression results for physical function were not statistically significant.

Figure 5. Forest plot for standardized effect size changes (Hedges' g) in physical function according to intent-to-treat and per-protocol analyses. 95% CI = 95% confidence interval.

Figure 6. Cumulative meta-analysis, ranked by year, for standardized effect size changes (Hedges' g) in physical function according to intent-to-treat and per-protocol analyses. 95% CI = 95% confidence interval.
When overall results were analyzed at the study level (one g from each study), a total of 17 outcomes were available for per-protocol analyses and 14 for ITT. Similar to analyzing multiple groups independently, small but statistically significant improvements were observed for physical function (per-protocol g = 0.42 [95% CI 0.23, 0.61], I² = 31.9%; ITT g = 0.33 [95% CI 0.22, 0.43], I² = 14.3%).
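The two sensitivity checks reported for both outcomes, deleting each group from the model once and cumulative meta-analysis ranked by year, can be sketched as follows. This is an illustration only: the function names are hypothetical, and for brevity the default pooler is a fixed-effect inverse-variance mean used as a stand-in for the random-effects model applied in the article. Any pooling function that returns (g, lower, upper) can be passed in its place.

```python
import math


def inverse_variance_pool(effects, variances):
    """Stand-in pooler: fixed-effect inverse-variance mean with a 95% CI."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    g = sum(w * e for w, e in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    return g, g - 1.96 * se, g + 1.96 * se


def leave_one_out(effects, variances, pool=inverse_variance_pool):
    """Re-pool the data with each effect size deleted from the model once."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(pool(rest_effects, rest_variances))
    return results


def cumulative(years, effects, variances, pool=inverse_variance_pool):
    """Re-pool after adding effect sizes one at a time in order of publication year."""
    order = sorted(range(len(years)), key=lambda i: years[i])
    results = []
    for n in range(1, len(order) + 1):
        idx = order[:n]
        pooled = pool([effects[i] for i in idx], [variances[i] for i in idx])
        results.append((years[idx[-1]], *pooled))
    return results


def is_significant(lower, upper):
    """A 95% CI that excludes zero indicates statistical significance at the 0.05 level."""
    return lower > 0 or upper < 0
```

With this layout, the range of g across `leave_one_out` corresponds to the ranges quoted in the text, and the first year in `cumulative` from which every subsequent interval excludes zero corresponds to the "statistically significant since" year.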
DISCUSSION
The primary purpose of meta-analysis is to reach general conclusions about a body of research (64). To the best of the investigative team's knowledge, this is the first meta-analysis to examine, from a public health perspective, the effects of community-deliverable exercise on pain and physical function in adults with AORD (osteoarthritis, rheumatoid arthritis, and fibromyalgia) using separate results from both per-protocol and ITT analyses. The overall results suggest that community-deliverable exercise improves pain and physical function in adults with these conditions. These findings are strengthened by 1) the lack of observed publication bias, 2) the stability of the results when each study was deleted from the model once, 3) the significance of the results over approximately the past decade, and 4) the similar results obtained when analyzed at both the study and group level. With the exception of pain results using ITT outcome data, results also displayed acceptable levels of consistency. The findings of this meta-analysis are also in general agreement with the findings of previous meta-analyses dealing with the effects of exercise on pain and/or physical function in adults with specific types of AORD, which included mixed clinical and community-based populations (6-12). In addition, the fact that both ITT and per-protocol analyses yielded statistically significant improvements in pain and physical function in participants with AORD suggests that not only does exercise work (per-protocol analysis), but it also works in the real-world setting (ITT analysis). This latter finding provides support for the investment of resources for community-based public health exercise programs aimed at improving pain and physical function in adults with AORD.
Given that the relative changes in pain and physical function exceeded or approached the MCID cut point of 10%, the results of the current meta-analysis are considered to be clinically important. Nevertheless, it is important to understand that no clear consensus exists in relation to what constitutes the MCID, as recommended numbers vary widely (69, 86-89). When viewed from an absolute perspective, the number of adults with AORD who might benefit from exercise appears to be noteworthy. For example, based on the NNT for pain and physical function and the current prevalence of self-reported doctor-diagnosed arthritis in the US (46 million) (2), and assuming that adults with AORD reached the Healthy People 2010 goal for participation in regular physical activity (30%), more than 153,000 US adults with AORD would reduce their pain, while approximately 276,000 would improve their physical function.
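The projection above rests on simple NNT arithmetic: the expected number of people who benefit equals the number who exercise divided by the NNT. The passage does not state the number of exercising adults it used, so the sketch below works backward from the reported figures. The function names are hypothetical, and the reading that both figures imply about 1.38 million exercising adults (3% of 46 million) is an inference from the arithmetic, not something the text states.

```python
def expected_beneficiaries(n_exercising: float, nnt: float) -> float:
    """People expected to benefit when n_exercising adults take up exercise."""
    return n_exercising / nnt


def implied_exercisers(beneficiaries: float, nnt: float) -> float:
    """Back-calculate how many exercising adults a reported benefit count implies."""
    return beneficiaries * nnt


PREVALENCE = 46_000_000          # self-reported doctor-diagnosed arthritis in the US
NNT_PAIN, NNT_FUNCTION = 9, 5    # NNTs reported in the Results

# Work backward from the reported "approximately 276,000" for physical function.
n_exercising = implied_exercisers(276_000, NNT_FUNCTION)     # 1,380,000
share_of_prevalence = n_exercising / PREVALENCE               # about 0.03

# The same denominator reproduces the pain figure ("more than 153,000").
pain_beneficiaries = expected_beneficiaries(n_exercising, NNT_PAIN)
```

Under this reading, the pain and physical function projections share one denominator and differ only through the NNT, which is why the physical function figure is 9/5 times the pain figure.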
The standardized effect size improvements for pain and physical function observed in the current meta-analysis are similar to those that might be considered clinically important for analgesic agents in patients with osteoarthritis (pain: 0.21 for acetaminophen [90], 0.32 for NSAIDs [91], and 0.41 for topical NSAIDs [92]; function: 0.36 for topical NSAIDs [92]). However, given the potential side effects and costs associated with some analgesic agents, exercise may be preferred over the use of medications for some patients. Also, the authors are not aware of any information on the additive effects of various analgesic interventions (e.g., combining analgesic medication with other nonpharmacologic interventions such as exercise) in AORD.
The effects of exercise found in the current meta-analysis are also similar to a standardized effect size improvement of 0.34 for function using transcutaneous electrical nerve stimulation (TENS) in participants with knee osteoarthritis (93). In contrast, TENS resulted in a standardized effect size improvement that was greater for pain (0.85) (93). The effect sizes found in the current exercise meta-analysis are also smaller than the recently reported standardized effect size improvements of 0.96 for pain and 0.93 for function as a result of acupuncture in participants with knee osteoarthritis (94). However, both of these interventions would probably be difficult to implement in a public health setting. Nevertheless, the findings that both TENS and acupuncture have greater effect sizes than exercise for pain and that acupuncture has a greater effect size for physical function have implications for clinicians' recommendations for patients with AORD who have severe pain or whose physical functioning is significantly limited.
Exercise is safe for persons with AORD, as the rates of adverse events have generally been low (6, 7). Exercise may also be particularly appealing given the numerous other benefits that can be derived from it (95). For example, using standardized mortality ratios, a recent meta-analysis reported that rheumatoid arthritis was associated with a 60% increase in the risk of cardiovascular mortality compared to the general population (96). Since increased physical activity, especially aerobic activity, has been shown to reduce mortality from cardiovascular disease (97) and the prevalence of physical activity is low among those with AORD (98, 99), it would seem appropriate to suggest that exercise be especially encouraged because of the benefits mentioned above, which extend beyond those derived for pain and physical function.
The multiple meta-regression analyses conducted in the current investigation resulted in no statistically significant associations between potential predictors and changes in pain or physical function. These findings suggest that changes in pain and physical function occur irrespective of 1) age, 2) type of rheumatic disease, 3) type of training, 4) whether exercise was supervised or not, and 5) number of weeks of training. One of the possible reasons for the lack of association may have been the large number of covariates given the number of studies included. In addition, given that studies are not randomly assigned to predictors, meta-regression analyses with predictors are considered to be observational in nature (100). Consequently, such analyses do not support causal inferences (100). Rather, the validity of these findings would need to be tested in large, well-designed randomized controlled trials.
One of the major reasons for conducting a meta-analysis is to provide direction for future research, including the reporting of such research, based on a quantitative approach. With regard to future research, the assessment of initial and final physical activity levels outside of the exercise intervention was largely absent. Such knowledge is important, since changes in physical activity outside of the intervention could have an effect on changes in the observed outcomes. For example, an increase in physical activity in the control groups could reduce or eliminate the benefits observed in the exercise groups. Alternatively, an increase in physical activity in the exercise groups could overstate the benefits of a prescribed exercise program. Future studies need to use a valid and reliable instrument to assess physical activity before and after the intervention period. Finally, while it may appear from our risk of bias assessment (Figure 2 and Supplementary Appendix C [available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a]) that a need exists for better blinding procedures given the high risk of bias observed for more than two-thirds of the included studies, it is important to understand that it is extremely difficult to blind participants in exercise intervention studies. Consequently, this high risk is not necessarily the result of poor study design, but rather the inherent nature of exercise intervention studies.
Only one study each for ankylosing spondylitis (84) and systemic lupus erythematosus (85) was included, while no studies on gout met the criteria. While one might expect exercise-induced improvements in pain and physical function to be similar to those for osteoarthritis, rheumatoid arthritis, and fibromyalgia, this still needs to be tested in well-designed randomized controlled trials. This is especially true given that the collective prevalence of ankylosing spondylitis (n = 350,000) (101, 102), systemic lupus erythematosus (n = ∼300,000) (103), and gout (n = 3 million) (104) totals more than 3.5 million people in the US.
The reporting of research in future studies could be improved. For example, because of missing data, modeling in the current study was limited to the length and type of training. Future studies need to provide complete information on not only the length and type of training, but also the average frequency, intensity, and duration of the exercise intervention. In addition, complete data on compliance with the exercise protocol need to be reported. For those studies that include strengthening exercises, complete data should also be provided on the average number of exercises, sets, and repetitions, and type of resistance, as well as the rest period between each set. The issue of dose response is important given the recent recommendations from the Physical Activity Guidelines for Americans calling for additional dose-response analyses (20).
Complete information was also lacking on the characteristics of the participants. These included height, body weight, number of years in which symptoms were present, number of years since diagnosis, and whether they had participated in a regular program of physical activity prior to enrollment. In addition, complete information on race/ethnicity needs to be reported as well as medication usage and how that may have changed over the course of the study. Furthermore, there was a general lack of reporting in relation to the specific number of participants with comorbid conditions. Since this may be a factor with regard to exercise programming, future research should report this information in order to inform practice. Finally, better reporting of study design features is needed. Based on our risk of bias assessment shown in Figure 2 and Supplementary Appendix C (available in the online version of this article at http://bibliotheek.ehb.be:2356/journal/10.1002/(ISSN)1529-0131a), this includes a clear description of allocation concealment as well as information on the original study protocol, e.g., study protocol number, to ensure that all outcomes that were planned to be assessed are accounted for in the article. Information on the number of participants who dropped out of each group as well as the reasons, by group, for this attrition is also needed.
The results of the present meta-analysis suggest that community-deliverable exercise improves pain and physical function in adults with AORD (osteoarthritis, rheumatoid arthritis, and fibromyalgia). These findings are important because they support the notion that exercise programs delivered in the community do not need to be disease specific or tailored to a specific type of arthritis. In addition, community-deliverable programs do not require highly trained and/or educated instructors, a major barrier facing implementation by community-based organizations. Furthermore, community-deliverable exercise interventions are critical to promoting widespread dissemination through a public health approach and, therefore, should be encouraged. Finally, given the importance of determining the dose-response effects of exercise, future research needs to provide complete data on the exercise intervention(s) utilized as well as determine the effects of exercise in those with less common types of rheumatic disease (lupus, gout, and ankylosing spondylitis).
AUTHOR CONTRIBUTIONS
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. George A. Kelley had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. George A. Kelley, Kristi S. Kelley, Hootman, Jones.
Acquisition of data. George A. Kelley, Kristi S. Kelley, Hootman, Jones.
Analysis and interpretation of data. George A. Kelley, Kristi S. Kelley, Hootman, Jones.