Volume 67, Issue 4, p. 1029–1047

Generalized Linear Models in Family Studies

Zheng Wu

Department of Sociology, University of Victoria, P.O. Box 3050, Victoria, British Columbia, V8W 3P5 Canada (zhengwu@uvic.ca)

First published: 20 September 2005

Abstract

Generalized linear models (GLMs), as defined by J. A. Nelder and R. W. M. Wedderburn (1972), unify a class of regression models for categorical, discrete, and continuous response variables. As an extension of classical linear models, GLMs provide a common body of theory and methodology for some seemingly unrelated models and procedures, such as the logistic, Poisson, and probit models, that are increasingly used in family studies. This article provides an overview of the principle and the key components of GLMs, such as the exponential family of distributions, the linear predictor, and the link function. To illustrate the application of GLMs, this article uses Canadian national survey data to build an example focusing on the number of close friends among older adults. The article concludes with a discussion of the strengths and weaknesses of GLMs.

The classical linear regression model is the most widely employed statistical method in family research. The Journal of Marriage and Family has published hundreds of applications of linear models (LMs) over the past several decades. LMs also form the main content of undergraduate statistical methods courses across the social sciences. LMs are popular because they are simple to construct and interpret, and because they describe the relationship between the expectation of a response (dependent) variable and a set of explanatory (independent) variables very clearly. The properties of LMs, moreover, are thoroughly understood, and the least squares method, which dates from Gauss in the early 19th century, provides a closed-form analytical solution for estimating regression coefficients. The least squares estimator has several desirable statistical properties, including being the most efficient of all linear unbiased estimators (the Gauss-Markov theorem), and is therefore termed the "best linear unbiased estimator," or BLUE (e.g., Draper & Smith, 1981).

LMs are based on the assumption that response variables (Y) are continuous and have conditional normal (Gaussian) distributions (given X) with constant (error) variance, and that the relationship between the response variable and the explanatory variables (or their linear transformations) is linear. In family studies, however, there are certain situations where these assumptions cannot be realized. For example, this situation occurs when the response variable of interest is bounded, as in the case of the number of children a woman has (y ≥ 0), or when the response variable is categorical, as in investigations of first marriage (y = 1 if the person marries, y = 0 if the person remains single). LMs are inappropriate in these circumstances because the expectation of the response variable does not always remain in the possible range without enforcing unnatural constraints on the regression coefficients, thus violating the linearity assumption. In addition, when y is bounded or categorical, the variance of y depends on the mean, and the normality assumption breaks down because the response distributions (e.g., gamma and binomial) are substantially different from normal distributions with constant variance.

One solution to these problems involves using generalized linear models (GLMs). GLMs accommodate response variables with nonnormal (conditional) distributions through a transformation called the link function. Having a common theoretical framework, GLMs represent a class of statistical models, including classical LMs for continuous data, logistic and probit models for binary or binomial data, Poisson and negative binomial models for count data, and log-linear models for multinomial data as special cases. These models share a number of properties (e.g., a common structure and a common method for parameter estimation), and their response (probability) distributions belong to the same exponential family of distributions (Nelder & Wedderburn, 1972). By tapping these similarities, GLMs unify these different models under a generalized framework, meaning that one can learn GLMs as a single class rather than as several seemingly unrelated models (McCullagh & Nelder, 1989). These models can be easily formulated by selecting an appropriate response (probability) distribution and link function. In this respect, GLMs provide social scientists a simplified and flexible approach to statistical modeling. Several commercial statistical software packages now include a routine for standard GLMs.

This article's objective is to introduce the principle of GLMs and present an empirical application. I had two JMF audiences in mind as I wrote this article. Family researchers who regularly utilize LMs but who have limited experience using nonlinear models (e.g., Poisson and logistic models) may benefit from using GLMs because this modeling technique offers a generalized approach to model formulation and thus eliminates the need to learn separate models for different response distributions. Other family scholars who have substantial knowledge of selected nonlinear models, such as logistic and Poisson models, may find that GLMs provide a nice alternative because they are flexible and easy to implement. My intention, however, is not to describe the technical details of GLMs (see Dobson, 2002; McCullagh & Nelder, 1989) but to provide a concise introduction to the concept and a practical application of GLMs in family research.

I begin with a brief overview of LMs. Then I describe the key components of GLMs, including the formal structure of GLMs, the exponential family of distributions, and the link function. I then provide a GLM example using the SAS GENMOD procedure. In the application, my focus is on how to apply GLMs, covering topics on model selection, link functions, and assessments of goodness of fit. I conclude with a discussion of the advantages and pitfalls of GLMs.

Generalized linear models

The earliest attempts to model categorical and limited (e.g., censored, truncated, ordinal, and count) response variables include Fisher's (1922) work on dilution assays, Bliss's (1935) probit models for quantal responses, Berkson's (1944) logistic models for bioassay experiments, Dyke and Patterson's (1952) extension of logistic models for proportions, and developments in Poisson log-linear and multinomial models for count data in the 1960s (see McCullagh & Nelder, 1989, pp. 8–17). Nelder and Wedderburn (1972) were the first to identify a wider class of GLMs, which includes the above models as special cases and provides a unified theory and procedure for fitting them on the basis of maximum likelihood (ML) theory. The comprehensive treatise on GLMs by McCullagh and Nelder (1989) and the abbreviated introductions by Dobson (2002) and Myers, Montgomery, and Vining (2002) have increased the popularity of GLM techniques in industry, science, and engineering.

Linear models

I begin with a brief review of LMs because the class of GLMs is an extension of LMs. The classical LM takes the form

μi = E(Yi) = β1xi1 + β2xi2 + ⋯ + βpxip, (1)

where the random (dependent) variable Yi is an independent normal variable with mean μi and constant variance σ²; xij is the value of the jth explanatory variable for the ith observation (with xi1 = 1 for the intercept); and βj is the unknown parameter associated with the jth explanatory variable. In matrix notation, we can rewrite Equation 1 as

E(Y) = μ = Xβ, (2)

where μ is n × 1, X is n × p, β is p × 1, and p is the number of unknown parameters including the intercept. Following McCullagh and Nelder's (1989) terminology, E(Y) forms the random part, whereas Xβ comprises the systematic part of the model, or simply the linear predictor.

The structure of GLMs

GLMs extend LMs by relaxing the two LM assumptions noted in the introduction. First, the response variables are no longer required to be continuous and normally distributed and thus can be categorical or limited (e.g., truncated or count data). Second, the relationship between the response and explanatory variables in Equation 2 can be nonlinear. We achieve the first extension of LMs by recognizing that many desirable properties of the normal distribution are shared by a wider class of distributions known as the exponential family of distributions. The second extension results from the development of nonlinear link functions relating the random component, μ in Equation 2, to the linear predictor, Xβ (McCullagh & Nelder, 1989).

The GLM takes the form

g(μ) = Xβ, (3)

where g(·) is the link function. The structural similarity between LMs and GLMs is striking: With the exception of the link function, GLMs are built on the same structure as LMs. Specifically, GLMs have three basic components (McCullagh & Nelder, 1989, p. 27):
  • 1. The random component: Y is the random part of the model, with E(Y) = μ. The components of Y have independent distributions that belong to a member of the exponential family. That is, the distributions of the response variables can be nonnormal, such as binomial or Poisson distributions; μ is the mean of the response variables.

  • 2. The systematic component: Xβ is the systematic part of the model, which produces the linear predictor η given by

    η = Xβ.

    Why is it termed a linear predictor? As in LMs, η is a linear combination of the explanatory variables X and the corresponding parameters β, which partly explains why these models are called generalized linear models.

  • 3. The link function: The random component and the systematic component of the model are connected through the link function

    g(μ) = η,

    where g(·) is a monotonic and differentiable function linking the mean to the linear predictor. This means that the linear predictor Xβ affects the response variable Y through the functional form g(·). The link function is invertible, and its inverse is sometimes called the mean function:

    μ = g⁻¹(η) = g⁻¹(Xβ).

To see how these components are connected, consider the example of the logistic regression model. The logistic model takes the form

log(πi/(1 − πi)) = β0 + β1xi1 + β2xi2 + ⋯ + βpxip,

where πi = P(Yi = 1) and 1 − πi = P(Yi = 0). We can rewrite this model as a GLM:

ηi = g(πi) = log(πi/(1 − πi)).

In this model, πi (the random part of the model) is the mean of the response variable, which follows the binomial distribution. ηi = β0 + β1xi1 + β2xi2 + ⋯ + βpxip (the linear predictor) is a linear combination of the explanatory variables and their associated unknown parameters. ηi = log(πi/(1 − πi)) is the link function, known as the logit link.

Similarly, we can rewrite the LM in Equation 1 in the form of a GLM:

ηi = g(μi) = μi.

That is, g(μi) = μi, a special case of the GLM in which the identity link function is used.
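The logistic case can be sketched in a few lines of Python. This is purely illustrative: the coefficients and the single observation are made-up numbers, not estimates from any data.

```python
import math

# Hypothetical parameters and one observation (intercept term plus one covariate)
beta = [-1.0, 0.8]
x = [1.0, 2.5]

# Systematic component: the linear predictor eta = X*beta
eta = sum(b * xi for b, xi in zip(beta, x))

# Link function (logit) and its inverse (the mean function)
def logit(mu):
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

# Random component: the mean of the binary response, mapped into (0, 1)
mu = inv_logit(eta)
```

Whatever value the linear predictor takes on (−∞, ∞), the mean function returns a probability in (0, 1), and applying the link to that mean recovers the linear predictor.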

To summarize the key ideas presented in this section, a GLM comprises three components: (a) the response distribution, (b) the linear predictor, and (c) the link function. The basic idea of GLMs is to connect the response distribution (the random component) to the linear predictor (the systematic component) through a link function. By using this link function, GLMs can accommodate a variety of nonlinear response distributions and map the linear predictor onto the range of these distributions. At the same time, the virtue of “linearity” in LMs remains in GLMs. Now, what is the exponential family of distributions? When can GLMs be applied?

The exponential family of distributions

In their seminal article on GLMs, Nelder and Wedderburn (1972) demonstrate that many well known distributions, such as the binomial, Poisson, and normal distributions, can be reparameterized into a single, unified exponential formula that is useful to show their theoretical similarities and differences. To put it differently, the exponential family form can be seen as a method that moves all of the terms in the probability density functions that belong to the family to the exponent to conform to a common structure (Gill, 2001, p. 8). The importance of this discovery is to show that many seemingly disparate probability distributions are really examples of a more general mathematical form known as the exponential family. Appendix A provides the functional form of the exponential family of distributions, and an example of the exponential family.

GLMs require this reparameterization to demonstrate that familiar response distributions, such as the normal and binomial distributions, belong to the exponential family of distributions. In this sense, the exponential family of distributions is a unifying concept that brings together the commonly used probability distributions under a generalized framework. Table 1 shows selected distributions that belong to this family. These include the normal, binomial, Poisson, negative binomial, gamma, and inverse Gaussian distributions. A well known exception is the Weibull distribution, which is a common distribution for survival (event history) data. Another exception is the censored normal distribution, which calls for the use of the tobit model (Greene, 2003).

Table 1. Characteristics of Selected Distributions in the Exponential Family

Distribution        Notation      Range of y           Canonical Link           Noncanonical Link
Normal              N(μi, σ²)     (−∞, ∞)              ηi = μi
Binomial            B(ni, μi)     (0, 1, 2, …, ni)/ni  ηi = log(μi/(1 − μi))    ηi = Φ⁻¹(μi) (probit link);
                                                                                ηi = log(−log(1 − μi)) (complementary log-log link)
Poisson             P(μi)         0, 1, 2, …, ∞        ηi = log(μi)
Negative binomial   NB(μi, p)     0, 1, 2, …, ∞        ηi = log(1 − μi)
Gamma               G(μi, ν)      (0, ∞)               ηi = μi⁻¹
Inverse Gaussian    IG(μi, σ²)    (0, ∞)               ηi = μi⁻²

  • Note: In each case, μi is the mean of the dependent variable for the ith observation. In the binomial case, ni is the number of binomial trials. In the probit link, Φ is the standard normal cumulative distribution function.
Now that I have discussed the functional form of the exponential family, I examine the two basic characteristics of the family, that is, the mean and the variance, which provide the necessary information for statistical inference. From standard theory for the exponential family (McCullagh & Nelder, 1989, pp. 28–29), the mean and variance for one observed response y are given by

E(y) = b′(θ)

var(y) = b″(θ)a(φ),

where primes denote derivatives with respect to θ. Clearly, var(y) is the product of two functions: (a) a(φ), which depends on φ only, and (b) b″(θ), which depends on the parameter θ and therefore on the mean μ. This means that, for the exponential family of distributions, the variance is a function of the mean. In the case of the binomial distribution (with y recorded as a proportion), the variance function is given by

var(y) = μ(1 − μ)/n,

which shows that the variance is expressed as a function of the mean of the distribution.
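The Poisson distribution provides another concrete instance. This is a standard textbook derivation, sketched here for illustration; the symbols follow the exponential-family notation above:

```latex
f(y;\mu) \;=\; \frac{e^{-\mu}\mu^{y}}{y!}
        \;=\; \exp\{\, y\log\mu \;-\; \mu \;-\; \log y! \,\},
```

so that $\theta = \log\mu$, $b(\theta) = e^{\theta} = \mu$, $a(\phi) = 1$, and $c(y,\phi) = -\log y!$. Applying the formulas above,

```latex
E(y) \;=\; b'(\theta) \;=\; e^{\theta} \;=\; \mu,
\qquad
\operatorname{var}(y) \;=\; b''(\theta)\,a(\phi) \;=\; e^{\theta} \;=\; \mu .
```

The variance equals the mean, which is precisely the Poisson variance function discussed later in the application.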

I have noted that GLMs are a class of models where response distributions belong to the exponential family of distributions. This means that GLMs can always be applied when individual responses (observations) belong to the exponential family of distributions. Indeed, recent developments in statistical theory have demonstrated that even when response distributions are not members of the exponential family, the framework of GLMs can still be applied, provided that the variance can be described as a function of the mean of the distribution (Carroll & Ruppert, 1988; McCullagh & Nelder, 1989). As described below, most family-related variables can be modeled in the framework of GLMs.

Link functions

I noted that the linear predictor is the second component of GLMs. In family research, the choice of the linear predictor or the selection of explanatory variables, to a large degree, depends on substantive theories and data availability. Next, I examine the link function, which connects the random component to its linear predictor.

There are many link functions available for GLMs, but each distribution generally has a specific link function necessary for obtaining desirable statistical properties (sufficient statistics) (McCullagh & Nelder, 1989). If one chooses θi = ηi, as illustrated in Equation A1, then the link is called a canonical link, also known as the standard link. In other words, canonical links occur when θi (the canonical parameter) equals ηi (the linear predictor). For example, in a logistic regression model with binary data, θi = ηi = log(πi/(1 − πi)), where πi = P(Yi = 1), is a canonical link. This is why a binomial model with the canonical (logit) link, ηi = log(πi/(1 − πi)), is also called a logit (logistic) model. Similarly, a binomial model with the Gaussian (probit) link is known as a probit model. Table 1 shows the canonical link functions for the selected distributions and two commonly used noncanonical links associated with the binomial distribution: the probit link and the complementary log-log link. The probit model is popular among economists for modeling binary data. The complementary log-log model is useful for modeling continuous-time (duration) data.

Choosing the appropriate link function depends on several factors. All things being equal, canonical links are preferable because they lead to desirable statistical properties of the model, particularly when the sample size is small (McCullagh & Nelder, 1989). For example, when we model binary data (e.g., y = 1 if works outside the home, y = 0 otherwise), it is recommended to first fit a binomial model with the canonical link, that is, the logistic regression model. Indeed, logistic regression has become the standard statistical tool for binary data in family research. Odds ratios, derived from a simple (antilog) transformation of the estimated coefficients, are now the standard interpretation of categorical covariates in logistic regression. Similarly, when we fit a Poisson model, the log link, which is the canonical link for the Poisson distribution, should be considered not only because it produces nice statistical properties but also because it is simple to interpret. Although it is convenient and desirable to choose canonical links, convenience should not substitute for the quality of model fit as the criterion for selecting the model (link function) (McCullagh & Nelder, 1989). The choice of the link, and thus the model, should depend on how well it fits the data (a topic discussed below). A better model fit should always be the goal when choosing the link function.
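The three binary-data links in Table 1 can be compared numerically. The short Python sketch below is illustrative only; it evaluates each link at μ = .5 using the standard library.

```python
import math
from statistics import NormalDist

# The three links for binomial data, each mapping a mean mu in (0, 1) to the real line
def logit(mu):
    return math.log(mu / (1 - mu))       # canonical link for the binomial

def probit(mu):
    return NormalDist().inv_cdf(mu)      # inverse standard normal CDF

def cloglog(mu):
    return math.log(-math.log(1 - mu))   # complementary log-log

links = {name: f(0.5) for name, f in
         [("logit", logit), ("probit", probit), ("cloglog", cloglog)]}
```

At μ = .5, the symmetric links (logit and probit) return 0, whereas the complementary log-log link does not; its asymmetry is one reason it suits duration data, where the two outcome categories are not interchangeable.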

An application of GLMs

The previous section outlined the structure of GLMs and discussed situations in which GLMs can be applied in statistical analysis. The following section illustrates this discussion by presenting an application of the GLMs method using a family studies example. Although my focus is on how to use GLMs, I also discuss several important concepts and problems that researchers might encounter when using GLMs.

The example application focuses on the number of close friends among Canadians aged 65 years and older. The data are from the Canadian General Social Survey (GSS) on Family and Friends conducted by Statistics Canada between January and March 1990 (Statistics Canada, 1991). The GSS-1990 used a national probability sample of 13,495 Canadian adults aged 15 years and older, excluding residents of the Yukon and Northwest territories and full-time institutionalized residents. The overall response rate is 73%. Removing the missing cases for the dependent variable, the study sample totals 1,907 older adults. The survey identified close friendships by asking respondents, “Other than your immediate family, how many people do you consider close friends?” The number of close friends ranges from 0 to 96, with about 15% of respondents having more than 10 close friends.

Friendship networks are an important source of informal social support for older persons (Wu & Pollard, 1998). To model the number of close friends, I consider several explanatory variables, including marital status, gender, age, children, siblings, living alone, income, education, health, and church attendance. Table 2 presents the definitions and descriptive statistics for these variables. I consider marital status because prior research indicates that being married generates opportunities for building friendships (Logan & Spitze, 1994). Gender is another important variable predicting friendship opportunities (Campbell & Lee, 1990). I consider age because older persons may have fewer opportunities to establish new friendships and may lose more friends to death. I consider living alone, number of children, and number of siblings because family status and household size may influence the demand for nonfamilial social contacts. I consider income and education because socioeconomic status influences social network size (Campbell & Lee, 1992). The literature also indicates that church membership is an influential social network determinant among older adults (Felton & Berry, 1992).

Table 2. Definitions and Descriptive Statistics for Variables Used in the Application: Canadians Aged 65 and Older, the 1990 General Social Survey, Statistics Canada

Variable             Definition                                                          M or %   SD
Close friends        Number (range: 0–96)                                                8.62     13.84
Marital status
 Separated/divorced  Dummy variable (1 = yes, 0 = no)                                    5.69
 Widowed             Dummy variable (1 = yes, 0 = no)                                    29.32
 Never married       Dummy variable (1 = yes, 0 = no)                                    5.92
 Married/cohabiting  Reference group                                                     59.08
Female               Dummy variable (1 = female, 0 = male)                               54.61
Age                  In years (range: 65–80)                                             71.78    4.98
Children             Number (range: 0–13)                                                3.23     2.64
Siblings             Number (range: 0 to 10+)                                            5.24     3.13
Living alone         Dummy variable (1 = yes, 0 = no)                                    30.87
Income               Personal income in 20 levels (1 = none, …, 20 = $80,000 or more)    5.35     2.97
Education            Educational attainment in 12 levels (1 = no formal education, …,
                     12 = master's or more)                                              5.39     2.66
Health               Self-reported health status (1 = poor, …, 4 = excellent)            2.95     0.84
Church attendance    In 5 levels (1 = never, …, 5 = once a week or more)                 3.41     1.67
N                                                                                        1,907

  • Note: Weighted means or percentages; unweighted N.

An application of GLMs follows three basic steps. Step 1 is model selection, which involves identifying a response distribution for the data, selecting an appropriate link function, and determining the linear predictor. Step 2 involves estimating the parameters in the linear predictor and then computing variances and conducting significance tests for the parameter estimates. Step 3 is an evaluation of model fit using measures of goodness of fit and other model-fitting tools such as residual analysis. I discuss each of these steps in detail.

Model selection

I noted that the GLM comprises three components: the random component, the systematic component (the linear predictor), and the link function. The model selection involves three steps corresponding to these components: (a) identifying the response distribution, (b) choosing an appropriate link function, and (c) selecting the explanatory variables. Because the selection of the explanatory variables for the example is based on the literature of social networks and data availability, I focus on the first two steps in model selection.

In GLMs, the response distribution usually determines the choice of link function and therefore the form of the likelihood function, which is used in parameter estimation. Thus, the first task is to identify the distribution that describes the dependent variable. The decision generally depends on the characteristics of the data and substantive knowledge of the data. For example, when the dependent variable is continuous, as in the case of yearly family savings, we may use the normal distribution because the amount of savings can be positive and negative (debts). If the dependent variable is binary, however, as in the case of marital breakdown, we should consider the Bernoulli distribution because the range of the distribution is 0 to 1. Similarly, when the dependent variable is the annual number of divorces granted in each county, then we may use the binomial distribution. Table 1 provides the information on the range of the selected members of the exponential family. There are other members of the family not shown in the table, such as the multinomial distribution used for modeling categorical data.

Although the range of the distribution and the knowledge of the data provide clues for identifying the form of distribution, other factors may affect the decision on model selection. For example, when the dependent variable is continuous nonnegative, as in the case of leisure time that family members spend together each week, we may consider the exponential, gamma, or inverse-Gaussian distributions. In this case, the decision may be empirically informed through observing which model fits the data better. In short, when in doubt it is always advisable to examine and compare the overall fit of the model under different distribution assumptions.

Turning to the link function, I have noted that the range of the members of the exponential family varies, from (0, 1) in the Bernoulli distribution to (−∞, ∞) in the normal distribution. Unlike the response distribution, GLMs impose no constraints on the value of the linear predictor; that is, we assume that the range of the linear predictor is (−∞, ∞). The purpose of the link function is to map (match) the linear predictor onto (with) the range of the response distribution. For example, for the Bernoulli distribution, the inverse link (mean) function transforms the linear predictor from a range of (−∞, ∞) to a range of (0, 1). As discussed, there is at least one specific link function associated with a particular distribution. Table 1 also shows selected link functions associated with the common distributions in GLMs.

The concept of link functions should not be confused with the practice of transformation of the response variable. When the assumption of normality is seriously violated in the LM, a common approach is to transform the data. The LM is then used to fit the transformed data. Data transformation is also used to stabilize the variance when the variance is a function of the mean (the problem of heteroscedasticity). Although transformation of the response variable may achieve these goals in some cases, it does not work well when the response distribution is nonnormal (Myers et al., 2002). In this regard, the GLM provides a useful alternative because it does not require normality. Constant variance also is a nonissue because the natural variance of the response distribution is incorporated in the GLM modeling (Myers et al., 2002).

Now, the example. The dependent variable is the number of close friends. It is a discrete count variable and contains only nonnegative integers, 0, 1, 2, …, 96. Given these characteristics, it is natural to consider the Poisson distribution because the response distribution fits into the range of the Poisson distribution and the Poisson model is appropriate for count data. The decision to use a Poisson model is, however, explicitly based on the assumption that the predicted mean equals the observed variance of the distribution. This assumption is sometimes not realistic because Poisson regression is often affected by overdispersion (when the observed variance exceeds the predicted mean).

An alternative approach to modeling overdispersed data is to use the negative binomial distribution, another member of the exponential family. As shown in Table 1, the negative binomial distribution has the same sample space (range) as the Poisson distribution, but it has an additional (gamma-distributed) parameter that can be used to model the variance function. A Poisson distribution whose mean is itself gamma distributed yields the negative binomial distribution (Gill, 2001). Thus, at this stage of the analysis, I am inclined to consider the Poisson or the negative binomial model, pending tests of overdispersion.
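A quick first check for overdispersion compares the sample variance of the counts with their sample mean. The sketch below uses made-up counts, not the GSS data, and is a heuristic rather than a formal test:

```python
# Hypothetical counts of close friends (illustrative only, not the GSS data)
counts = [0, 1, 1, 2, 3, 5, 8, 20]

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)   # sample variance

# Under a Poisson model the variance should roughly equal the mean;
# a ratio well above 1 points toward the negative binomial model.
dispersion_ratio = var / mean
```

In practice one would follow such a screening step with a formal test (e.g., a likelihood ratio test of the negative binomial dispersion parameter), as discussed in the goodness-of-fit section below.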

I should point out, however, that the range of the response distribution for both Poisson and negative binomial models is (0, 1, 2, …∞), whereas the observed data range from 0 to 96. In theory, it is possible that the model could produce predicted values beyond the upper bound of the data (or the conceptual upper limit). This is analogous to the “out-of-range” problem in LMs. But, in practice, the predicted values (estimated means) rarely exceed the range of the observed data because the GLM (and the LM) models the mean of the response variable. Indeed, even if this situation occurs, it has no effect on parameter estimation and statistical inference. Moreover, with count data, one may also consider using the log-linear model, provided that the explanatory variables are all categorical variables. Because the substantive model includes both categorical and continuous explanatory variables, the log-linear model is not an option here.

As for the link function, although the identity link is sometimes adequate for the Poisson and negative binomial models (e.g., a model with a single-factor predictor), the log link ensures that the mean number of close friends predicted from the fitted model is positive, and it is by far the most common choice in practice (Agresti, 2002). The log link also makes the interpretation of regression coefficients simple and meaningful. Thus, I choose the log link as the link function.
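Under the log link, a coefficient exponentiates into a multiplicative (rate ratio) effect on the expected count. The short sketch below illustrates this with a made-up coefficient and intercept, not estimates from the GSS data:

```python
import math

b0 = 1.0           # hypothetical intercept
b_sib = 0.25       # hypothetical log-link coefficient for number of siblings

rate_ratio = math.exp(b_sib)

# Each additional sibling multiplies the expected number of close friends
# by rate_ratio, holding other covariates constant:
mu_at_2 = math.exp(b0 + b_sib * 2)   # expected count with 2 siblings
mu_at_3 = math.exp(b0 + b_sib * 3)   # expected count with 3 siblings
```

This multiplicative reading is what makes log-link coefficients "simple and meaningful" to interpret: the ratio of adjacent predicted means is constant, regardless of the values of the other covariates.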

The method of ML estimation for GLMs

The second step in the application of GLMs involves estimating the parameters in the linear predictor and performing significance tests of these parameters. From a practitioner's perspective, this step is simple and often goes unnoticed. This section briefly outlines the method of estimation in GLMs. The example is reintroduced in the following section.

Advances in statistical theory and computer software have made the method of ML the most popular estimation technique in applied statistics (Gill, 2001). It is therefore not surprising that this method is the theoretical basis for parameter estimation in GLMs. The basic idea of ML estimation is to obtain the most plausible values of the parameters, given the data (see Eliason, 1993, for a quick introduction to the theory and the method of ML estimation). To apply this method, we need to construct the log-likelihood function, defined as the logarithm of the product of the probability density function for each observation. This product is the probability of observing the actual data we collected. The parameter values, which maximize this probability, are known as the maximum likelihood estimates (MLEs).
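The idea can be seen with an i.i.d. Poisson sample, for which the log likelihood is maximized at the sample mean. This is a textbook fact, sketched here with made-up data:

```python
import math

ys = [1, 2, 2, 4]   # hypothetical i.i.d. Poisson observations

def pois_loglik(mu, ys):
    # Sum of log Poisson densities: y*log(mu) - mu - log(y!)
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)

mle = sum(ys) / len(ys)   # for an i.i.d. Poisson sample, the MLE of mu is the sample mean
```

Evaluating the log likelihood on either side of the sample mean confirms that no other value of μ makes the observed data more probable; with covariates in the model, the same principle applies to the full parameter vector β.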

Applying the ML method to GLMs, the log-likelihood function is given by

ℓ(θ, φ; y) = Σi [(yiθi − b(θi))/a(φ) + c(yi, φ)]. (4)

To obtain the MLEs, we take the first derivatives of the log-likelihood function with respect to the parameters of interest and set them equal to 0, which yields the score equations used to solve for the parameters. The score equations form a system of p equations for the p model parameters, and they are nonlinear because of the link function. In general, the score equations have no explicit solution (Firth, 1991). Therefore, we usually obtain the solution with an iterative, computer-assisted method, and this solution is necessarily approximate. Moreover, when the responses do not follow any member of the exponential family, or when there is insufficient information to construct a likelihood function, we may resort to quasi-likelihood estimation to solve the score equations (see McCullagh & Nelder, 1989, chap. 9). The most common algorithms for GLM estimation are iteratively reweighted least squares and the Newton-Raphson method, which are used in most software packages, such as SAS, S-Plus, and R.
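To show how the iterative estimation works, here is a bare-bones iteratively reweighted least squares fit of a Poisson log-link model with one covariate. The data are hypothetical, and the two-parameter case is solved in closed form purely for transparency; a real analysis would use SAS GENMOD, R, or similar software:

```python
import math

# Hypothetical count data (illustrative only): one covariate x and counts y
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 4, 7, 11]

def irls_poisson(x, y, iters=50):
    """Fit y ~ Poisson(mu), log(mu) = b0 + b1*x, by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]        # linear predictor
        mu = [math.exp(e) for e in eta]         # inverse (log) link
        # Working response and weights for the log link
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted normal equations (X'WX)b = X'Wz in closed form
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

b0, b1 = irls_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
```

At convergence the fitted means satisfy the score equations for the canonical log link, X′(y − μ̂) = 0, which is a useful check on any hand-rolled fit.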

For statistical inference, ML estimators have well-known asymptotic (large-sample) properties: for example, they are asymptotically efficient and asymptotically normally distributed. The variance-covariance matrix of the parameter vector β is given as a function of the information matrix. The information matrix, also known as the Fisher information matrix, is the negative of the expected value of the matrix of second derivatives of the log-likelihood function with respect to β. The asymptotic variance-covariance matrix of β is the inverse of the information matrix. With this variance-covariance matrix, we can perform significance tests for individual parameters. These tests make use of Wald inference (Myers et al., 2002). Using the estimated asymptotic standard error of a coefficient, the Wald test statistic under H0: βj = 0 is

$$z^2 = \left( \frac{\hat{\beta}_j}{\widehat{SE}(\hat{\beta}_j)} \right)^2,$$

which has an asymptotic χ² distribution with 1 degree of freedom. Standard GLM printouts report each parameter estimate $\hat{\beta}_j$, its standard error, and the associated χ² statistic with a p value for each coefficient in the postulated model.
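As a minimal sketch, the Wald statistic and its p value can be computed directly; for 1 degree of freedom, the upper-tail χ² probability reduces to a complementary error function. The illustrative inputs are the female coefficient and standard error from the negative binomial model in Table 4:

```python
import math

def wald_test(beta_hat, se):
    """Wald chi-square statistic and p value for H0: beta_j = 0 (1 df)."""
    w = (beta_hat / se) ** 2
    # For a chi-square variate with 1 df, P(X > w) = erfc(sqrt(w / 2))
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# Example: female coefficient from the negative binomial model in Table 4
w_stat, p_value = wald_test(-0.501, 0.062)
```

The resulting statistic (about 65) is far beyond the χ² critical value, matching the strong significance reported in the example.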

Assessing the goodness of fit

The third step in the GLM application involves the evaluation of model fit. There are several measures of goodness of fit and other model fitting tools such as residual analysis. The main objective of these measures is to assess how well the model fits the observed data. In GLMs, a “good” model generally means that the distribution assumption is correctly specified and the link function appropriate, and the observed data are not overly dispersed. In this section, I discuss several common measures of goodness of fit. I return to the example at the end of the section.

Likelihood ratio test.  The idea of fitting a model to data is to compare a set of observed values $y_i$ with a set of fitted values $\hat{\mu}_i$ computed from a model that normally involves a relatively small number of parameters (McCullagh & Nelder, 1989). The smaller the discrepancy, the better the fit. In LMs, this strategy translates into the principle of the extra sum of squares in assessing model fit. The most commonly used measure of fit is

$$R^2 = \frac{SS_t - SS_e}{SS_t},$$

where R² is the squared multiple correlation, indicating the proportion of variance in the response variable explained by the linear predictor, SSt is the total sum of squares for y, and SSe is the residual sum of squares. The difference between the two sums of squares is called the extra (regression) sum of squares. A similar strategy is employed to test a reduced model against the full model; the test makes use of the difference in residual sums of squares: SSe(reduced) − SSe(full).
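As a minimal illustration (with made-up sums of squares), R² follows directly from the two quantities:

```python
def r_squared(ss_total, ss_error):
    """R^2 = (SSt - SSe) / SSt: proportion of variance explained by the linear predictor."""
    return (ss_total - ss_error) / ss_total

# Hypothetical values: the extra (regression) sum of squares here is 400 - 100 = 300
r2 = r_squared(400.0, 100.0)
```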
ML inference uses the same principle, substituting the difference in log likelihoods for the difference in sums of squares. For tests of nested hypotheses, the generalized likelihood ratio test statistic is

$$LR = -2\,[\log L(\text{reduced}) - \log L(\text{full})],$$

where logL(full) and logL(reduced) are the log likelihoods under the full and reduced models, respectively; LR has an asymptotic χ² distribution with pA − pB degrees of freedom, where pA and pB are the numbers of parameters in the full and reduced models. For example, suppose that the linear predictor is β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 and we want to test H0: β4 = β5 = 0. The generalized likelihood ratio test statistic is then

$$LR = -2\,[\log L(\beta_0, \ldots, \beta_3) - \log L(\beta_0, \ldots, \beta_5)],$$

which has an asymptotic χ² distribution with 2 degrees of freedom.

The hypothesis is rejected if LR exceeds the critical χ² value at the predetermined α level. By the same logic, testing H0: β1 = β2 = ⋯ = βk = 0 (where k is the total number of explanatory variables in the linear predictor) is equivalent to specifying a null model that includes only the intercept; that is, the model assigns a single mean μ to all yi. Most statistical software packages produce log-likelihood values in their default output. Although the LR test is a widely used measure of model fit, it is important to note that log-likelihood values are not directly comparable between models under different distribution assumptions.

The LR test should not be confused with the Wald test noted earlier. LR makes use of the log likelihood and is computed by comparing the fit of two nested models. In contrast, the Wald test is calculated after a single model is estimated and makes use of the asymptotic normality of the MLEs of the parameters (Myers et al., 2002). LR is extremely useful for testing nested models, whereas the Wald test is typically used to test individual parameters and to construct confidence intervals. In GLMs, the Wald and LR are only asymptotically equivalent. The asymptotic properties for Wald inference do not hold well in small samples. Thus, LR is generally preferred in small-sample studies.

Deviance.  The logic of the log-likelihood ratio test also underlies deviance, another common measure of goodness of fit. Like LR, deviance is constructed from the logarithm of a ratio of likelihoods. Unlike LR, however, the full model now has n parameters, one for each observation; this model is sometimes called the saturated model. In the saturated model, each observation yi serves as its own fitted value, $\hat{\mu}_i = y_i$. The reduced model has the same definition as in the LR test. Deviance is defined as twice the discrepancy between the maximum log likelihood achievable and the log likelihood achieved by the specified model:

$$D(\beta) = 2\,[\log L(P) - \log L(\beta)],$$

where logL(P) is the log likelihood for the saturated model and logL(β) is the maximized log likelihood for the fitted GLM with p parameters. For a sample of n independent observations, D(β) has an asymptotic χ² distribution with n − p degrees of freedom. Deviance divided by the dispersion parameter φ is the scaled deviance: D(β)/φ.

Interpretation of deviance is analogous to LR. In theory, deviance measures whether the fitted model is significantly worse than the saturated model; a nonsignificant result indicates that the specified model fits the data no worse than (as well as) the saturated model. In practice, the quality of fit is deemed acceptable if D(β)/(n − p) is not appreciably larger than 1 (Myers et al., 2002). Deviance is by far the most common and perhaps the best measure of model fit, although the asymptotic result can be questionable in small samples (see McCullagh & Nelder, 1989). Deviance values are also comparable across models under different distribution assumptions, with a smaller value indicating a better fit. Deviance is always reported in standard GLM printouts. Appendix B presents the definition of deviance for selected probability distributions.

Another common measure of goodness of fit is the generalized Pearson χ² statistic,

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\widehat{V}(\hat{\mu}_i)},$$

where $\widehat{V}(\hat{\mu}_i)$ is the estimated variance function for the distribution in question. Pearson's χ² also has an asymptotic χ² distribution with n − p degrees of freedom. Like deviance, the smaller the value of Pearson's χ², the better the fit. The scaled version of Pearson's χ² is χ²/φ.

Both deviance and the generalized Pearson χ² have asymptotic χ² distributions, but neither is superior to the other when samples are small. Although both can be used to evaluate a sequence of nested models in the same manner as the LR test, deviance has an advantage as a measure of model fit because it is additive for nested models, whereas Pearson's χ² is not (McCullagh & Nelder, 1989). In practice, however, the two measures rarely contradict each other in qualitative terms (Myers et al., 2002).
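To make the two statistics concrete, here is a sketch for the Poisson case, where the deviance components take the Poisson form and the variance function is V(μ) = μ. The observed counts and fitted means below are hypothetical:

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum[ y*log(y/mu) - (y - mu) ]; the y*log term is 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total -= (yi - mi)
    return 2.0 * total

def pearson_chi2(y, mu):
    """Generalized Pearson chi-square with the Poisson variance function V(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

y = [0, 2]        # observed counts (hypothetical)
mu = [1.0, 1.0]   # fitted means (hypothetical)
dev = poisson_deviance(y, mu)
chi2 = pearson_chi2(y, mu)
```

For these values the two statistics are close but not equal (4 log 2 ≈ 2.77 versus 2.0), illustrating that they measure discrepancy on slightly different scales.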

Both measures, however, share a common problem: when the data contain a large number of observed zeros, or there are very few (or no) observations for each covariate pattern (known as sparse data), deviance and Pearson's χ² do not have approximate χ² distributions (Agresti, 2002). Family studies of rare events (e.g., family violence or abuse) often involve modeling sparse data. With extremely sparse data, p values computed for deviance and Pearson's χ² are incorrect (Hosmer & Lemeshow, 2000). Several modifications of these statistics have been proposed (e.g., Farrington, 1996; Paul & Deng, 2000), but none has been incorporated into standard software packages. Because the effect of sparse data is generally limited to the statistical properties of deviance and Pearson's χ² (McCullagh & Nelder, 1989), one should always examine other measures of goodness of fit when modeling sparse data.

There are two “information” measures of model fit that are particularly useful for model comparison. One is the Akaike information criterion (AIC; Akaike, 1974),

$$AIC = -2\log L(\beta) + 2p,$$

where logL(β) is the maximized model log likelihood and p is the number of parameters in the model. Akaike suggested using the criterion to choose among models. The AIC can be used to compare nested or nonnested models, including models estimated on different samples. The smaller the value of AIC, the better the fit.
The other measure is the Bayesian information criterion (BIC; Schwarz, 1978),

$$BIC = -2\log L(\beta) + p\,\log n,$$

where n is the sample size. The BIC is also used to choose between competing model specifications, particularly models estimated with different sample sizes, because the BIC takes sample size into account. Again, a smaller value of BIC indicates a better fit. Although both the AIC and the BIC are widely used in the literature, simulation work by Amemiya (1980) shows that the BIC finds correct models more often than the AIC in small samples. The AIC penalizes models for having too many parameters. The value of the BIC gravitates more slowly than that of the AIC toward more complex models as sample size increases (Agresti, 2002). In practice, the two measures rarely give qualitatively different findings.
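Both criteria are straightforward to compute from a model's log likelihood. As an assumed check against Table 5 (taking p = 14 parameters for Model 2: the intercept, 12 covariates, and the negative binomial dispersion parameter, a counting convention that may differ across packages), these formulas reproduce the reported BIC to the nearest integer and the AIC to within rounding of the log likelihood:

```python
import math

def aic(logl, p):
    """Akaike information criterion: -2*logL + 2*p (smaller is better)."""
    return -2.0 * logl + 2.0 * p

def bic(logl, p, n):
    """Bayesian information criterion: -2*logL + p*log(n) (smaller is better)."""
    return -2.0 * logl + p * math.log(n)

# Model 2 of Table 5: logL = 29,918, assumed p = 14 parameters, N = 1,907
a = aic(29918.0, 14)
b = bic(29918.0, 14, 1907)
```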
Residuals.  In LMs, residuals are routinely analyzed to evaluate model assumptions such as the normality of the error distribution and constant error variance. The residual in LMs is defined as

$$e_i = y_i - \hat{y}_i.$$

In GLMs, residuals are also useful for exploring the fit of a model. The raw residuals $y_i - \hat{\mu}_i$ have limited utility, however, because the variance of yi is not constant in GLMs. Several other types of residuals have been proposed to adjust for the response variance. The most intuitive is the Pearson residual,

$$r_P = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{V}(\hat{\mu}_i)}}.$$

Clearly, rP is the raw residual scaled by the estimated standard deviation of the response variable. The sum of the squared rP over all observations is the generalized Pearson χ².

Using the same logic, the deviance residual is defined as

$$r_D = \operatorname{sign}(y_i - \hat{\mu}_i)\,\sqrt{d_i},$$

where di is the individual component of deviance such that $\sum_{i=1}^{n} d_i = D(\beta)$. Again, the sum of the squared rD over all observations is D(β). Along this line of reasoning, other types of residuals, such as the Anscombe residual, can also be formulated in GLMs (see Agresti, 2002).
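A small sketch of the two residual types for the Poisson case may help; the deviance component d_i below follows the Poisson deviance form, and the input values are hypothetical:

```python
import math

def pearson_residual(y, mu, var):
    """Pearson residual: raw residual divided by the estimated standard deviation."""
    return (y - mu) / math.sqrt(var)

def poisson_deviance_residual(y, mu):
    """Deviance residual: sign(y - mu) * sqrt(d_i), with the Poisson deviance component d_i."""
    d = 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)

# One observation with y = 2 and fitted mean 1.0; Poisson variance function gives var = mu
r_p = pearson_residual(2, 1.0, 1.0)
r_d = poisson_deviance_residual(2, 1.0)
```

Squaring r_d recovers the observation's deviance component, consistent with the definition above.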

As in LMs, residuals are often plotted to examine the adequacy of the model fit with respect to the choice of variance function, link function, and model specification. Also consistent with their use in LMs, McCullagh and Nelder (1989, p. 396) recommend standardized (or studentized) residuals, obtained by dividing the raw residual by a factor that makes its variance constant. They suggest several residual plots for exploring the adequacy of the model. For example, plotting the standardized residuals against the fitted values can be used to examine the assumed variance function, and plotting the adjusted response variable or the (standardized) residuals against the linear predictor or the regressors can be used to examine the assumed link function. They also recommend a normal probability plot of the deviance residuals to examine the asymptotic normality of the residuals. These plots are interpreted in the same way as residual plots in LMs.

Overdispersion.  I noted earlier that overdispersion is an important issue in assessing the overall goodness of fit. Overdispersion typically occurs when the variance of the response variable exceeds the nominal variance of the binomial, Poisson, or exponential distribution, because these distributions all have a known and fixed dispersion parameter: φ = 1. Observed data are overdispersed when, in effect, φ > 1 in the variance function. In a similar vein, underdispersion occurs when φ < 1, which is considerably less common in practice.
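Before fitting, a crude screen for overdispersion in raw counts is to compare the sample variance with the sample mean, since the Poisson distribution implies the two are equal. This heuristic ignores covariates, so it is only a first look, not a substitute for the model-based mean deviance discussed later in this section:

```python
def dispersion_ratio(y):
    """Variance-to-mean ratio of raw counts; values well above 1 hint at overdispersion."""
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    return var / mean

# Hypothetical counts with a few large values, as in skewed friendship-network data
ratio = dispersion_ratio([0, 0, 1, 10])
```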

One source of overdispersion is nonhomogeneity of the data. In family research, the most common mechanism is clustering in survey data, particularly when cluster sizes are small (e.g., families, households, and neighborhoods). Clustering effects arise from positive correlations among members of the same cluster. Another source of overdispersion is an incorrect distribution assumption and/or an incorrect link function. Overdispersion can also arise when the linear predictor is incorrectly specified, for example, when relevant explanatory variables or interaction terms are omitted (i.e., an underspecified model). Finally, overdispersion can occur in the presence of outliers in the data.

The consequences of overdispersion are similar to those of inflated error variance in LMs. In LMs, the variance-covariance matrix of the regression coefficients is (X′X)⁻¹σ². An increase in σ² inflates the variances of the regression coefficients, V(b), and ignoring the elevated error variance leads to underestimated standard errors. Similarly, in the binomial, Poisson, and exponential cases, overdispersion (φ > 1) also inflates V(b): because the model constrains φ = 1 when in fact φ > 1, standard errors can be grossly underestimated. With overdispersed data, however, so long as the link function is correctly specified, the ML estimators of the regression coefficients remain asymptotically unbiased (Myers et al., 2002).

Although formal diagnostic tools have been developed for overdispersion (e.g., Cameron & Trivedi, 1986), a simple measure of overdispersion is the mean deviance, D(β)/df, where df = n − p. If the mean deviance is appreciably larger than 1 or, equivalently, the value of deviance is substantially larger than n − p, it indicates an overdispersion problem. Similarly, a mean deviance considerably less than 1 suggests underdispersion. This measure is particularly effective for large samples (Dobson, 2002).

There are remedial measures for overdispersion. A common approach to correcting for clustering effects makes use of random-effects, mixed, hierarchical linear, or other models for correlated or multilevel data (e.g., Bryk & Raudenbush, 2002; McCulloch & Searle, 2001). A GLM approach to correcting for overdispersion invokes quasi-likelihood estimation (Wedderburn, 1974), which adds an extra parameter φ to the model. The variance functions for the binomial, Poisson, and exponential distributions are adjusted by a factor of φ. Because the estimating equations do not depend on φ, the estimated regression coefficients remain asymptotically unbiased (Firth, 1991). In practice, the mean deviance is typically used as the estimate of φ. With this adjustment, V(b) is proportional to φ: the variance-covariance matrix is multiplied by a factor of φ, and log likelihoods are divided by φ. A similar strategy can be used for underdispersion. Finally, as noted, with overdispersed count data one may consider the negative binomial model, which has an additional parameter that models the variance function and accounts for overdispersion.
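The quasi-likelihood adjustment amounts to multiplying the variance-covariance matrix by φ̂, that is, inflating each standard error by √φ̂. A sketch, using the mean deviance of the Poisson model in Table 3 as the estimate of φ and, as the illustrative input, the Poisson standard error of the female coefficient in Table 4 (parameter counts are assumptions of this example):

```python
import math

def quasi_adjusted_se(se, deviance, n, p):
    """Inflate a model-based standard error by sqrt(phi_hat), where phi_hat = D / (n - p)."""
    phi_hat = deviance / (n - p)
    return se * math.sqrt(phi_hat), phi_hat

# Poisson model from Table 3: deviance = 22,968 on n - p = 1,907 - 13 degrees of freedom;
# 0.018 is the Poisson standard error for the female coefficient in Table 4
adj_se, phi_hat = quasi_adjusted_se(0.018, 22968.0, 1907, 13)
```

Notably, the adjusted standard error (about 0.063) lands close to the negative binomial standard error for the same coefficient (0.062), illustrating how the two remedies address the same problem.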

Example.  So far, I have outlined the method of model selection and estimation and the measures of goodness of fit. I now return to the example. The results in this application were produced by the SAS GENMOD (generalized linear models) procedure, which uses a stabilized Newton-Raphson algorithm to solve the score equations (SAS Institute, 1999).

I have noted that with count data, the Poisson model is a natural choice, provided that the observed data are not overly dispersed. To confirm this and to compare models with competing distribution assumptions, I estimated three models on the basis of three distribution assumptions: the normal, the Poisson, and the negative binomial distribution. The LM uses the identity link, whereas the Poisson and the negative binomial models employ the log link function. The linear predictor involves the explanatory variables shown in Table 2. Table 3 presents selected measures of goodness of fit for the three models.

Table 3. Selected Goodness-of-Fit Statistics for Models Based on Normal, Poisson, and Negative Binomial Distributions, the 1990 General Social Survey, Statistics Canada
Model df Value Value/df
Normal (linear)
 Deviance 1,894 386,165 204.0
 Pearson's χ2 1,894 386,165 203.9
 Log likelihood −7,770
Poisson
 Deviance 1,894 22,968 12.13
 Pearson's χ2 1,894 38,915 20.55
 Log likelihood 21,478
Negative binomial
 Deviance 1,894 2,185 1.15
 Pearson's χ2 1,894 3,237 1.71
 Log likelihood 29,918
  • Note: N= 1,907. Models include the explanatory variables shown in Table 2.

Recall that a smaller deviance indicates a larger model log likelihood and thus a better fit. Table 3 shows that deviance varies substantially among the three models: deviance for the LM is over 16 times that of the Poisson model, which in turn is over 10 times that of the negative binomial model. The values of the Pearson χ² statistic give similar results. With large samples, both deviance and Pearson's χ² have an approximate χ² distribution with n − p degrees of freedom (shown in the df column of the table). Because the same link function is used for the Poisson and negative binomial models, the discrepancy in deviance between them is entirely attributable to the assumed response distribution. Given these goodness-of-fit statistics, there is little doubt that the negative binomial model fits the data better than the LM or the Poisson model.

As noted, the mean deviance, D(β)/(n − p), is an indicator of overall model fit and of overdispersion. The mean deviances in Table 3 confirm that the negative binomial model fits the data better than the two alternative models. The mean deviance for the Poisson model is 12.13, which exceeds 1 by a large margin, providing strong evidence of overdispersion. With an additional parameter to account for overdispersion, the negative binomial model is again the clear winner, with a mean deviance only slightly above 1 (1.15).

To illustrate the substantive effects of overdispersion, Table 4 shows the parameter estimates and their standard errors for the three models. The standard errors for the estimates in the Poisson model are small compared with those of either the LM or the negative binomial model. Indeed, each standard error in the Poisson model is smaller than its counterpart in the negative binomial model (or the LM) in every instance, and every estimate in the Poisson model is significant at p < .01. This example demonstrates that failing to account for overdispersion can produce seriously misleading results.

Table 4. Regression Coefficients From Linear, Poisson, and Negative Binomial Models of Close Friends, the 1990 General Social Survey, Statistics Canada
Independent Variable Linear Poisson Negative Binomial
Marital status
 Separated/divorced −3.065 (1.611) −0.377 (0.041) −0.383 (0.138)
 Widowed −1.449 (1.200) −0.174 (0.030) −0.205 (0.103)
 Never married −1.055 (1.665) −0.135 (0.039) −0.139 (0.138)
 Married/cohabitinga
Female (yes= 1) −4.728 (0.777) −0.513 (0.018) −0.501 (0.062)
Age −0.042 (0.069) −0.005 (0.002) −0.005 (0.006)
Children −0.155 (0.142) −0.018 (0.003) −0.008 (0.011)
Siblings 0.089 (0.113) 0.010 (0.003) 0.010 (0.009)
Living alone (yes= 1) 1.074 (1.141) 0.133 (0.029) 0.151 (0.100)
Income −0.285 (0.131) −0.032 (0.003) −0.027 (0.010)
Education 0.097 (0.139) 0.011 (0.003) 0.018 (0.011)
Health 0.927 (0.404) 0.104 (0.010) 0.106 (0.033)
Church attendance 0.372 (0.206) 0.042 (0.005) 0.042 (0.017)
Intercept 12.258 (5.282) 2.548 (0.124) 2.389 (0.425)
  • Note: Standard errors are in parentheses.
  • aReference category.

I am now more convinced that the negative binomial model is the best choice. An acceptable value of deviance, however, does not necessarily mean that the model (linear predictor) is correctly specified. For example, in this study, the effects of living alone on close friends could vary by marital status. Previously married elders who live alone may have a particularly small friendship network (Pinquart, 2003). It could also be that the effects of marital status differ between women and men. Widowers may have fewer social opportunities than widows (Lamme, Dykstra, & Broese Van Groenou, 1996). To test these ideas, I construct three alternative negative binomial models and present selected goodness-of-fit measures in Table 5.

Table 5. Goodness-of-Fit Statistics for Selected Negative Binomial Models of Close Friends, the 1990 General Social Survey, Statistics Canada
Model Log L Deviance Akaike Information Criterion Bayesian Information Criterion df
1. Null model (intercept + dispersion parameter) 29,860 2,190.4 −59,717 −59,706 1,906
2. Covariates in Table 4 29,918 2,185.0 −59,807 −59,730 1,894
3. Covariates + Marital Status × Living Alone 29,922 2,185.0 −59,810 −59,716 1,891
4. Covariates + Marital Status × Female 29,920 2,185.2 −59,805 −59,711 1,891
5. Covariates + Marital Status × Living Alone + Marital Status × Female 29,924 2,184.8 −59,807 −59,696 1,888
As in LMs, one may begin by testing the overall model: H0: β1 = β2 = ⋯ = βk = 0 (where k = 12, the number of covariates in Table 4). This is the equivalent of the overall F test in LMs. Using the log-likelihood values from Models 1 and 2, we construct the LR test,

$$LR = -2\,[\log L(\text{Model 1}) - \log L(\text{Model 2})] = -2\,(29{,}860 - 29{,}918) = 116,$$

with 12 degrees of freedom, a highly significant χ² value at p < .001. We reject the null hypothesis and conclude that at least one term in Model 2 is nonzero and has an effect on close friends. Using the same test, we evaluate the three alternative specifications with Model 2 as the baseline. The LR tests show that Model 3 improves significantly on Model 2 (χ²= 8, df= 3, p= .02), but Model 4 fails to improve on Model 2, and Model 5 does not improve on Model 3 (p > .05). In short, these results appear to support Model 3, which suggests that the effect of living alone varies with marital status.
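The overall test can be verified numerically. For even degrees of freedom, the upper-tail χ² probability happens to have a closed form, so the computation needs only the standard library (a convenience of this particular df, not a general method):

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for EVEN df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    assert df % 2 == 0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total

# LR test of Model 2 against the null Model 1, log likelihoods from Table 5
lr = -2.0 * (29860.0 - 29918.0)
p = chi2_sf_even_df(lr, 12)
```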

Do the other measures of goodness of fit support this conclusion? In Table 5, deviance declines between Models 1 and 2, consistent with the LR test. There is virtually no change in deviance, however, between Models 2 and 3, 2 and 4, or 3 and 5. The two information measures, AIC and BIC, also suggest that Model 2 (the baseline model) may be the best model; they offer no clear justification for any alternative specification, including Model 3. In light of these findings, I would retain Model 2 as the final model: these goodness-of-fit measures rarely give qualitatively different findings, and here they consistently favor Model 2, and Model 2 only.

This process of model selection is, however, purely empirical; substantive considerations can override it. For example, as noted, there are theoretical reasons to retain the interaction between marital status and living alone in the substantive model even though it does not contribute appreciably to the fit. In other words, if we set out to test the a priori hypothesis that the effect of living alone varies with marital status, we may wish to keep the interaction term regardless of whether it improves the fit of the model.

Considering Model 2 as the final model, three diagnostic plots of residuals are shown in Figures 1-3. Figure 1 plots the standardized deviance residuals against the fitted values $\hat{\mu}_i$ to examine the variance function. The null pattern for this plot is a distribution of residuals with mean zero and constant variance across the fitted values. To check the link function, Figure 2 plots the standardized deviance residuals against age, an explanatory variable in the linear predictor; this plot has the same null pattern. For both plots, any marked curvature or trend may suggest a wrong choice of variance function or link function. Clearly, no such pattern is evident in these plots. Figure 3 shows the normal probability plot of the deviance residuals; the null pattern is the straight line in the graph. As with the same plot in LMs, we focus on the interquartile range of the normal quantiles. Although outliers (large residuals) are present, the departure from asymptotic normality is not unacceptable within the middle 50% of the distribution. Overall, these plots suggest that Model 2 in Table 5 is adequate.

Figure 1. Plot of the Standardized Deviance Residuals Versus the Fitted Values, the 1990 General Social Survey, Statistics Canada

Figure 2. Plot of the Standardized Deviance Residuals Versus Age, the 1990 General Social Survey, Statistics Canada

Figure 3. Normal Probability Plot of the Deviance Residuals, the 1990 General Social Survey, Statistics Canada

Having selected the preferred model, I now examine the effects of the explanatory variables on close friends, shown in Table 4 under the negative binomial model. The fitted model is $\hat{\mu} = \exp(a)$, where a = 2.389 + (−.383 × separated/divorced) + ⋯ + (.042 × church attendance). In GLMs, the interpretation of coefficients is determined by the link function. For example, when the logit link (see Table 1) is used to fit the binomial or Bernoulli distribution, the effects of explanatory variables are commonly interpreted as odds ratios. With the log link used in the Poisson and negative binomial models, the effects are interpreted in terms of the estimated mean count of the response variable.

One approach to interpreting the coefficients in GLMs uses first differences. The basic idea is to choose two levels of interest of an explanatory variable and compute the difference in the effect on the response variable, setting all other coefficients (covariates) to zero or holding the covariates at a certain level (e.g., the mean) (Gill, 2001). This strategy is particularly effective for interpreting the effects of categorical explanatory variables. Further, a simple transformation, 100(exp(bj) − 1), gives the percentage change in the response variable for a one-unit change in the explanatory variable xj.

Table 4 shows that marital status has a significant effect on close friends. All else being equal (setting all other coefficients to zero), the estimated mean count of close friends for separated or divorced older adults is 32% (100(exp(−.383) − 1)) lower than that for married older adults. The comparable figure for widowed older adults is 19%. The never married are not significantly different from the married. Older women have 39% fewer close friends, on average, than older men. Personal earnings have a negative effect: the mean number of close friends is about 3% lower for each additional (approximately) $5,000 of earnings. Both self-reported health status and church attendance are positively linked to the mean number of close friends. These findings are generally consistent with the literature. The numbers of children and siblings, living alone, and education, however, are not significantly related to close friends. Close friendships are enduring relationships; for older persons, many of these relationships were built early in life. Some of the explanatory variables (e.g., living alone and marital status), however, reflect only life circumstances at the time of the survey and may have little bearing on the size of the friendship network in this older adult population. The analysis is intended only as an example of GLMs, not as a substantive contribution to the literature on social networks.
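The percentage figures in this paragraph follow from applying the transformation 100(exp(bj) − 1) to the negative binomial coefficients in Table 4; a quick check:

```python
import math

def percent_change(b):
    """Percentage change in the expected count for a one-unit change in x_j."""
    return 100.0 * (math.exp(b) - 1.0)

sep_div = percent_change(-0.383)   # separated/divorced vs. married
widowed = percent_change(-0.205)   # widowed vs. married
female = percent_change(-0.501)    # women vs. men
income = percent_change(-0.027)    # per unit (about $5,000) of earnings
```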

Statistical software for GLMs

The growing interest in GLMs in industry and in the research community has spurred the development of computer packages that offer a GLM routine, and many of these packages have been reviewed in the literature (e.g., Hilbe, 1994; Oster, 2002, 2003; Zhou, Perkins, & Hui, 1999). GLIM (Generalized Linear Interactive Modeling), developed by the GLIM Working Group of the Royal Statistical Society and marketed by Numerical Algorithms Group, was the first commercial package to provide GLM capability (Aitkin, Anderson, Francis, & Hinde, 1989). The latest version, GLIM 4, offers all standard GLMs, survival models, and modeling options with quasi-likelihood and extended quasi-likelihood. GLIM 4 is perhaps the most comprehensive package developed specifically for fitting GLMs.

In addition, many general-purpose packages also provide a routine for standard GLMs, including, for example, SAS, S-Plus, R, and Stata. Several social science packages, such as SPSS, SYSTAT, and LIMDEP, support a wide range of procedures that fit individual GLMs such as Poisson, logistic, probit, log-linear, negative binomial, and multinomial logit models. Although SPSS is a popular package among social science researchers, at this writing, it does not include a GLM procedure.

In this article, the SAS GENMOD procedure was employed for the analysis. GENMOD offers extensive GLM modeling and diagnostic capabilities. Starting with Release 6.12, the procedure offers options for fitting models that handle correlated responses using the method of generalized estimating equations (GEE; see Diggle, Liang, & Zeger, 1994). In my view, GENMOD is one of the most accessible GLM routines for family researchers who have some basic knowledge of SAS programming. The GENMOD procedure should not be confused with the GLM procedure in SAS; the latter acronym stands for general linear model, a term also used, with the same meaning, by SPSS and SYSTAT.

Discussion and conclusion

GLMs, as defined by Nelder and Wedderburn (1972), unify a class of regression models for discrete and continuous response variables. As an extension of classical LMs, GLMs provide a common body of theory and methodology for various types of regression models, such as the logistic, Poisson, probit, multinomial, and survival models increasingly used in family studies. The principle of GLMs is to employ some nonlinear link function that connects the mean of a population and a linear predictor and to allow the response distribution to be any member of the exponential family. Parameters are estimated on the basis of likelihood theory, using an iteration algorithm.

On the one hand, some individual cases of generalized linear modeling (e.g., logistic and Poisson models) are familiar to many family scholars, but the overarching framework and its associated theory and applications remain unfamiliar to many of these researchers. There is also substantial confusion surrounding the abbreviation GLM, a term used by some practitioners to refer to the general linear model, which comprises only LMs, analysis of variance, and analysis of covariance (Lane, 2002). On the other hand, GLMs provide a unified framework, incorporating conventional “general linear models” and many commonly used nonlinear models.

This article illustrates how familiar models, such as linear, logistic, log-linear, and Poisson models, are examples of a conceptually simple structure that is based on the exponential family of distributions and maximum likelihood estimation. As noted, the article aims at two types of JMF readers: (a) those who have substantial knowledge of LMs but limited exposure to nonlinear models and (b) those who are comfortable users of common nonlinear models such as logistic and Poisson models but who are unaware of the theory and application of GLMs. The former audience may benefit by learning a statistical method that accommodates data from a wide range of nonnormal distributions. The latter audience may appreciate the simplicity of the conceptual structure of GLMs and may extend their understanding of nonlinear models in theory and applications.

GLMs have gained popularity in industry, science, and engineering over the past decade. The class of GLMs provides flexibility, fitting a wide range of response distributions in the exponential family and beyond. One key advantage of LMs is that the effects of explanatory variables (systematic effects) are additive. In GLMs, with the introduction of the link function, the virtue of additivity of systematic effects remains (McCullagh & Nelder, 1989). Another important advantage of the GLM approach is that it can be applied even when the underlying distribution of data does not belong to the exponential family or when the responses are correlated (Liang & Zeger, 1986). In those cases, GLMs invoke a new likelihood function, known as quasi-likelihood (Wedderburn, 1974). This method of estimation requires only the information on the first two moments of the data (the mean and the variance) and their relationship, and in most cases, it can achieve the same level of efficiency as the ML method (McCulloch & Searle, 2001). Finally, the concept of GLMs provides a useful framework for teaching and learning advanced social statistics (e.g., categorical data analysis and survival analysis) in family studies and in other social science disciplines. By using a common theory and a common (iteration) method, GLMs unify seemingly unrelated statistical models and procedures and simplify the teaching and learning of the subject (Nelder & Wedderburn, 1972).

Compared with that of LMs, the development of GLMs is still in its infancy, and several problems remain unsolved. One of these problems is the nonorthogonality of the terms in likelihood-ratio-based tests, including the analysis of deviance (McCullagh & Nelder, 1989): there is generally no unique deviance attributable to each set of covariates when evaluating a sequence of models. With complex data, two or more models may fit the data equally well. Indeed, it is misleading to conclude that a given model is the best model, because other models may attain the same or nearly the same level of goodness of fit. McCullagh and Nelder (1989) therefore recommend against using deviance as the sole criterion for assessing model fit. Other measures, such as information measures (e.g., the AIC and the BIC) and residual plots, should supplement the use of deviance.
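To make the supplementary criteria concrete, here is a small sketch, not from the article, of how the AIC and the BIC penalize model complexity; the log-likelihood values, parameter counts, and sample size below are entirely hypothetical.

```python
import numpy as np

# Sketch (hypothetical numbers): comparing two nested models by raw
# likelihood versus information criteria that penalize extra parameters.
def aic(loglik, k):
    """Akaike information criterion: -2*loglik + 2k."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2*loglik + k*log(n)."""
    return -2 * loglik + k * np.log(n)

n = 500                              # sample size (hypothetical)
loglik_small, k_small = -812.4, 3    # simpler model: fewer covariates
loglik_big, k_big = -807.9, 8        # richer model: higher raw likelihood

# The richer model always wins on raw likelihood, yet both criteria can
# prefer the simpler model once the extra parameters are penalized.
prefer_small_aic = aic(loglik_small, k_small) < aic(loglik_big, k_big)
prefer_small_bic = bic(loglik_small, k_small, n) < bic(loglik_big, k_big, n)
```

Because the BIC's penalty grows with log(n), it tends to favor more parsimonious models than the AIC in large samples, which is one reason to report both alongside the deviance.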

Another problem is inherent to all nonlinear models estimated by ML. ML estimators are justified by asymptotic (large-sample) theory and are generally biased in finite samples. In large samples, the bias is negligible when compared with the standard errors (McCullagh & Nelder, 1989, p. 455). When sample sizes are modest, however, or when the ratio of the number of parameters in the model to the sample size is appreciably large, the bias can be substantial. In those cases, one may consider making a bias adjustment to improve the ML approximations (see McCullagh & Nelder, 1989, chap. 15). In practice, however, the bias tends to be negligible (less than 5% of a standard error) when there are at least 20 observations for each parameter in the model (Lane, 2002, p. 250).

Several topics in GLMs were not discussed here. The method of quasi-likelihood estimation, for example, was only mentioned in passing. Longitudinal data analysis using GLMs, including generalized estimating equations (GEE; Liang & Zeger, 1986), and multilevel data analysis using generalized linear mixed models (McCulloch & Searle, 2001) were not covered. I elaborated, however, on the key components of GLMs and discussed their strengths and limitations. It is my hope that family scholars will find this introduction useful in their teaching and research.

Note

I gratefully acknowledge financial support from the Social Sciences and Humanities Research Council of Canada. Additional research support was provided by the Department of Sociology, the University of Victoria. I thank Jenny Jiang, Chris Schimmele, and Yanyi Wang for research assistance, and three anonymous reviewers for helpful comments and suggestions.

Appendix A

The Exponential Family of Distributions

A density in the exponential family takes the form

f(y; \theta, \phi) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y, \phi)\}, (A1)

where a(·), b(·), and c(·) are specific functions. The parameter θ is sometimes called the natural parameter of the distribution, and φ is the dispersion parameter. The function a(φ) generally takes the form a(φ) = φ/ϖ, where ϖ is a known constant. These functions determine the sufficient statistics, such as the mean and the variance, that characterize members of the exponential family; in particular, E(y) = b′(θ) and Var(y) = b″(θ)a(φ).
Consider the case of the binomial distribution, a distribution frequently used in family research (e.g., logistic regression). If we assume y (the number of successes in a sequence of Bernoulli trials) is binomial with parameters n (number of trials) and p (the probability of success), the probability distribution (density) function can be rearranged as

f(y; n, p) = \binom{n}{y} p^{y} (1 - p)^{n - y}
           = \exp\left\{ y \log\frac{p}{1 - p} + n \log(1 - p) + \log\binom{n}{y} \right\}.

A close look at this reparameterization shows that it conforms to Equation A1 with θ = log[p/(1 − p)], b(θ) = −n log(1 − p), a(φ) = φ = 1, and c(y, φ) = \log\binom{n}{y}. The probability distribution functions for other members of the exponential family, such as the normal, Poisson, and negative binomial distributions, can be reparameterized in a similar fashion.
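As an illustrative check of my own, not part of the article, the reparameterization above can be verified numerically: the exponential-family form reproduces the binomial probabilities to machine precision.

```python
from math import comb, exp, log

# Numerical check (illustrative) that the binomial pmf equals its
# exponential-family form exp{y*theta - b(theta) + c(y)}, with
# theta = log[p/(1-p)], b(theta) = -n*log(1-p), c(y) = log C(n, y).
n, p = 10, 0.3
theta = log(p / (1 - p))     # natural parameter
b = -n * log(1 - p)          # b(theta), equivalently n*log(1 + e^theta)

max_err = 0.0
for y in range(n + 1):
    direct = comb(n, y) * p**y * (1 - p)**(n - y)      # binomial pmf
    exp_family = exp(y * theta - b + log(comb(n, y)))  # Equation A1 form
    max_err = max(max_err, abs(direct - exp_family))
```

The two expressions agree for every value of y from 0 to n, confirming that the binomial distribution is a member of the exponential family with the stated θ, b(θ), and c(y, φ).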

Table Appendix B. Deviance Functions for Selected Generalized Linear Models

Distribution        Deviance Function
Normal              \sum_i (y_i - \hat{\mu}_i)^2
Binomial            2 \sum_i n_i \{ y_i \log(y_i/\hat{\mu}_i) + (1 - y_i) \log[(1 - y_i)/(1 - \hat{\mu}_i)] \}
Poisson             2 \sum_i [ y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i) ]
Negative binomial   2 \sum_i \{ y_i \log(y_i/\hat{\mu}_i) - (y_i + k) \log[(y_i + k)/(\hat{\mu}_i + k)] \}
Gamma               2 \sum_i [ -\log(y_i/\hat{\mu}_i) + (y_i - \hat{\mu}_i)/\hat{\mu}_i ]
Inverse Gaussian    \sum_i (y_i - \hat{\mu}_i)^2 / (\hat{\mu}_i^2 y_i)

Note: For the binomial distribution, y_i = r_i/n_i, where r_i is a binomial count and n_i is the number of binomial trials.
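As a further illustration of my own, the Poisson entry in the table can be checked numerically against the definition of the deviance as twice the log-likelihood gap between the saturated model (which sets each fitted mean to the observed count) and the fitted model; the counts and fitted means below are made up.

```python
import numpy as np
from math import lgamma

# Illustrative check (made-up data): the Poisson deviance formula,
# 2*sum[y*log(y/mu) - (y - mu)], equals twice the log-likelihood gap
# between the saturated model (mu = y) and the fitted model.
y = np.array([1.0, 4.0, 2.0, 6.0, 3.0])    # observed counts (all > 0)
mu = np.array([2.0, 3.0, 2.5, 5.0, 3.5])   # hypothetical fitted means

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return np.sum(y * np.log(mu) - mu - np.array([lgamma(v + 1) for v in y]))

dev_formula = 2 * np.sum(y * np.log(y / mu) - (y - mu))
dev_loglik = 2 * (poisson_loglik(y, y) - poisson_loglik(y, mu))
```

The two quantities agree exactly, which shows why the deviance plays the role that the residual sum of squares plays in classical linear models: it measures the fitted model's shortfall relative to a perfect fit.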
