ISAAC Story - Statistical Analyses

The main variables reported are defined as:

Wheeze: “Have you/your child had wheezing or whistling in the chest in the last 12 months?”
Severe wheeze: “Have you/your child had wheezing or whistling in the chest in the last 12 months?” and one of “4 or more attacks of wheeze” or “sleep been disturbed due to wheezing on average once or more per week” or “had wheezing severe enough to limit speech to only one or two words at a time between breaths”.
Reported asthma: “Have you/your child ever had asthma?”
Rhinoconjunctivitis: “In the past 12 months, have you had a problem with sneezing, or a runny, or a blocked nose when you DID NOT have a cold or the flu? If yes: in the past 12 months, has this nose problem been accompanied by itchy-watery eyes?”
Hay Fever ever: “Have you/your child ever had hayfever?”
Eczema: “Have you ever had an itchy rash which was coming and going for at least 6 months? If yes: Have you had this itchy rash at any time in the last 12 months? If yes: Has this itchy rash at any time affected any of the following places: the folds of the elbows, behind the knees, in front of the ankles, under the buttocks, or around the neck, ears, or eyes?”
Reported eczema: “Have you/your child ever had eczema?”

In centres where a random sample of schools was taken, the effect of cluster sampling by schools was examined by calculating design effects [Rao 1992]. The effects of cluster sampling were generally small but have been incorporated in analyses involving tests of significance.
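
The design-effect calculation can be illustrated with a minimal Python sketch. It assumes the ratio-estimator form often used for clustered binary data (variance of the prevalence with schools as sampling units, divided by the simple binomial variance); the text above cites Rao 1992 but does not state the formula, so treat this as an assumption. The function names and the per-school counts are hypothetical.

```python
def design_effect(cases, sizes):
    """Design effect for a prevalence estimated from school-level counts.

    cases[i] = affected children in school i, sizes[i] = children surveyed
    in school i. Assumes at least two schools and 0 < prevalence < 1.
    """
    k = len(sizes)
    n = sum(sizes)
    p = sum(cases) / n
    # Ratio-estimator variance of p, treating schools as the sampling units.
    resid_ss = sum((c - p * m) ** 2 for c, m in zip(cases, sizes))
    var_cluster = k / (k - 1) * resid_ss / n ** 2
    # Variance that would apply under simple random sampling of children.
    var_srs = p * (1 - p) / n
    return var_cluster / var_srs


def effective_sample_size(n, deff):
    """Sample size deflated by the design effect."""
    return n / deff


deff = design_effect(cases=[10, 20, 30], sizes=[100, 100, 100])
n_eff = effective_sample_size(300, deff)
```

A design effect close to 1 means that clustering by school adds little variance, which matches the "generally small" effects reported above.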

Basic descriptive summaries of the data were compiled by centre and country, in both age groups, along with Spearman correlations between variables. These summaries have often been displayed as ranked plots (see example right). A variety of analytic methods have been used in papers; some are described below.

The within-country and between-country variances were estimated using a generalised linear mixed model in which country, and centre within country, are random effects [Wolfinger 1993]. With this model, the ratio of the 95% CI of prevalences (between country to within country) was calculated.

An important feature of the Phase Two design was the restriction of more expensive or invasive measurements to a subsample of children within each centre, selected according to history of wheezing in the last year. This stratified sampling design required statistical analyses for many of the variables to be weighted (using “survey weights” inversely proportional to the sampling fractions for wheezers and non-wheezers). The SAS procedures SURVEYREG and SURVEYLOGISTIC were used for this purpose (in Stata, svy: commands perform the same survey-weighted analysis).
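
The weighting idea can be sketched in a few lines of Python. This is only an illustration of weights inversely proportional to the sampling fractions, not the SAS or Stata survey procedures themselves; it gives point estimates only, and the stratum labels, counts and function names are hypothetical.

```python
def survey_weights(n_screened, n_subsampled):
    """Weight per stratum = 1 / sampling fraction = screened / subsampled."""
    return {s: n_screened[s] / n_subsampled[s] for s in n_screened}


def weighted_prevalence(records, weights):
    """records: list of (stratum, outcome) pairs with outcome coded 0/1."""
    numerator = sum(weights[s] * y for s, y in records)
    denominator = sum(weights[s] for s, _ in records)
    return numerator / denominator


# Hypothetical centre: all 100 wheezers examined, 100 of 900 non-wheezers.
weights = survey_weights(
    {"wheezer": 100, "non_wheezer": 900},
    {"wheezer": 100, "non_wheezer": 100},
)
records = (
    [("wheezer", 1)] * 40 + [("wheezer", 0)] * 60
    + [("non_wheezer", 1)] * 10 + [("non_wheezer", 0)] * 90
)
prevalence = weighted_prevalence(records, weights)
```

In this made-up example the unweighted proportion in the subsample is 50/200 = 0.25, while the weighted estimate is 0.13, because the over-sampled wheezers are down-weighted relative to non-wheezers. Valid standard errors still require a survey-aware procedure.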

The general approach adopted for Phase Two data analysis was to fit separate models for each centre and then pool the resulting regression coefficients in a random-effects meta-analysis. The random-effects pooling allowed for possible heterogeneity of risk factor associations between centres. In many analyses, a separate pooling within two groups of centres (more affluent, and less affluent, defined by national GNI per capita) proved to be informative.
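
The pooling step can be sketched as follows in Python. The text does not say which random-effects estimator was used; this sketch assumes the DerSimonian-Laird moment estimator of the between-centre variance, and the input values are invented for illustration.

```python
import math


def random_effects_pool(betas, ses):
    """Pool centre-specific coefficients (e.g. log odds ratios).

    Returns (pooled estimate, its standard error, tau^2), assuming the
    DerSimonian-Laird estimator of between-centre variance tau^2.
    """
    w = [1 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each centre's sampling variance.
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


pooled, se_pooled, tau2 = random_effects_pool([0.0, 1.0], [0.1, 0.1])
```

When the centre-specific coefficients agree closely, tau² is estimated as zero and the result reduces to fixed-effect (inverse-variance) pooling. Running the function separately on the more affluent and less affluent centres would mirror the subgroup pooling described above.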

This two-step approach to analysis of risk factor associations in Phase Two contrasts with the single-step approach adopted in Phase Three, where a fixed-effect pooling of regression coefficients was implemented along with random centre-level intercepts, using PROC GLIMMIX in SAS. Such a single-step approach could not be implemented for many of the outcomes in Phase Two, since the necessary survey-weighted regression cannot be combined with the multi-level model structure within PROC GLIMMIX.

However, for Phase Two outcomes which were ascertained on all subjects, multi-level models were developed in SAS (PROC GLIMMIX) and Stata (xtmelogit) to explore random effects both for intercepts (i.e. centre-level prevalences) and slopes (i.e. risk factor associations).

Whether to use absolute or relative change in prevalence: the former was chosen.
Calculation of change per year to address the variable time period between studies.
Use of mean prevalence (average of Phase One and Phase Three), rather than Phase One prevalence, to assess change in relation to prevalence. This followed the approach of Bland and Altman which avoids the problem of “regression to the mean” leading to a spurious correlation between initial level of a measurement and change over time.
Adjustment for the cluster sample design by adjustment to the effective sample size of the prevalence estimates. Since most centres selected a sample of schools and then studied all children of the eligible age within those schools, there is a theoretical “design effect” due to the greater correlation of asthma and allergy prevalence within schools than between schools. This “design effect” was accounted for in analyses which involved significance tests by decreasing the sample size of each prevalence estimate by a factor derived for each outcome, centre, age-group and ISAAC phase, representing the effective sample size, relative to the actual sample size, adjusting for clustering at the school level. In most centres, the effect of this adjustment was small.
Tolerance of minor differences in fieldwork procedures between Phase One and Phase Three. This is discussed in greater detail under “Quality Assurance”.
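
The first four choices above can be illustrated with a short Python sketch: absolute change per year, mean prevalence across the two phases, and a standard error based on effective sample sizes. Treating the Phase One and Phase Three samples as independent and using a binomial variance are assumptions of this sketch, and all numbers and function names are hypothetical.

```python
import math


def change_per_year(p1, p3, years):
    """Absolute change in prevalence per year between the two phases."""
    return (p3 - p1) / years


def mean_prevalence(p1, p3):
    """Average of Phase One and Phase Three prevalence."""
    return (p1 + p3) / 2


def se_change_per_year(p1, n1, deff1, p3, n3, deff3, years):
    """Standard error of the annual change, using effective sample sizes."""
    n1_eff = n1 / deff1  # actual sample size deflated by the design effect
    n3_eff = n3 / deff3
    variance = p1 * (1 - p1) / n1_eff + p3 * (1 - p3) / n3_eff
    return math.sqrt(variance) / years


annual_change = change_per_year(0.10, 0.15, years=5)
average_level = mean_prevalence(0.10, 0.15)
se_adjusted = se_change_per_year(0.10, 3000, 1.5, 0.15, 3000, 1.5, years=5)
se_unadjusted = se_change_per_year(0.10, 3000, 1.0, 0.15, 3000, 1.0, years=5)
```

Plotting the annual change against the average level, rather than against the Phase One value, is the Bland and Altman device referred to above. The adjusted standard error is larger than the unadjusted one whenever the design effect exceeds 1.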

Centres with fewer than 500 children (except for centres representing a complete census of the population), and centres with more than 30% missing data for the risk factor and covariates of interest, were excluded from the analysis. Frequency tabulations of the outcome, risk factor of interest, and specified individual-level covariates were prepared for each centre and combined into a single dataset for each outcome and age group. The frequency counts were then adjusted downwards in proportion to the design-effect adjustment factors for the outcome in question, for each centre and age group.

These design-effect-adjusted frequency tabulations provided the input for SAS DATA/PROC... (conversion procedure to individual-level data? – equivalent procedure in Stata is “expand”) and were analysed in PROC GLIMMIX specifying random intercepts at the centre level, but common slopes for the individual-level risk factors and covariates. Region, language and GNI per capita were included as standard centre-level covariates. Sex was always included as an individual-level covariate. Analyses were performed for all centres combined, for subgroups of centres defined by region, language and GNI, and for boys and girls separately. Additional individual-level covariates and interactions were included in the models, as appropriate for specific risk factor analyses.
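
The data preparation described above (deflating cell counts by the design effect, then converting a frequency table to one row per individual) can be sketched in Python. This is not the SAS step used by the authors, which the text leaves unspecified; rounding adjusted counts to whole individuals is an assumption of the sketch, and the cell layout and values are hypothetical.

```python
def adjust_counts(table, deff):
    """Scale every cell count down by the design-effect adjustment factor.

    table maps a cell, here (outcome, risk_factor), to its frequency count.
    """
    return {cell: count / deff for cell, count in table.items()}


def expand(table):
    """Create one row per individual from cell counts.

    Similar in spirit to Stata's expand. Counts are rounded to integers,
    which is an assumption of this sketch.
    """
    rows = []
    for cell, count in table.items():
        rows.extend([cell] * round(count))
    return rows


# Hypothetical centre: cells keyed by (outcome, risk_factor).
counts = {(1, 1): 30, (0, 1): 70, (1, 0): 15, (0, 0): 135}
adjusted = adjust_counts(counts, deff=1.5)
individuals = expand(adjusted)
```

An alternative that avoids rounding is to keep the table as it is and pass the adjusted counts as frequency weights, where the modelling procedure supports non-integer weights.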

Direct standardisation:

Separate regression models are fitted for each study centre, to obtain centre-specific slopes for each explanatory (x-)variable. Since the main outcomes of interest are dichotomous, our outcome (y-)variable is logit(p) where p is the proportion of “cases” (affected individuals). Thus, the parameter estimates from these centre-specific models are in the form of log-odds-ratios and the linear predictions derived from them (“xb” in SAS/Stata terminology) are in the form of log-prevalence-odds: ln[p/(1-p)].
For each centre, a prediction (xb) and its standard error (stdp) are derived at the level of each explanatory variable which corresponds to its mean in the global (all-centres) dataset. (This is analogous to directly standardising centre-specific death rates for each age-sex group by applying them to a global distribution of age and sex).
The standardised (risk-factor-adjusted) prevalence log-odds for each centre, and their corresponding variances, can then be considered as units in a conventional meta-analysis, deriving measures of heterogeneity including Cochran’s Q and Higgins’ I². They can also be used as the outcome variable in ecological analyses of disease prevalence at the centre level.
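
The direct standardisation steps above can be sketched in Python, assuming each centre's fitted model supplies a coefficient vector (intercept first) and its covariance matrix. The prediction at the global covariate means is x'b with standard error sqrt(x'Vx); the heterogeneity function then computes Cochran's Q and I² from the centre-level values. All inputs and function names are hypothetical.

```python
import math


def standardised_log_odds(beta, cov, x_global):
    """Linear prediction (xb) and its standard error (stdp) for one centre.

    beta: [intercept, b1, ...]; cov: covariance matrix of beta;
    x_global: means of the explanatory variables in the all-centres data.
    """
    x = [1.0] + list(x_global)
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    k = len(x)
    variance = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    return xb, math.sqrt(variance)


def heterogeneity(estimates, ses):
    """Cochran's Q and I^2 for centre-level estimates and standard errors."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def inv_logit(log_odds):
    """Convert log prevalence odds back to a prevalence."""
    return 1 / (1 + math.exp(-log_odds))


xb, stdp = standardised_log_odds(
    beta=[-2.0, 0.5],
    cov=[[0.04, -0.01], [-0.01, 0.09]],
    x_global=[0.4],
)
q, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
```

Applying inv_logit to xb gives the standardised prevalence on the proportion scale, which may be easier to present than log-odds.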

Multi-level modelling:

All centres are modelled in a single dataset with a categorical indicator variable for each centre and centre-level covariates (such as language, or GNI per capita) match-merged by centre.
Multi-level modelling procedures such as PROC GLIMMIX in SAS, and xtmelogit in Stata, offer options for analysing either the centre-level intercepts, or the centre-specific risk factor associations (regression slopes), or both, as “random effects” (i.e. drawn from a hypothetical distribution of intercepts or slopes, with the usual assumption being that this distribution is Gaussian).
The approach used in Phase Three risk factor analyses specified random intercepts and common slopes. This is equivalent to a fixed-effect (inverse-variance-weighted) pooling of the risk factor associations across study centres.
The approach used in exploratory Phase Two analyses specifies random intercepts and random slopes.
The two-step meta-analytical approach used in standard Phase Two publications is broadly equivalent to fixed centre-level intercepts and random slopes.

Statistical Analyses

Statistical methods used in ISAAC: Phase One

Statistical methods used in ISAAC: Phase Two

Statistical methods used in ISAAC: Phase Three prevalence maps and time trend analyses

Statistical methods used in ISAAC: Phase Three risk factor analyses

Statistical methods used in ISAAC: Centre-level differences adjusted for individual-level risk factors

Statistical methods used in ISAAC: Ecological analyses at the centre level