Data Management

(Tadd Clayton)

ISAAC is a unique international study which has been extremely fortunate to receive enthusiastic support from many researchers (and their research teams) throughout the world. Use of the same research design and tools (e.g. questionnaires) by all participating centres has been essential so that the results from the centres can be compared and any differences can be considered to reflect true differences in prevalence, rather than be attributed to differences in methodology. The ISAAC Phase One Manual, Phase One Coding and Data Transfer Manual, Phase Three Manual and Phase Three Environmental Questionnaire Coding and Data Transfer Document provided detailed instructions regarding how to carry out an ISAAC study, and how to prepare the data for transfer to the ISAAC International Data Centre (IIDC).

However, as ISAAC Phase One and Phase Three data has been contributed by many researchers who naturally have very varied training and research experience, it was important for the IIDC to carry out quality assurance checks on the data and assess how well each centre had followed the ISAAC protocol. My role at the IIDC was to receive the Phase One and Phase Three data from the participating centres, carry out a range of quality assurance checks on the data and communicate with the researchers with the aim of achieving the highest-quality final data set possible for each centre. For most centres there was at least one revised version of the data and in some cases several revisions were necessary. The checks carried out on the data included checks for consistency of date of birth, age and date of interview, checks for invalid values, and checks for unexpected patterns of results.

Checks for consistency

The ISAAC Phase One and Phase Three questionnaires included questions about the date the questionnaire was completed (date of interview), date of birth and current age of the child or adolescent. It was thus possible to generate a calculated age (using the date of birth and date of interview) and compare this with the age provided by the parent or adolescent. In many cases where there were differences between the age and the calculated age, the researchers were able to consult school records to identify appropriate corrections.
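
A consistency check of this kind could be sketched in Python as follows. This is an illustration only, not the code used at the IIDC: the function names are hypothetical, and it assumes that the reported age is given in completed years.

```python
from datetime import date


def calculated_age(date_of_birth: date, date_of_interview: date) -> int:
    """Age in completed years on the date of interview."""
    years = date_of_interview.year - date_of_birth.year
    # Subtract a year if the birthday had not yet occurred in the interview year.
    interview_month_day = (date_of_interview.month, date_of_interview.day)
    birth_month_day = (date_of_birth.month, date_of_birth.day)
    if interview_month_day < birth_month_day:
        years -= 1
    return years


def age_is_consistent(reported_age: int, date_of_birth: date,
                      date_of_interview: date) -> bool:
    """True when the reported age matches the age calculated from the dates."""
    return reported_age == calculated_age(date_of_birth, date_of_interview)
```

Records for which `age_is_consistent` returns `False` would be the ones referred back to the researcher for checking against school records.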

Checks for invalid values

The Phase One Coding and Data Transfer Manual, Phase Three Manual and Phase Three Environmental Questionnaire Coding and Data Transfer Document provide detailed information concerning what codes or values are valid for each question. In cases where unexpected values were present, the researcher was asked to review the original questionnaire and identify the appropriate correction.
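
As a rough illustration, a check for invalid values amounts to comparing each value against the list of valid codes for its question. The sketch below is in Python; the field names and codes in `VALID_CODES` are hypothetical and are not taken from the ISAAC manuals, which define the real ones.

```python
# Hypothetical codebook: field names and codes are illustrative only.
# The real valid codes are defined in the ISAAC coding manuals.
VALID_CODES = {
    "wheeze_ever": {"1", "2", ""},  # assumed: 1 = Yes, 2 = No, blank = not answered
    "wheeze_12m": {"1", "2", ""},
    "sex": {"1", "2"},
}


def find_invalid_values(records, valid_codes=VALID_CODES):
    """Return (record number, field, value) for every value not in the codebook."""
    problems = []
    for record_number, record in enumerate(records, start=1):
        for field, allowed in valid_codes.items():
            value = record.get(field, "").strip()
            if value not in allowed:
                problems.append((record_number, field, value))
    return problems


records = [
    {"wheeze_ever": "1", "wheeze_12m": "2", "sex": "1"},
    {"wheeze_ever": "3", "wheeze_12m": "", "sex": "2"},  # "3" is not a valid code
]
problems = find_invalid_values(records)
```

Each entry in `problems` identifies a questionnaire that the researcher would be asked to review.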

Checks for unexpected patterns

The ISAAC Phase One and Phase Three core questionnaires use a “stem” and “branch” structure where it is intended that the participant would only answer some questions if they provided a positive response to the previous questions. An example of this is the first two questions of the asthma symptoms questionnaire:

  1. Has your child / Have you ever had wheezing or whistling in the chest at any time in the past? Yes/No

    IF YOU HAVE ANSWERED “NO” PLEASE SKIP TO QUESTION 6

  2. Has your child / Have you had wheezing or whistling in the chest in the past 12 months? Yes/No

If all parents or adolescents correctly followed the instruction between these questions, there would be no respondents who answered “No” for question 1 and “Yes” for question 2. After all, how can someone have wheezing in the last 12 months but not have wheezing at any time in their life? However, in practice we found that the data sets from nearly all centres include some children or adolescents whose responses appear to be inconsistent. For example, in Auckland, New Zealand for Phase Three, approximately 5% of children and 10% of adolescents have at least one set of responses which appear to be inconsistent.
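
The pattern described above can be detected mechanically. The following Python sketch assumes each respondent's answers to questions 1 and 2 are stored as a pair of strings ("Yes", "No", or blank); that representation and the function names are hypothetical.

```python
def appears_inconsistent(ever: str, last_12_months: str) -> bool:
    """'No' to wheezing ever (question 1) but 'Yes' to wheezing in the past 12 months (question 2)."""
    return ever == "No" and last_12_months == "Yes"


def proportion_inconsistent(responses) -> float:
    """Share of respondents whose answers to questions 1 and 2 appear inconsistent."""
    if not responses:
        return 0.0
    flagged = sum(
        1 for ever, recent in responses if appears_inconsistent(ever, recent)
    )
    return flagged / len(responses)


# Made-up example data: (question 1, question 2) for four respondents.
responses = [("Yes", "Yes"), ("No", ""), ("No", "Yes"), ("Yes", "No")]
rate = proportion_inconsistent(responses)
```

Note that the check only flags the pair of responses; it does not say which of the two answers is the wrong one.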

Given that some parents and adolescents will provide responses which appear to be inconsistent, we had to decide what (if anything) to do about these cases. It is very easy to manipulate data using modern statistical analysis software and we could easily recode the data so that question 2 is set to missing. In other words, we would assume that the answer to question 1 (“No”) is correct and that the response to question 2 should be blank as suggested by the instruction between the questions. However, in this example there are two questions and it is easily possible (perhaps equally likely) that it is question 2 which is correct and question 1 which is incorrect. The ISAAC Steering Committee decided that there is not enough information to accurately decide which response is incorrect and that to recode the data based on the assumption that the first response is correct would run the risk of introducing bias into the data. The data was therefore left unchanged and cases where the responses appear to be inconsistent were accepted. This did not cause any problems for ISAAC analyses where the focus was on the prevalence of individual symptoms and the common denominator for prevalence calculations was the total number of participants.
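
To show why leaving the data unchanged is unproblematic for this kind of analysis, here is a minimal Python sketch of a prevalence calculation that uses the total number of participants as the denominator. The data layout, the keys `q1` and `q2`, and the example values are all made up for illustration.

```python
def prevalence(participants, question: str) -> float:
    """Proportion of ALL participants who answered 'Yes' to the given question."""
    if not participants:
        raise ValueError("prevalence is undefined with no participants")
    positive = sum(1 for p in participants if p.get(question) == "Yes")
    return positive / len(participants)


# Made-up example data for four participants.
participants = [
    {"q1": "Yes", "q2": "Yes"},
    {"q1": "No", "q2": "Yes"},  # apparently inconsistent, left unchanged
    {"q1": "No", "q2": ""},
    {"q1": "Yes", "q2": "No"},
]
ever = prevalence(participants, "q1")     # 2 of 4 participants
current = prevalence(participants, "q2")  # 2 of 4 participants
```

Because each symptom is counted on its own against the same denominator, neither question has to be treated as the correct one.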

However, some of the data sets sent to the IIDC did not include any cases of responses which appeared to be inconsistent. This suggested that the data may have been modified to remove the inconsistencies between responses before it was sent to the IIDC. For these centres we asked the researcher whether the data had been modified and whether it was possible for them to submit a copy of the data without the modification. Some centres were able to provide unmodified data while others were not, usually because the changes had been made during the data entry process. In the data tables for Phase One and Phase Three publications, several centres were identified as having modified the data to remove apparent inconsistencies.
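
Spotting such data sets is a matter of looking for centres with no apparently inconsistent responses at all. A Python sketch, assuming the question 1 and question 2 answers are held per centre as pairs of strings (the centre names and data below are invented):

```python
def centres_with_no_inconsistencies(responses_by_centre):
    """Names of centres whose data contain no 'No' then 'Yes' pairs for questions 1 and 2."""
    flagged = []
    for centre, responses in responses_by_centre.items():
        has_inconsistency = any(
            ever == "No" and recent == "Yes" for ever, recent in responses
        )
        if not has_inconsistency:
            flagged.append(centre)
    return sorted(flagged)


# Invented example: (question 1, question 2) pairs for two centres.
responses_by_centre = {
    "Centre A": [("Yes", "Yes"), ("No", "Yes"), ("No", "")],
    "Centre B": [("Yes", "Yes"), ("No", ""), ("Yes", "No")],
}
to_query = centres_with_no_inconsistencies(responses_by_centre)
```

A centre appearing in `to_query` has not necessarily modified its data; the result is only a prompt to ask the researcher.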

Transfer of data

The IIDC has been receiving data files and other electronic files from researchers and colleagues since 1993 and there have been many changes in technology during that time. Most Phase One data files were sent to the IIDC by post on 3½ inch diskette, although a few centres did use CD-ROMs and some even used 5¼ inch floppy disks. Email was not in common use at the time and it was very rare to receive data files as attachments to messages. By the time of Phase Three, email was available to nearly all of the researchers and it was much more common for me to receive data by email, although I did still receive some data by post on CD-ROM.

The Phase One Coding and Data Transfer Manual, Phase Three Manual and Phase Three Environmental Questionnaire Coding and Data Transfer Document provided very clear, detailed instructions regarding how ISAAC data should be prepared for transfer to the IIDC. The time and effort put into these documents proved to be very worthwhile and I would particularly like to acknowledge the efforts of Alistair Stewart, who led the development of the Phase One Coding and Data Transfer Manual, which was the model for the subsequent documents. Nearly all the data files received by the IIDC used the structure and codes we specified. In only a few cases was it necessary to ask the researcher to send a further copy of the data, generally because there had been some damage to the files in transit. While most data used the expected structure, there were occasionally some challenges in reading the data. Perhaps the most interesting challenge I encountered was to identify a way to convert dates from the Persian calendar to the Gregorian calendar.

For Phase One, most data was sent to the IIDC as text format data files as specified in the Coding and Data Transfer Manual, although a few researchers did choose to use other formats such as Excel spreadsheet files or dBASE database files. For Phase Three, Excel files were much more common, and other formats such as SPSS and Access were also used on occasion. We were fortunate that the software resources available to us through The University of Auckland were sufficient to read all file formats we received throughout Phase One and Phase Three.