
General Data Notes

CARDIA Public Release, Version A.2

This release contains data from Years 0-10 of the CARDIA Study. In their original form, these data were distributed as Versions 1.4, 2.2, 3.1, 4.1, and 5.1. Links to the general notes for each of the original data releases are available at the bottom of this page. Datasets are organized by data collection form number. Form numbers are assigned based on content and are used throughout the study, but the format of a form and the items it contains may change over time. For example, Form 2 is the Blood Pressure Form, but its format changed from Year 0 to Year 2.

Naming Conventions

All dataset names consist of 5-8 characters. The first character indicates the exam during which the data were collected (A=1, B=2, C=3, D=4, E=5). The second character indicates the version number. All public release versions are indicated by a letter; for this version, that is 'A'. For data collected via a form designed by the study, the remaining characters indicate the form number (Fxx, where xx=form number). For data from a laboratory or reading center, the remaining characters of the dataset name are descriptive of the contents.
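As an illustration of the dataset naming convention, the following Python sketch splits a dataset name into its parts. The function name `parse_dataset_name` and the example names `AAF02` and `CALIP` are hypothetical and are not taken from the release. The sketch also assumes that a form dataset name may carry extra descriptive characters after `Fxx`.

```python
import re

# Exam letter -> exam number, as listed in the naming convention.
EXAM_NUMBER = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def parse_dataset_name(name):
    """Split a dataset name into exam number, version, form number, and content."""
    name = name.upper()
    if not 5 <= len(name) <= 8:
        raise ValueError(f"expected 5-8 characters, got {len(name)}: {name!r}")
    exam_letter, version, rest = name[0], name[1], name[2:]
    if exam_letter not in EXAM_NUMBER:
        raise ValueError(f"unknown exam letter {exam_letter!r}")
    # Assumption: form datasets start with F plus a two-digit form number;
    # anything else is treated as a descriptive laboratory/reading-center name.
    form = re.match(r"F(\d\d)", rest)
    return {
        "exam": EXAM_NUMBER[exam_letter],
        "version": version,
        "form": int(form.group(1)) if form else None,
        "content": rest,
    }


if __name__ == "__main__":
    print(parse_dataset_name("AAF02"))  # hypothetical form dataset
    print(parse_dataset_name("CALIP"))  # hypothetical laboratory dataset
```

A version character of 'A' marks the public release described here; other version characters are passed through unchanged.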

Variable names consist of 5-8 characters. As with datasets, the first character indicates the exam during which the data were collected. Characters 2-3 indicate form number or laboratory from which the data were obtained. The remaining characters are descriptive of the contents. Variables which were collected in more than one exam are named identically except for the first character.
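Because a variable collected at several exams differs only in its first character, the name used at another exam can be derived mechanically. The sketch below assumes this rule holds for the variable in question. The helper names and the example variable `A02SBP` are hypothetical.

```python
EXAM_LETTERS = "ABCDE"  # exams 1-5


def split_variable_name(name):
    """Return (exam letter, form/laboratory code, descriptive part)."""
    name = name.upper()
    if not 5 <= len(name) <= 8:
        raise ValueError(f"expected 5-8 characters, got {len(name)}: {name!r}")
    if name[0] not in EXAM_LETTERS:
        raise ValueError(f"unknown exam letter {name[0]!r}")
    return name[0], name[1:3], name[3:]


def same_variable_at(name, exam_letter):
    """Name of the same variable at another exam: only the first character changes."""
    exam_letter = exam_letter.upper()
    if exam_letter not in EXAM_LETTERS:
        raise ValueError(f"unknown exam letter {exam_letter!r}")
    _, source, description = split_variable_name(name)
    return exam_letter + source + description


if __name__ == "__main__":
    print(split_variable_name("A02SBP"))    # hypothetical variable name
    print(same_variable_at("A02SBP", "C"))
```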

Documentation

For each dataset, 3 types of documentation are available. First, for data collected via forms, the form is available as a PDF file. For each item on a form, the corresponding SAS variable name in the distributed data sets is listed. These variable names did not appear on the form during data collection.

Second, new documentation regarding the publicly released version is available as a PDF file. Notes concerning revisions to the dataset made for the public release are contained therein, as well as a PROC CONTENTS listing and the SAS program used to generate the dataset.

Finally, the original documentation for the internal study version is available as a PDF file. Notes concerning computed variables, problems in the dataset, and information for longitudinal analyses are contained therein, as well as information about data edits, original contents, original SAS program, and the original data dictionary.

Medical history and medication follow-up forms - For several medical conditions, detailed information is desired. An affirmative answer to items on the medical history questionnaire (Form 8) prompts completion of these follow-up forms. The follow-up forms are all designated as Form 9, with a descriptive subtitle. Please refer to the forms for the specific skip patterns followed.

Revisions for Public Release

For public release, some modifications were made to protect the anonymity of the participants. To maintain consistency, rules were developed for variable transformations based on variable type (e.g., dichotomous, continuous). Rules were applied within each of the 4 race-gender cells. Table 1 provides the number of participants in each of these cells across all Field Center populations.

Table 1. Frequency and percent of participants by race and gender at each exam, CARDIA, November 1999

Strata              Year 0       Year 2       Year 5       Year 7      Year 10
                  N      %     N      %     N      %     N      %     N      %
BF             1480   28.9  1298   28.1  1214   27.9  1143   28.0  1120   28.4
BM             1157   22.6   988   21.4   905   20.8   831   20.3   806   20.4
WF             1307   25.6  1234   26.7  1178   27.1  1105   27.0  1072   27.1
WM             1171   22.9  1102   23.8  1054   24.2  1006   24.6   950   24.1
Gender change     2    0.0     2    0.0     1    0.0     1    0.0     2    0.1
Total          5115  100.0  4624  100.0  4352  100.0  4086  100.0  3950  100.0

Note. Strata are noted with a two-character abbreviation. The first character represents Race (Black or White) and the second character represents Gender (Male or Female).

Transformations to all datasets. Some revisions were made to all datasets in the current release. The original CARDIA ID, which had information about Field Center embedded in it, was replaced with a randomly generated ID for each individual. The new variable, PID, can be used to merge datasets so that individual participants' data are correctly matched. In addition, the variable CENTER was deleted. Finally, 2 participants who had sex change operations during the course of the study were deleted from all datasets because some of their data, particularly chemistries, may be unreliable due to hormonal changes.
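Merging on PID can be sketched as follows in Python. This is a minimal illustration, not the study's own procedure. It assumes each dataset has been loaded as a list of records with at most one record per PID, and the variable names `A02SBP` and `C02SBP` are hypothetical.

```python
def merge_on_pid(left, right):
    """Inner-join two lists of record dicts on their PID field."""
    # Assumption: PID is unique within each dataset.
    right_by_pid = {row["PID"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_pid.get(row["PID"])
        if match is not None:
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    year0 = [{"PID": 101, "A02SBP": 110}, {"PID": 102, "A02SBP": 124}]
    year5 = [{"PID": 102, "C02SBP": 120}, {"PID": 103, "C02SBP": 131}]
    print(merge_on_pid(year0, year5))
```

Only participants present in both datasets are kept, so a participant who missed an exam drops out of the merged result.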

Transformation Rules. The following rules were applied to individual variables:

a. Variables with Inherent Ability to Identify Individuals

Those variables that are judged to inherently identify individuals are not included in the data set. Examples are variables containing information regarding name and birth date.

b. Variables with Inherent Ability to Identify Field Center

Variables that inherently contain information about field center were either deleted or recoded. Examples are variables such as technician ID or machine ID.

c. Date Variables

All dates (except birth date, which is deleted) are recoded to the number of months relative to the Year 0 (Baseline) Examination date. This rule is intended to retain the chronological order of events while obscuring the actual calendar time. An illustration is provided below:
 

Variable              Date                  Months Since Baseline Exam
Baseline exam         September 21, 1985       0
Pregnancy delivery    October 31, 1983       -23
Hospitalization       July 5, 1996           106
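The date recoding can be sketched in Python as below. These notes do not state how partial months were handled, so the sketch assumes a simple difference of calendar months that ignores the day of the month. That assumption reproduces the baseline and pregnancy rows of the illustration. The function name `months_since_baseline` is hypothetical.

```python
from datetime import date


def months_since_baseline(event, baseline):
    """Calendar-month difference between an event date and the baseline exam date.

    Assumption: the day of the month is ignored; the rounding convention
    actually used for the release is not documented here.
    """
    return (event.year - baseline.year) * 12 + (event.month - baseline.month)


if __name__ == "__main__":
    baseline = date(1985, 9, 21)
    print(months_since_baseline(baseline, baseline))           # baseline exam itself
    print(months_since_baseline(date(1983, 10, 31), baseline)) # event before baseline
```

Events before the baseline exam yield negative values, which preserves the order of events without revealing calendar dates.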

d. Dichotomous and Polychotomous Variables

For dichotomous and polychotomous variables identified as needing modification, we included without alteration those variables for which 20 or more participants are represented in each response category within a given race-gender stratum. If one or more categories had fewer than 20 responses, we either combined categories so that none was left with fewer than 20 responses or, failing that, set all responses in that design cell to missing.
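The suppression step of this rule can be sketched in Python as follows. The sketch covers only the fallback of setting a cell to missing. Combining categories is a judgment made per variable and is not modeled. It also assumes that a category with no responses at all counts as having fewer than 20. The function name and the data layout (a list of stratum/value pairs, with `None` for missing) are hypothetical.

```python
from collections import Counter

MIN_RESPONSES = 20  # minimum responses per category within a race-gender cell


def suppress_sparse_cells(records, categories):
    """Set all responses to missing in any cell where a category is too sparse.

    records: list of (stratum, value) pairs; value None means missing.
    categories: every response category the variable can take.
    Returns the list of (possibly suppressed) values in the original order.
    """
    counts = Counter((s, v) for s, v in records if v is not None)
    strata = {s for s, _ in records}
    sparse = {
        s for s in strata
        if any(counts[(s, c)] < MIN_RESPONSES for c in categories)
    }
    return [None if s in sparse else v for s, v in records]


if __name__ == "__main__":
    data = [("BF", "yes")] * 25 + [("BF", "no")] * 30
    data += [("WM", "yes")] * 25 + [("WM", "no")] * 5
    out = suppress_sparse_cells(data, ["yes", "no"])
    print(out[:55].count("yes"), out[55:].count(None))
```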

e. Continuous Variables

For those continuous variables needing modification, we identified the participants with the 20 highest values and those with the 20 lowest values within each of the 4 race-gender cells. The values for these participants were changed to the threshold value used to identify the group. Thus, the 20 participants with the highest values all had their values set to the 21st value from the top. A similar transformation was applied to the low values. For cells with fewer than 40 non-missing values, all values were set to missing.
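A minimal Python sketch of this rule follows. It is an illustration under stated assumptions, not the SAS program used for the release. It assumes the data are available as a list of stratum/value pairs with `None` for missing, and the function name `truncate_tails` is hypothetical.

```python
from collections import defaultdict

TAIL = 20  # number of values recoded at each end of a cell


def truncate_tails(records):
    """Recode the 20 lowest and 20 highest values in each race-gender cell.

    records: list of (stratum, value) pairs; value None means missing.
    Returns the list of recoded values in the original order.
    """
    by_cell = defaultdict(list)
    for i, (cell, value) in enumerate(records):
        if value is not None:
            by_cell[cell].append(i)

    out = [value for _, value in records]
    for indices in by_cell.values():
        if len(indices) < 2 * TAIL:
            # Fewer than 40 non-missing values: the whole cell is set to missing.
            for i in indices:
                out[i] = None
            continue
        indices.sort(key=lambda i: records[i][1])
        low = records[indices[TAIL]][1]        # 21st value from the bottom
        high = records[indices[-TAIL - 1]][1]  # 21st value from the top
        for i in indices[:TAIL]:
            out[i] = low
        for i in indices[-TAIL:]:
            out[i] = high
    return out


if __name__ == "__main__":
    cell = [("BF", v) for v in range(1, 51)]
    print(truncate_tails(cell))
```

In the usage example, a cell holding the values 1 through 50 ends up with every value between 21 and 30, since 21 is the 21st value from the bottom and 30 is the 21st value from the top.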

f. Character Variables

Some variables have been deleted as they contain confidential information such as initials, place of birth, and reason for hospitalization.

g. Time Variables

No transformations were made as times are recorded only as values on the 24-hour clock and contain no information about date.

h. Special Coding

Some variables received special coding according to specific rules that do not fall into any of the previous categories. One such example is age. At baseline, values less than 18 were recoded to 18 and values greater than 30 were recoded to 30. Another example is number of children or number of siblings. For these types of variables, transformations were done for non-0 values. For pregnancies and children, counts of more than four were recoded to four. Details of these special coding transformations are contained in the new documentation section for each dataset.
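The two special-coding examples above can be sketched in Python as shown below. The function names are hypothetical, and missing values are assumed to be represented as `None` and passed through unchanged. The per-dataset documentation remains the authority for the exact rules applied to each variable.

```python
def recode_baseline_age(age):
    """Baseline age: values below 18 become 18, values above 30 become 30."""
    if age is None:
        return None
    return min(max(age, 18), 30)


def recode_count(count, cap=4):
    """Pregnancy/children counts: values above the cap are recoded to the cap.

    Zero is left as is, matching the note that only non-0 values are transformed.
    """
    if count is None:
        return None
    return min(count, cap)


if __name__ == "__main__":
    print([recode_baseline_age(a) for a in (17, 25, 35)])
    print([recode_count(n) for n in (0, 3, 6)])
```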

Datasets Judged Too Sensitive to Distribute

Some datasets from the original CARDIA release were not included in this release. The following details these datasets and the reason for withholding distribution.

  • Illicit drug use (all years) - judged too sensitive to distribute
  • Year 2 medical history follow-up questions for hysterectomy, liver disease, and vasectomy - too few records
  • Year 2 lipids and lipoproteins - judged to be unreliable
  • Year 7 GXT - judged unreliable at one center
  • Year 10 return blood draw and medical history follow-up questions for angiograms and MRI scans - too few records
  • Year 10 EBCT pilot - reliability still undetermined

General Notes from Original Documentation

Year 0: General Notes V1.4 - (D10226.PDF)

Year 2: General Notes V2.2 - (D10293.PDF)

Year 5: General Notes V3.1 - (D10310.PDF)

Year 7: General Notes V4.1 - (D10398.PDF)

Year 10: General Notes V5.1 - (D10364.PDF) 

 
