This release contains data from Years 0-10 of the CARDIA Study. In their original form, these data were distributed as Versions 1.4, 2.2, 3.1, 4.1, and 5.1. Links to the general notes for each of the original data releases are available at the bottom of this page.

Datasets are organized by data collection form number. Form numbers are assigned based on content and are used throughout the study, but the format of a form and the items it contains may change over time. For example, Form 2 is the Blood Pressure Form, but its format changed from Year 0 to Year 2.

Naming Conventions

All dataset names consist of 5-8 characters. The first character indicates the exam during which the data were collected (A = exam 1, B = exam 2, C = exam 3, D = exam 4, E = exam 5). The second character indicates the version number. All public release versions are indicated by a letter; for this version, that letter is 'A'. For data collected via a form designed by the study, the remaining characters indicate the form number (Fxx, where xx = form number). For data from a laboratory or reading center, the remaining characters of the dataset name describe the contents.

Variable names consist of 5-8 characters. As with datasets, the first character indicates the exam during which the data were collected. Characters 2-3 indicate the form number or the laboratory from which the data were obtained. The remaining characters describe the contents. Variables collected at more than one exam are named identically except for the first character.

Documentation

For each dataset, 3 types of documentation are available. First, for data collected via forms, the form itself is available as a PDF file. For each item on a form, the corresponding SAS variable name in the distributed datasets is listed; these variable names did not appear on the form during data collection. Second, new documentation for the publicly released version is available as a PDF file.
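The naming conventions are regular enough to decode mechanically. The Python below is a minimal sketch, not a CARDIA tool. It assumes that a form dataset carries "F" plus a two-digit form number right after the version letter, and that the study years for exams A-E are 0, 2, 5, 7, and 10 (inferred from the Years 0-10 span and the general notes listed at the bottom of this page). The example names (AAF02, EAF08, CALIPID, A02SBP) and all function names are hypothetical illustrations.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Exam letter -> (exam number, study year). The study years are an assumption
# inferred from the release history (Years 0, 2, 5, 7, and 10).
EXAMS = {"A": (1, 0), "B": (2, 2), "C": (3, 5), "D": (4, 7), "E": (5, 10)}

# Assumed: form datasets carry "F" plus a two-digit form number after the version letter.
FORM_PATTERN = re.compile(r"F(\d{2})")


@dataclass
class DatasetName:
    exam: int
    study_year: int
    version: str           # 'A' for this public release
    form: Optional[int]    # form number when the Fxx pattern matches, else None
    descriptor: str        # characters after the version letter, as written


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into exam, version, and form/descriptor parts."""
    name = name.upper()
    if not 5 <= len(name) <= 8:
        raise ValueError(f"dataset names have 5-8 characters, got {name!r}")
    if name[0] not in EXAMS:
        raise ValueError(f"unknown exam letter {name[0]!r}")
    exam, year = EXAMS[name[0]]
    rest = name[2:]
    match = FORM_PATTERN.match(rest)
    form = int(match.group(1)) if match else None
    return DatasetName(exam, year, name[1], form, rest)


def variable_at_exam(var_name: str, exam_letter: str) -> str:
    """Return the name the same variable would have at another exam.

    Variables collected at several exams differ only in the first character;
    this does not check whether the variable was actually collected then.
    """
    exam_letter = exam_letter.upper()
    if exam_letter not in EXAMS:
        raise ValueError(f"unknown exam letter {exam_letter!r}")
    return exam_letter + var_name[1:].upper()
```

For instance, `parse_dataset_name("AAF02")` yields exam 1, study year 0, version 'A', form 2, and `variable_at_exam("A02SBP", "C")` gives `"C02SBP"`, the name to look for when following a variable longitudinally.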
This public-release documentation contains notes on revisions made to the dataset for the public release, as well as a PROC CONTENTS listing and the SAS program used to generate the dataset. Finally, the original documentation for the internal study version is available as a PDF file. It contains notes on computed variables, problems in the dataset, and information for longitudinal analyses, as well as information about data edits, the original contents, the original SAS program, and the original data dictionary.

Medical history and medication follow-up forms

For several medical conditions, detailed information is desired. An affirmative answer to items on the medical history questionnaire (Form 8) prompts completion of these follow-up forms. The follow-up forms are all designated as Form 9, with a descriptive subtitle. Please refer to the forms for the specific skip patterns followed.

Revisions for Public Release

For public release, some modifications were made to protect the anonymity of the participants. To maintain consistency, rules for variable transformations were developed based on variable type (e.g., dichotomous, continuous). Rules were applied within the 4 race-gender cells. Table 1 provides the numbers of participants in each of these cells across all Field Center populations.

Table 1. Frequency and percent of participants by race and gender at each exam, CARDIA, November 1999
Note. Strata are denoted by a two-character abbreviation. The first character represents race (Black or White) and the second character represents gender (Male or Female).

Transformations to all datasets

Some revisions were made to all datasets in the current release. The original CARDIA ID, which had information about Field Center embedded in it, was replaced with a randomly generated ID for each individual. The new variable, PID, can be used to merge datasets so that each participant's data are correctly matched. In addition, the variable CENTER was deleted. Finally, 2 participants who had sex change operations during the course of the study were deleted from all datasets, because some of their data, particularly chemistries, may be unreliable due to hormonal changes.

Transformation Rules

The following rules were applied to individual variables:

a. Variables with Inherent Ability to Identify Individuals
Variables judged to inherently identify individuals are not included in the dataset. Examples are variables containing name and birth date.

b. Variables with Inherent Ability to Identify Field Center
Variables that inherently contain information about field center are either deleted or recoded. Examples are technician ID and machine ID.

c. Date Variables
All dates (except birth date, which is deleted) are recoded to the number of months relative to the Year 0 (Baseline) Examination date. This rule is intended to retain the chronological order of events while obscuring the actual calendar time. An illustration is provided below:
d. Dichotomous and Polychotomous Variables
For dichotomous and polychotomous variables identified as needing modification, we included without alteration those variables with 20 or more participants in each response category within a given race-gender stratum. If one or more categories had fewer than 20 responses, we either combined categories so that none was left with fewer than 20 responses or, failing that, set all responses in that design cell to missing.

e. Continuous Variables
For continuous variables needing modification, we identified the participants with the 20 highest values and those with the 20 lowest values within each of the 4 race-gender cells. The values for these participants were changed to the threshold value used to identify the group: each of the 20 highest values was set to the 21st-highest value, and each of the 20 lowest values was set to the 21st-lowest value. For cells with fewer than 40 non-missing values, all values were set to missing.

f. Character Variables
Some variables were deleted because they contain confidential information such as initials, place of birth, and reason for hospitalization.

g. Time Variables
No transformations were made, as times are recorded only as values on the 24-hour clock and contain no information about date.

h. Special Coding
Some variables received special coding according to specific rules that do not fall into any of the previous categories. One example is age: at baseline, values less than 18 were recoded to 18 and values greater than 30 were recoded to 30. Another example is number of children or number of siblings. For these variables, transformations were applied only to non-zero values; counts of pregnancies or children greater than four were recoded to four. Details of these special coding transformations are given in the new documentation section for each dataset.
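Rules c, d, and e are mechanical enough to sketch in code. The Python below is a minimal sketch under stated assumptions, not the study's SAS programs: missing values are represented as `None`; the month offset in rule c is assumed to be a plain calendar-month difference that ignores the day of month, because the text does not say how partial months were handled; and the `merge_map` passed to the rule d screen is a hypothetical, caller-supplied grouping, because the text does not say how categories were combined. All function names are illustrative.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, List, Optional, Sequence

MIN_COUNT = 20  # threshold used by rules d and e


def months_from_baseline(event: date, baseline: date) -> int:
    """Rule c (assumed form): calendar-month difference, ignoring the day of month."""
    return (event.year - baseline.year) * 12 + (event.month - baseline.month)


def screen_categorical(values: Sequence[Optional[Hashable]],
                       merge_map: Optional[Dict[Hashable, Hashable]] = None,
                       min_count: int = MIN_COUNT) -> List[Optional[Hashable]]:
    """Rule d for one race-gender stratum (None = missing).

    merge_map is a hypothetical old-code -> new-code grouping chosen by the caller.
    """
    def all_large_enough(vals):
        counts = Counter(v for v in vals if v is not None)
        return all(n >= min_count for n in counts.values())

    if all_large_enough(values):
        return list(values)  # keep without alteration
    if merge_map is not None:
        merged = [None if v is None else merge_map.get(v, v) for v in values]
        if all_large_enough(merged):
            return merged  # combined categories now meet the threshold
    return [None] * len(values)  # failing that, set the whole cell to missing


def top_bottom_code(values: Sequence[Optional[float]],
                    k: int = MIN_COUNT) -> List[Optional[float]]:
    """Rule e for one race-gender stratum (None = missing)."""
    present = [i for i, v in enumerate(values) if v is not None]
    if len(present) < 2 * k:
        return [None] * len(values)  # fewer than 40 non-missing values
    order = sorted(present, key=lambda i: values[i])  # ascending by value
    low_threshold = values[order[k]]           # 21st-lowest value
    high_threshold = values[order[-(k + 1)]]   # 21st-highest value
    out = list(values)
    for i in order[:k]:
        out[i] = low_threshold    # the 20 lowest values
    for i in order[-k:]:
        out[i] = high_threshold   # the 20 highest values
    return out


def apply_within_strata(strata: Sequence[str], values: Sequence,
                        rule: Callable[[list], list]) -> list:
    """Apply a per-stratum rule separately within each stratum code (e.g. 'BM', 'WF')."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    out = list(values)
    for idx in groups.values():
        for i, v in zip(idx, rule([values[i] for i in idx])):
            out[i] = v
    return out
```

For example, running `top_bottom_code` through `apply_within_strata` on a stratum whose non-missing values are 0 through 99 sets 0-19 to 20 and 80-99 to 79, while a stratum with only 30 non-missing values is set entirely to missing. Implementing rule e by rank rather than by clamping keeps the literal "21st value" behavior even in a cell with exactly 40 values.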
Datasets Judged Too Sensitive to Distribute

Some datasets from the original CARDIA release were not included in this release. The following details these datasets and the reasons for withholding their distribution.
General Notes from Original Documentation

Year 0: General Notes V1.4 (D10226.PDF)
Year 2: General Notes V2.2 (D10293.PDF)
Year 5: General Notes V3.1 (D10310.PDF)
Year 7: General Notes V4.1 (D10398.PDF)
© Division of Preventive Medicine