What is IHIS?
What is in the future for IHIS?
How do IHIS files differ from the NHIS public use files already in distribution?
Getting started
Where should a new user start?
How do I get access to IHIS?
Basic concepts
What are microdata?
What is "integration"? What are "integrated variables"?
Why is a variable from the NHIS that I have worked with before not included in IHIS?
Can I combine IHIS data with other NHIS variables needed for my research?
How is a record uniquely identified?
What does "universe" mean in the variable descriptions?
Data Limitations and Cautions to Users
What are the major limitations of the data?
Are there aspects of IHIS data about which to be particularly careful?
Is help available if I encounter problems using IHIS?
Getting data
How do I obtain data?
What is the data format?
What is the best way to use the extract system?
How long does a data extract take?
Is there a preferred statistical package for using IHIS?
Why can't I open the data file?
Can I get the original data?
Using the variables page
Variables page menu
Variables page details
Using the data extract system
Extract step 1: login
Extract step 2: select samples
Extract step 3: select variables - review
Extract step 3: select variables - add
Extract step 4: submit
General information about the project
What is IHIS?
The Integrated Health Interview Series (IHIS) is a harmonized set of data and documentation based on material originally included in the public use files of the U.S. National Health Interview Survey (NHIS) and distributed for free over the Internet. IHIS variables are given consistent codes and have been thoroughly documented to facilitate cross-temporal comparisons. The "integration" process is described more fully below.
IHIS is not a collection of compiled statistics; it is composed of microdata. Each record represents a person, with all characteristics of that person numerically coded. These person records are organized into households, making it possible to study the characteristics of people in the context of their families or other co-residents. Because the data refer to individuals and not tables, researchers must use a statistical package to analyze the records in the IHIS database. A data extraction system enables users to select only the survey years and variables they require.
What is in the future for IHIS?
The IHIS Project is funded by a grant from the National Institute of Child Health and Human Development (NICHD). We plan annual data releases that will add variables, variable groups, and new website features.
We hope to continue the project beyond our five-year funding period, but we will have to secure further funding as our current grant expires. To be successful, we need to demonstrate the existence of a large body of work based on IHIS data or documentation. If you use IHIS to create educational materials, satisfy a course requirement, or prepare a report, presentation, publication, or thesis, please tell us about it by adding to our bibliography site.
How do IHIS files differ from the NHIS public use files already in distribution?
Public use files for the NHIS are the basis for IHIS data. These original public use files also include supplements and variables not yet included in the IHIS public database. Researchers can access the original files through the IHIS website. Directions on how to link variables from these original NHIS public use files to IHIS data extracts can be found in the user note on linking.
The IHIS Project recodes the original public use data to increase consistency over time. IHIS also provides on-line documentation that describes variable meaning and addresses comparability issues, along with providing information about years available, universes, codes and frequencies, question wording, appropriate weights, and source files for each included variable. By using the IHIS data extraction system, analysts can select the years and variables they are interested in and work with a single dataset, without having to link or combine multiple files. For the most part, IHIS does not use the same variable names included in the original public use data; variables have been renamed to increase consistency over time and within subject categories.
Getting started
Where should a new user start?
The natural starting point is the "variables" page linked on the left navigation bar. The variables page is the primary tool for exploring the contents of IHIS. By default, the variables page displays one variable group at a time for all years in the data series. You can change the view option to show all groups simultaneously, but the page can get very large and slow to load. You can also filter the information at any point to include only the years of interest to you ("Select samples"). More detailed information on using the variable menu is available.
When you select samples, the page will display only variables present in those survey years. An "x" indicates the availability of a variable for a given year in the current IHIS database.
Clicking on a variable name brings up the variable description. At the top of the variable description is information on the years available, the universe (i.e., who was asked the question), the weight to use, and the source file in the original NHIS public use files. The body of the variable description contains a summary description of the variable's meaning and a discussion of comparability issues over time. By clicking on hyperlinks within the variable description, you can access similar information for closely related variables. Via the "codes" and "survey text" links in a variable description, you can determine the categories available in each year and the question wording on the survey form.
The variables page also has direct links to the codes page for each variable (the "codes" to the right of the variable labels). The codes page shows the coding structure and labels for variables, and the availability of categories across years. These categories can suggest the types of research possible. Via the codes page, users can also view unweighted frequencies for each response category in each year.
If you have a specific substantive interest, such as "asthma," you may wish to use the "Search Variables" feature on the variables page. Entering a word (such as "asthma") and hitting the "search" button will bring up a list of all variables that include that term in the variable name, label, and, if you wish, variable descriptions and categories. Thus, for example, searching on "asthma" brings up a list of variables appearing not only in the "asthma" condition group, but also in other variable groups, such as causes of activity limitation and conditions treated with alternative medicine modalities.
Throughout the variable documentation system there are checkboxes and buttons to "Include in Extract." For example, you can choose to include a variable in an extract while looking at the variables page or while reviewing an individual variable's description. Any variables you identify in this way will be pre-selected for you when you enter the data extract system. Your selections only last for the current web session.
Before beginning analysis of IHIS data, users are advised to review the material in the "user notes" section. These user notes discuss such issues as variance estimation, sample design, the use of weights, and how to link variables from the original NHIS public use files with IHIS data. The user notes also provide counts of the number of person and household records in the IHIS database for each year.
If you are already registered to use IHIS, you can click on "Create an Extract" and use the data access system. Otherwise, you can identify yourself as a "guest" and explore the steps involved in making a data extract without actually producing an extract. Detailed instructions for using the extraction system are below.
How do I get access to IHIS?
Access to the documentation is freely available without restriction. To get access to the data, users must agree to specified conditions of responsible use. For purposes of internal recordkeeping, and to provide the IHIS staff with a clear sense of the user constituency (to improve outreach and better serve users), registration also requires users to provide some information about themselves, such as their discipline, academic or non-academic status, and institutional affiliation. To register for access to the data, go to the IHIS registration webpage.
Basic concepts
What are microdata?
Microdata are composed of individual records containing information collected on persons and households. The unit of observation is the individual. The responses of each person to the different survey questions are recorded in separate variables.
Microdata stand in contrast to more familiar "summary" or "aggregate" data. Aggregate data are compiled statistics, such as a table of marital status by sex for some locality. There are no such tabular or summary statistics in the IHIS data.
Microdata are flexible. One need not depend on published statistics from a survey that compiled the data in a certain way, if at all. Users can generate their own statistics from the data in any manner desired, including performing individual-level multivariate analyses.
See an image of IHIS data here. In this example, the data are presented in hierarchical format, which means that a household record is followed by individual records for each person in that household. If users choose the default option of rectangularized data, the household record's information appears at the beginning of the record for each person in that household, and there are no separate household records.
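The relationship between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the extract system's own code: the records are assumed to be already parsed into dictionaries, and the key RECTYPE and the variables REGION and AGE are hypothetical names chosen for the example.

```python
def rectangularize(records):
    """Copy each household's fields onto the person records that follow it.

    `records` is a hierarchical list: a household record ("H") followed by
    the person records ("P") for that household. RECTYPE is a hypothetical
    field name used here to hold the record-type marker.
    """
    rows = []
    household = None
    for rec in records:
        if rec["RECTYPE"] == "H":
            # Remember the current household; it is not emitted on its own.
            household = {k: v for k, v in rec.items() if k != "RECTYPE"}
        elif rec["RECTYPE"] == "P":
            if household is None:
                raise ValueError("person record found before any household record")
            person = {k: v for k, v in rec.items() if k != "RECTYPE"}
            # Household fields come first, then the person's own fields.
            rows.append({**household, **person})
    return rows


# Made-up hierarchical data: two households with three persons in total.
hierarchical = [
    {"RECTYPE": "H", "YEAR": 2005, "SERIAL": 1, "REGION": 2},
    {"RECTYPE": "P", "PERNUM": 1, "AGE": 40},
    {"RECTYPE": "P", "PERNUM": 2, "AGE": 12},
    {"RECTYPE": "H", "YEAR": 2005, "SERIAL": 2, "REGION": 4},
    {"RECTYPE": "P", "PERNUM": 1, "AGE": 70},
]
rectangular = rectangularize(hierarchical)
```

Each row in `rectangular` describes one person and carries that person's household fields, so the result has three rows and no separate household records.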
What is "integration"? What are "integrated variables"?
Integration is the process of making variables more comparable across survey years. For example, every year the NHIS collected data on completed schooling, but the presentation of these data changed over time. For 1969-81, educational data are reported as years of completed schooling, grouped into intervals; for 1982-96, these data are reported as years of completed schooling in single years; and for 1997 forward, these data are reported as degrees attained for those with more than a high school education. IHIS includes several education-related variables to preserve the detail for every period, but it also provides a bridging variable, EDUCREC2, that recodes the educational data into a single, consistent coding scheme for 1969 forward.
Because some survey years provide more detail for a given variable than is the case in other years, a coding scheme that reduced variables down to the lowest common denominator across all survey years would inevitably lose important information. As a result, many IHIS integrated variables use composite coding schemes. The first one or two digits of the code provide information available across all samples. The next one or two digits provide additional information available in a broad subset of years. Finally, trailing digits provide detail only rarely available. All meaningful detail in the original NHIS public use files is therefore available to researchers, if they need it, but they can confine their attention to the less detailed information if they wish. For example, the first digit of the "employment status" variable (EMPSTAT) groups the population into the broad categories generally used by the U.S. Bureau of Labor Statistics (i.e., working, with a job but not at work, unemployed, not in the labor force, and, for children, not in universe), while subsequent digits provide additional detail (e.g., currently employed but was not at work and was looking for work during the previous week) only available prior to 1997.
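As a rough illustration of how a composite code can be collapsed to its general level, the following Python sketch keeps only the leading digits of a fixed-width code. The three-digit values shown are made up for the example and are not actual EMPSTAT codes; the real code width and categories for any variable should be taken from its codes page.

```python
def general_code(code, total_digits, keep_digits=1):
    """Return the leading `keep_digits` digits of a fixed-width composite code.

    `total_digits` is the full width of the code, so codes with leading
    zeros (stored as small integers) are still handled correctly.
    """
    if not 0 < keep_digits <= total_digits:
        raise ValueError("keep_digits must be between 1 and total_digits")
    return code // 10 ** (total_digits - keep_digits)


# Hypothetical three-digit composite codes (not real IHIS values).
detailed_codes = [110, 120, 215, 300]

# Collapse to the first digit, the level described as available in all samples.
general = [general_code(c, total_digits=3) for c in detailed_codes]
```

Here `general` is `[1, 1, 2, 3]`. Calling `general_code(215, 3, keep_digits=2)` would instead keep the intermediate level of detail, giving `21`.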
A second feature of integration in IHIS is combining, into a single variable, material covering comparable substantive ground but appearing in different types of files in the original NHIS public use data. For example, IHIS combines information on whether an individual had a usual place for medical care into a single variable (USUALPL). This information appeared for many years under different variable names and in several different types of files (including the Health Promotion and Disease Prevention, Cancer Control, Child Health, and Access to Care supplements and the core Sample Adult and Sample Child files) in the original NHIS public use files.
Most of the integration work is carried out using translation tables. This example of a translation table covers selected years of data for DVINT, interval since last doctor visit. In every year, the NHIS included this question, but the original NHIS public use files coded responses into different intervals in different years. Moreover, the variable was included in the person files for 1969-96 and in the sample adult and sample child files for 1997 forward. IHIS combines these data into a single variable and uses a composite coding scheme to facilitate comparisons across years (using the first digit) without losing the detail present in every year (using the second digit).
A third, key component of integration is the variable documentation, which highlights important comparability issues. Particularly important are comparability problems that are not evident from the coding structure, such as changes in the survey question wording and shifts in the variable universe. The IHIS Project staff must exercise their judgment in composing this documentation, because there is no formula for it. So that users need not depend totally on us, IHIS documentation also includes web-accessible copies of the survey forms, with hypertext links between each variable description and the relevant section of the questionnaires.
Why is a variable from the NHIS that I have worked with before not included in IHIS?
As of August 2010, IHIS includes over 5,000 integrated variables covering the period 1969-2009. Our planned data release for November 2010 will increase the number of IHIS variables to approximately 7,000. While this figure is substantial, fewer than half the source variables included in the NHIS public use files for 1963-2009 have been made public through IHIS. During the remaining four years of the current IHIS grant period, we will continue to add thousands more variables to our public database every year.
Experienced analysts of NHIS data who do not find a familiar variable included in IHIS can instead make use of the original public use data. In addition, as explained immediately below, users can add variables they need from the NHIS public use files to an IHIS data extract.
In a few cases, the NHIS survey contained questions about topics not included in the original NHIS public use files. The survey responses may never have been processed, or the responses may be included only as part of a composite recoded variable, or the variable may have been left out of the public use files due to confidentiality concerns. Because the NHIS public use files are the raw material used to create the IHIS database, variables missing from the public use files are missing from IHIS.
Can I combine IHIS data with other NHIS variables needed for my research?
Interested users can combine variables from IHIS and NHIS public use files. Variables from the original NHIS public use files (that are not yet in the IHIS system) can now be linked to an IHIS data extract. IHIS has created linking keys from the series of original NHIS variables that are used to uniquely identify households (NHISHID) or persons (NHISPID). More information, including an overview of NHIS unique identifiers, IHIS linking keys, and general guidance on how to link variables from NHIS public use files to an IHIS data extract, can be found in a user note on linking. In addition to the general guidance offered by this user note, sample Stata and SAS programs are provided to facilitate the linking process.
How is a record uniquely identified?
Three variables constitute a unique identifier for each person record in IHIS: YEARP, SERIALP, and PERNUM (survey year, household identifier, and person number within the household). The combination of YEAR and SERIAL (which have the same values as YEARP and SERIALP on the person record) constitutes the unique household identifier on the household record. These are IHIS constructed variables and will not be found in the NHIS public use files.
Individual households and persons can also be uniquely identified in a manner consistent with the NHIS public use files. The combination of YEAR and NHISHID will uniquely identify households, while the combination of YEAR and NHISPID will uniquely identify persons. NHISHID and NHISPID are constructed from data elements in the original NHIS public use files to produce the unique identifiers as defined by NHIS. These unique identifiers can be used as linking keys to merge variables from the NHIS public use files to IHIS data. More information about linking NHIS variables to IHIS data can be found in the user note on linking.
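The person-level link described above amounts to a keyed merge on YEAR and NHISPID. The Python sketch below shows the idea on made-up rows. It assumes that both sets of rows already carry YEAR and NHISPID (constructing NHISPID from the original NHIS identifiers is covered in the user note on linking and is not shown), and EXTRAVAR is a hypothetical name standing in for a variable taken from the NHIS public use files.

```python
def link_nhis_variables(ihis_rows, nhis_rows, key=("YEAR", "NHISPID")):
    """Attach variables from NHIS rows to IHIS rows that share the same key.

    Every IHIS row is kept; IHIS rows with no match simply gain no new
    variables. A duplicate key on the NHIS side raises an error, because
    the key is supposed to identify each person uniquely.
    """
    lookup = {}
    for row in nhis_rows:
        k = tuple(row[name] for name in key)
        if k in lookup:
            raise ValueError(f"duplicate key in NHIS rows: {k}")
        lookup[k] = row

    merged = []
    for row in ihis_rows:
        k = tuple(row[name] for name in key)
        combined = dict(row)
        for name, value in lookup.get(k, {}).items():
            if name not in key:
                combined[name] = value
        merged.append(combined)
    return merged


# Made-up rows; the identifier values are illustrative only.
ihis_rows = [
    {"YEAR": 2005, "NHISPID": "A1", "AGE": 34},
    {"YEAR": 2005, "NHISPID": "A2", "AGE": 61},
]
nhis_rows = [
    {"YEAR": 2005, "NHISPID": "A1", "EXTRAVAR": 1},
]
linked = link_nhis_variables(ihis_rows, nhis_rows)
```

For a household-level link, the same function could be called with `key=("YEAR", "NHISHID")`, although the duplicate-key check then requires that the NHIS rows contain one record per household.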
What does "universe" mean in the variable descriptions?
The universe is the population at risk of having a response for the variable in question. In most cases, these are the households or persons to whom the survey question was asked, as reflected on the survey questionnaire. For example, employment variables do not include children, since the NHIS does not ask children about employment.
Cases that are outside of the universe for a variable are labeled "NIU" on the codes page. A change in a variable's universe across years is a common data comparability issue.
In some cases, IHIS imposes a different variable universe than the one found in the original NHIS public use data. Usually this is done to distinguish cases with meaningful zeros (e.g., adults who reported having individual incomes of zero dollars) from cases where "NIU" was also originally coded as zero (e.g., for children, who were not asked about their individual incomes). The top of each variable description specifies the universe for that variable.
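The practical consequence is that "NIU" codes should be set aside before computing statistics, while meaningful zeros should be kept. The Python sketch below illustrates this on made-up values. The code 99999 is a hypothetical "NIU" value chosen for the example; the actual codes for any variable are listed on its codes page.

```python
def mean_in_universe(values, niu_codes):
    """Average only the values that are not "not in universe" codes."""
    valid = [v for v in values if v not in niu_codes]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Made-up income values: a true zero, two positive amounts, and two cases
# carrying the hypothetical NIU code 99999.
incomes = [0, 25000, 99999, 40000, 99999]

mean_income = mean_in_universe(incomes, niu_codes={99999})
naive_mean = sum(incomes) / len(incomes)
```

In this example the zero is retained as a real response, so `mean_income` is 65000 / 3, whereas `naive_mean` is inflated by treating the NIU code as if it were a dollar amount.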
Data Limitations and Cautions to Users
What are the major limitations of the data?
The data consist entirely of records for individual persons and households from the public use files of the NHIS. IHIS does not deliver aggregate or published statistics from the survey. Researchers interested in aggregate data will find them on the National Center for Health Statistics website.
The number of persons and households in the survey varies from year to year, but, on average, the survey covers about 100,000 persons in about 45,000 households each year. Exact figures on the number of households and persons included in IHIS in each year are available in the user note on sample sizes. While the NHIS ranks as one of the largest surveys conducted annually by the U.S. government, the samples may not supply enough cases to reliably study some subpopulations. To achieve adequate sample sizes for some subgroups, researchers may wish to combine data from two or more survey years.
Because the NHIS data are public-use, measures have been taken to assure confidentiality. Names and other identifying information are suppressed. You cannot find specific individuals in the IHIS data or use these data for genealogical research. Moreover, because the NHIS uses population samples to generate the data, there is no guarantee that any given individual will be in the dataset. Finally, the registration form requires potential users to commit to using the data responsibly, including utilizing the data for statistical reporting and analysis only and making no effort to identify particular individuals in the data.
Geographic detail in the NHIS public use files and thus in IHIS is limited to the identification of census regions and a few large metropolitan statistical areas. Researchers can access more geographic detail and add it to an IHIS data extract by working with the staff of the NCHS Research Data Center (RDC). If your research proposal is approved, you can access restricted data (including geographic identifiers) through on-site analysis at an NCHS or Census Research Data Center, via remote access, or with the paid assistance of NCHS RDC staff.
Are there aspects of IHIS data about which to be particularly careful?
IHIS is an integrated dataset based on the NHIS public use files. However, IHIS coding schemes follow different conventions than NHIS in many instances. For example, NHIS uses the convention of 1 = Yes and 2 = No, while IHIS uses the convention 1 = No and 2 = Yes. To pick another example, blanks in the original NHIS public use data files are converted to numeric values (usually beginning with a 0 or a 9, to indicate "not in universe" cases) in IHIS. Moreover, to achieve comparably coded variables over time, IHIS has recoded most variables from the original NHIS coding schemes. Users are strongly urged to review the IHIS documentation carefully and to not assume that variable values will be coded the same in IHIS as they were in the NHIS files.
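One way to guard against mixing up the two yes/no conventions is to make the data source explicit wherever a "yes" is tested. The Python sketch below encodes only the general conventions stated above (1 = Yes in NHIS, 2 = Yes in IHIS). It is an illustration, not a substitute for the codes page, since individual variables may carry additional categories.

```python
# "Yes" codes under the general conventions described in the text.
YES_CODE = {"NHIS": 1, "IHIS": 2}


def is_yes(value, source):
    """Return True if `value` means "yes" under the named source's convention."""
    return value == YES_CODE[source]


# The same stored value has opposite meanings in the two sources.
nhis_answer = is_yes(1, "NHIS")
ihis_answer = is_yes(1, "IHIS")
```

Here `nhis_answer` is `True` and `ihis_answer` is `False`, which is exactly the kind of silent reversal that reviewing the documentation is meant to prevent.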
The NHIS uses a complex sampling scheme, so all IHIS samples are weighted. Put another way, individuals in the data do not all represent an identical number of persons in the population in a given year. It is therefore necessary to use the appropriate weight variables when analyzing these samples. A user note on sampling weights discusses the proper use of weights with IHIS data. In addition, the "Weights" section at the top of each variable description specifies the suggested IHIS weight to use with that variable, by year.
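To see why weights matter, the Python sketch below compares a weighted and an unweighted proportion on made-up records. PERWEIGHT is the IHIS person weight; HAS_CONDITION is a hypothetical indicator invented for the example. The sketch yields a point estimate only. Standard errors that respect the complex sample design require survey-aware software, as discussed in the user note on variance estimation.

```python
def weighted_proportion(records, flag, weight="PERWEIGHT"):
    """Share of the weighted population for which `flag` is true."""
    total = sum(r[weight] for r in records)
    if total == 0:
        raise ValueError("weights sum to zero")
    hits = sum(r[weight] for r in records if r[flag])
    return hits / total


# Made-up person records: each person represents a different number of people.
people = [
    {"PERWEIGHT": 1000, "HAS_CONDITION": True},
    {"PERWEIGHT": 3000, "HAS_CONDITION": False},
    {"PERWEIGHT": 1000, "HAS_CONDITION": True},
]

weighted = weighted_proportion(people, "HAS_CONDITION")
unweighted = sum(1 for p in people if p["HAS_CONDITION"]) / len(people)
```

With these values the weighted estimate is 0.4, while the unweighted share of records is about 0.67, because the one person without the condition represents more people than the other two combined.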
The NHIS does not contain the full universe of persons in the U.S. population. Rather, the survey samples the civilian non-institutionalized population and thus excludes such persons as residents of nursing homes and members of the armed forces living in barracks. A user note on sample design contains information about the NHIS sampling scheme and changes in NHIS sampling over time. Changes in the sampling methodology have implications for variance estimation over time, and a user note on variance estimation discusses appropriate practices using IHIS data.
It is important to examine the documentation for the variables you are using. The codes and labels for variable categories do not tell the whole story. Two features of the variable documentation merit special attention. First, examine the universe for a variable (the population at risk of answering the question), which can differ subtly or markedly across years. Second, read the comparability discussions for the variables in which you are interested. Users intending to use race or Hispanic origin as a variable across multiple survey years may find useful not only the IHIS variable descriptions but also the NHIS Race and Hispanic Origin Information page on the National Center for Health Statistics website.
Reproductions of the relevant portions of the survey form are available within variable descriptions (via the "survey text" link). For PDF reproductions of the original NHIS survey forms and retyped versions of the same material, click on the Surveys link in the left sidebar of the IHIS home page.
By default, the extract system rectangularizes the data: it puts the household information on the person records and drops the separate household record. This can distort analyses at the household level. The number of observations will be inflated to the number of person records. To get the proper number of household observations, either select the first person in each household or select the "hierarchical" box in the extract system. The rectangularizing feature also drops any non-interviewed households. Despite these complications, the great majority of researchers prefer the rectangularized format, which is why it is the default output of our system.
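Selecting the first person in each household can be sketched as follows in Python. The example assumes a rectangularized extract that includes SERIAL and PERNUM alongside YEAR, with rows already read into dictionaries; REGION is a hypothetical household-level variable used for illustration.

```python
def one_record_per_household(person_rows):
    """Keep the lowest-numbered person in each (YEAR, SERIAL) household."""
    seen = set()
    households = []
    ordered = sorted(person_rows, key=lambda r: (r["YEAR"], r["SERIAL"], r["PERNUM"]))
    for row in ordered:
        household_id = (row["YEAR"], row["SERIAL"])
        if household_id not in seen:
            seen.add(household_id)
            households.append(row)
    return households


# Made-up rectangular rows: three persons living in two households.
person_rows = [
    {"YEAR": 2005, "SERIAL": 1, "PERNUM": 2, "REGION": 2},
    {"YEAR": 2005, "SERIAL": 1, "PERNUM": 1, "REGION": 2},
    {"YEAR": 2005, "SERIAL": 2, "PERNUM": 1, "REGION": 4},
]
household_rows = one_record_per_household(person_rows)
```

The result contains two rows, one per household, so a tabulation of REGION counts each household once rather than once per resident.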
Is help available if I encounter problems using IHIS?
Users who encounter problems with the IHIS extract system, data, or documentation can e-mail the IHIS User Support team for assistance. The IHIS Project staff also welcomes feedback from users who encounter errors, inconsistencies, or lack of clarity in the data and documentation. Users who contact us with information about a legitimate and substantial error in the data or documentation will be sent a complimentary IHIS mug.
Getting data
How do I obtain data?
All IHIS data are delivered through our data extraction system. Users select the variables and years they are interested in, and the system creates a custom-made extract containing only this information. Detailed instructions for using the data extraction system are available below.
Data are generated on our server. The system sends out an email message to the user when the extract is completed. Users must download their extracts and analyze them on their local machines. To access the extract system, go to the IHIS login page. If you are not yet registered, you can log in as "guest" and examine the interface. To register to use the data, go to the IHIS registration page.
What is the data format?
IHIS produces fixed-column ASCII data. With the exception of the "P's" and "H's" that identify record type (distinguishing between person and household records), IHIS data are entirely numeric. By default, the extraction system rectangularizes the data, putting household information on the person records. With rectangularization, there are no separate household records in the data extract. No information is lost, and most researchers prefer this format. The default can, however, be overridden to yield hierarchical data consisting of household records followed by the person records of household members.
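For readers unfamiliar with fixed-column files, the Python sketch below shows how one line of a rectangular extract could be split into variables. The column positions in LAYOUT and the sample line are invented for the example. In practice, the positions for a given extract come from its syntax file or codebook, and the supplied Stata, SAS, or SPSS syntax file performs this step.

```python
# Hypothetical layout: variable name -> (start, end) column offsets, end exclusive.
LAYOUT = {
    "YEAR": (0, 4),
    "SERIAL": (4, 10),
    "PERNUM": (10, 12),
    "AGE": (12, 15),
}


def parse_fixed_width(line, layout):
    """Slice one fixed-column line into integer-valued variables."""
    return {name: int(line[start:end]) for name, (start, end) in layout.items()}


# A made-up 15-character record: year 2005, household 123, person 1, age 34.
sample_line = "200500012301034"
record = parse_fixed_width(sample_line, LAYOUT)
```

Because the values are stored without delimiters, the leading zeros in a field such as SERIAL are padding and disappear once the field is converted to a number.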
In addition to the ASCII data file, the system creates a statistical package syntax file to accompany each extract. The syntax file is designed to read in the ASCII data while applying appropriate variable and value labels. IHIS supports requests for Stata, SAS, and SPSS statistical syntax. You must download the syntax file with the extract, or you will be unable to read the data. The syntax file requires minor editing to identify the location of the data file on your local computer. Directions regarding these minor edits are included in the guide to using the data extract system.
A codebook file is also created with each extract. This codebook file records the characteristics of your extract and should be downloaded for recordkeeping.
All data files are created in gzip compressed format. You must decompress the file to analyze it. Most data decompression utilities will handle the files. Among the available free software for decompressing files are WinGzip (for Windows) and MacGZIP (for Macs).
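Users who prefer a scripted approach can also decompress the file with Python's standard library, as in the sketch below. The file names in the usage comment are placeholders, not the names the extract system assigns.

```python
import gzip
import shutil


def decompress_gzip(src_path, dest_path):
    """Write the decompressed contents of a .gz file to dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so large extracts need not fit in memory.
        shutil.copyfileobj(src, dest)


# Example usage with placeholder file names:
# decompress_gzip("my_extract.dat.gz", "my_extract.dat")
```

The decompressed file is the plain ASCII data file that the accompanying syntax file expects to read.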
What is the best way to use the extract system?
The data extraction system is a flexible tool. There is no need to download variables or survey years you do not expect to use for your current analysis. The system records every extract you make. You can reload and modify an old extract, dropping or adding variables or survey years.
The data extract system automatically supplies the variables YEAR (survey year), HHWEIGHT (household weight), PERWEIGHT (person weight), PSU (primary sampling unit for variance estimation), and STRATA (stratum for variance estimation) for all extracts created. SAMPWEIGHT is also automatically included in extracts that incorporate data from 1997 forward. YEAR allows users to distinguish between results from different survey years. Proper use of HHWEIGHT or PERWEIGHT, in combination with PSU and STRATA, accommodates the complex sampling design of the NHIS. SAMPWEIGHT should be used with the many variables from 1997 forward that are based on questions asked only of sample adults and sample children. Use of weights with IHIS data is discussed in the user note on sampling weights.
When users override the default choice of rectangularized files and select hierarchical data, the extract system automatically supplies the variables SERIAL and SERIALP (the household and person record versions of the household identification number, with identical values for the two variables), YEARP (survey year, as specified on the person record), and PERNUM (person number within the household).
How long does a data extract take?
The time needed to make an extract differs, depending on the number and size of samples requested and the load on our server. Creating an extract generally takes only a few minutes. The system sends an email upon completion of the extract, so there is no need to stay active on the IHIS site during the creation of the extract.
Is there a preferred statistical package for using IHIS?
IHIS supports Stata, SAS, and SPSS. The system does not make data files in those formats, but it does generate syntax files with which to read in the ASCII data.
Why can't I open the data file?
There are two likely explanations:
1. The data produced by the extract system are gzipped (the file has a .gz extension). You must use a data decompression utility before you can analyze the file.
2. You cannot open the data file directly with a statistical package. The file is a simple ASCII file, not a system file in the format of any statistical package. The extract system does, however, generate a syntax (set-up) file to read the ASCII file into your statistical package. You must download the syntax file along with the data file from our server, open the syntax file with your statistical package, and edit the path in the syntax file to point to the location of the data on your local computer. Now you are ready to read in the data.
Can I get the original data?
As noted, the raw material for the IHIS database comes from the NHIS public use files provided by the National Center for Health Statistics. These original data are available on the NHIS data and documentation page. The National Center for Health Statistics' NHIS public use files also include variables and supplements not yet covered by IHIS.
Using the variables page
Variables page menu
Use the left side of the menu to browse variables:
Household: household variables by group.
Person: person variables by group.
A-Z: integrated variables alphabetically.
Use the buttons on the right side of the menu to:
Select Samples: limit the display of variable information to selected years.
Options and Help: alter how the variable list is displayed or get help for this page.
Variables page details
The Menu
The variables page allows you to browse integrated variables while limiting and controlling how the information is displayed.
The left side of the menu is for browsing the variables.
When you "Select samples," you limit the variable list to display only variables that are available in at least one of those years. But the effect of selecting years extends into other aspects of the documentation. For example, the codes pages will display codes and frequencies for only the years you selected, and the survey text linked to variable descriptions will only appear for the selected years. You can change your year selections at any point.
Selecting years is a good practice when exploring the IHIS, because the amount of information can be unwieldy. On the other hand, sometimes you need to see everything to determine what kinds of research are possible using the IHIS database.
The final menu button is "Options and Help," which is a drop-down with a number of choices. The first item on the list restores the default viewing options for the variables page. The last item invokes this help text. Each of the remaining items on the list is a toggle that provides an alternative view from the default viewing behavior. The following options are only visible when some variables are being displayed.
View one group / View all groups
Switch between viewing one variable group at a time and viewing all variable groups on one screen. Unless you have a limited number of samples selected, your browser may be slow to display all groups. The default view is one group at a time.
Show availability detail / Show availability summary
Switch between the full year-by-year availability matrix and a summary that displays only the total number of years containing each variable. Both views display or count only the years selected in "Select samples." The default view is the detailed availability information.
Reverse sample order
Switch between a view that displays the years available beginning with the most current year selected and moving back in time (reverse chronological order) and a view that is in chronological order. Both views only display the years that the user has selected in "Select samples." The default view is reverse chronological order.
View available variables / View all variables
Switch between a view that displays only variables present in at least one of your selected years and a view that displays every variable, including those unavailable in those years. The default view displays only available variables.
The Variable List
As you browse the variables, they are displayed in a list containing a number of columns. The variable name links to the variable description, which includes detailed comparability discussions, universes, and survey text. The variable codes -- and their associated labels -- can be accessed directly using the "codes" links. The "type" column indicates whether each variable is a person or household variable. In some contexts, such as the alphabetic view, the two types are displayed together.
In the area to the right of the "codes" column, the default view displays a column for every year that the user chose in "Select samples." By default, all years are selected. The last two digits of the sample year identify each sample at the top of every column. If a variable is available in a given year, an "x" is printed in that column.
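As a rough sketch of how such an availability grid is laid out, the following builds the header of two-digit years and one row of marks per variable. The function name, the sample table, and the "." placeholder for unavailable years are assumptions made for illustration; the actual page may simply leave those cells blank.

```python
def availability_grid(availability, selected_years, reverse=True):
    """Build grid rows: a header of two-digit years, then one row per variable."""
    # Reverse chronological order is the default described for the variables page.
    years = sorted(selected_years, reverse=reverse)
    rows = [["variable"] + [f"{year % 100:02d}" for year in years]]
    for name in sorted(availability):
        # "x" marks an available year; "." is a placeholder chosen for this sketch.
        marks = ["x" if year in availability[name] else "." for year in years]
        rows.append([name] + marks)
    return rows


if __name__ == "__main__":
    # Hypothetical variables and years.
    table = {"AGE": {1999, 2000}, "VAR_A": {1999}}
    for row in availability_grid(table, [1999, 2000]):
        print(" ".join(row))
```

Passing reverse=False gives the chronological ordering that the "Reverse sample order" option toggles to.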
Each variable has a box on the far left in the column labeled "Include in Extract." Use these to identify variables you wish to include in a data extract. Any variable so identified will be pre-selected for you when you enter the data extract system. This feature is optional; you will have the opportunity to select variables again inside the extract system. Note that your variable selections only last for the current web session.
Using the data extract system
Extract step 1: login
To use the extract system, you must first register on the IHIS registration page. Users are automatically registered when they agree to all conditions for use.
You can log in as "guest" using the password "guest" to examine the extract interface. The system will not actually create a data extract, however, until you have gone through the registration process.
On the first page of the extraction system, you are prompted for your e-mail address, which acts as your password and provides us with a means of contacting you and constructing a unique file name for your extract output.
Extract step 2: select samples
In Step 2 of the extract procedure, you define some general characteristics of your desired extract.
Choose the preferred file structure for your extract: rectangular (all household information attached to respective household members) or hierarchical (household record followed by person records). The system defaults to rectangular format, which is the overwhelming choice of researchers.
At the bottom of the page, you select the particular year or combination of years you want to include in your extract.
If you selected years while browsing the variable documentation, those years are pre-selected in the extract system.
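The difference between the two file structures offered in this step can be sketched with a few made-up records. The field names (hh_id, region, person, age) and the "H"/"P" record tags are hypothetical and chosen only for illustration; they do not describe the actual IHIS layout.

```python
# Hypothetical household and person records.
households = [{"hh_id": 1, "region": "West"}, {"hh_id": 2, "region": "South"}]
persons = [
    {"hh_id": 1, "person": 1, "age": 34},
    {"hh_id": 1, "person": 2, "age": 36},
    {"hh_id": 2, "person": 1, "age": 71},
]


def rectangular(households, persons):
    """One row per person, with that person's household fields attached."""
    by_id = {h["hh_id"]: h for h in households}
    return [{**by_id[p["hh_id"]], **p} for p in persons]


def hierarchical(households, persons):
    """Each household record followed by the person records that belong to it."""
    out = []
    for h in households:
        out.append(("H", h))
        out.extend(("P", p) for p in persons if p["hh_id"] == h["hh_id"])
    return out


if __name__ == "__main__":
    for row in rectangular(households, persons):
        print(row)
    for tag, record in hierarchical(households, persons):
        print(tag, record)
```

In the rectangular form every row can be analyzed on its own, which is one plausible reason most researchers prefer it; the hierarchical form avoids repeating household information but requires software that can handle mixed record types.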
Extract step 3: select variables - review
Step 3 -- Variable Selection -- begins and ends with the review screen. When you first encounter the screen, it lists the variables pre-selected by the extract system as well as any variables you selected while browsing the documentation. If you selected variables outside the extract system that are unavailable in the years you chose for this data extract, those variables will have their checkboxes grayed out. You can return to year selection and alter your choices now. The grayed-out variables will not persist to the next viewing of this page.
When you first reach the review screen, you will typically want to "add more variables," but you can continue to the next step, if you are satisfied with your selections. If you choose to add more variables, you must return again to this screen before proceeding to the next step.
Extract step 3: select variables - add
The heart of the extract process is selecting variables. Use the left side of the menu bar to select integrated variables by group or by letter. Viewing options are available using the right-most button. Check the box to the left of the variable name to include it in your extract. The variable availability grid is on the right side of the integrated variable list. An "x" indicates that an integrated variable is available in a given year.
You can browse the variable codes and descriptions using the appropriate links. The variable documentation will open in a new tab or window.
More detailed information on using the variable menu is available under "Variables page details."
When you are ready, click on "Review selected variables" to return to the review screen. From that screen you can proceed to the next step of the extract process or return to add more variables.
Extract step 4: submit
Step 4 summarizes the extract you have defined. If it is acceptable, click the button to create your extract. A screen lets you write a note describing the content of your extract for future reference.
The system does not allow extracts above a maximum size. If your extract exceeds the maximum, you will receive a warning to this effect and must reduce its size by selecting fewer years (Step 2) or fewer variables (Step 3).
A series of buttons near the bottom of the screen allows you to jump back to particular points in the extract process to modify your choices. Use these rather than backing through the system, or some of your selections may be lost. After using one of these buttons, you must proceed forward through all of the steps of the extract process back to the summary screen, but as each screen loads, the system will remember all of your selections that still apply.
When you submit an extract, there will be a delay ranging from minutes to hours, depending on the size of the job. Do not wait on our site for the job to be completed. Our system will send you an email when your extract is ready.
All data are produced in gzip compressed format. Most compression software has no difficulty with the files. The system produces only ASCII fixed column-format data, but with each extract it generates Stata, SAS, and SPSS command files to read the data into one of those statistical packages. The system also creates a codebook file that describes the content of your extract.
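For users who work outside Stata, SAS, or SPSS, a gzip-compressed fixed-column file can also be read directly. The following is a minimal sketch in Python using only the standard library. The column layout (YEAR, AGE, SEX and their positions) is hypothetical; the real positions must be taken from the codebook or command files generated with your extract, and the sketch assumes every field is numeric.

```python
import gzip

# Hypothetical layout: (name, start, end) as 0-based, end-exclusive offsets.
# Real column positions come from the codebook or command files for your extract.
LAYOUT = [("YEAR", 0, 4), ("AGE", 4, 7), ("SEX", 7, 8)]


def read_fixed_width(path, layout=LAYOUT):
    """Yield one dict per line of a gzip-compressed fixed-column ASCII file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Slice each field by position and convert it to an integer.
            yield {name: int(line[start:end]) for name, start, end in layout}
```

Calling list(read_fixed_width("my_extract.dat.gz")) would return one dictionary per record, where "my_extract.dat.gz" stands in for whatever file name your extract is given.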
The definition of every extract will remain on our server indefinitely, but the data files are subject to deletion after three days. However, the screen where you download extracts lets you revise old extracts. When you click on "revise," all your selections for that extract will be loaded into the system, after which you can edit or regenerate it. Note, however, that each successive data release can create difficulties for recreating old extracts, because variable names or codes might change.