HRET Disparities Toolkit
A toolkit for collecting race, ethnicity, and primary language information from patients


Collecting the Data - The Nuts and Bolts

The National Research Council of the National Academies report Eliminating Health Disparities: Measurement and Data Needs (2004) recommends that hospitals, other health care providers, and health insurers collect standardized data on race and ethnicity, using the Office of Management and Budget (OMB) standards as a minimum. However, experts recognize that more detailed (granular) categories than the OMB categories may be more useful to hospitals and health care organizations in targeting improvements for diverse populations. We recognize that collecting granular data at the organizational level may create challenges for reporting or for research. The Institute of Medicine's (IOM) recent report Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement (2009) provides new recommendations to help further standardize the collection of race, ethnicity, and primary language data. We recommend that health care providers collect race, Hispanic ethnicity, and granular ethnicity data separately, and then "roll up" (aggregate) the granular ethnicities to the OMB race and Hispanic ethnicity categories as needed.

IOM Report: Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement
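The recommendation to keep race, Hispanic ethnicity, and granular ethnicity in separate fields can be pictured as a simple record. What follows is a minimal, hypothetical Python sketch. The class and field names are our own illustration, not part of any standard. The allowed values are copied from the OMB lists and recommended additions later in this toolkit. The record allows more than one race because the 1997 OMB standards permit respondents to select one or more races.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# OMB minimum categories, as listed in "Which Categories to Use".
OMB_RACES = {
    "American Indian/Alaska Native",
    "Asian",
    "Black/African American",
    "Native Hawaiian/Other Pacific Islander",
    "White",
}
OMB_ETHNICITIES = {"Hispanic or Latino", "Not Hispanic or Latino"}
# Toolkit-recommended additions to the OMB race list.
EXTRA_RACE_RESPONSES = {"Some Other Race", "Declined", "Unavailable"}


@dataclass
class PatientDemographics:
    """Hypothetical record keeping race, Hispanic ethnicity, and granular
    ethnicity in separate fields rather than merged into one."""
    races: List[str] = field(default_factory=list)  # one or more races may be reported
    hispanic_ethnicity: Optional[str] = None
    granular_ethnicities: List[str] = field(default_factory=list)  # not validated here

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the coded fields are valid."""
        problems = []
        for race in self.races:
            if race not in OMB_RACES | EXTRA_RACE_RESPONSES:
                problems.append(f"unknown race category: {race}")
        allowed = OMB_ETHNICITIES | {"Declined", "Unavailable"}
        if self.hispanic_ethnicity is not None and self.hispanic_ethnicity not in allowed:
            problems.append(f"unknown ethnicity category: {self.hispanic_ethnicity}")
        return problems
```

Keeping the fields separate means the granular values can be rolled up to OMB categories later without losing the original detail.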



We recommend collecting race and ethnicity information directly from patients or their caregivers. Race and ethnicity information should be collected only once and periodically validated. Repeated collection should be avoided to reduce the burden both for patients and for staff responsible for collecting the information. Once this information is collected, it should be stored in an electronic format when possible.

In addition, if a patient refuses to answer questions about their racial or ethnic background, registration staff should continue with the registration process and record "Declined" in the field to indicate that the patient chose not to answer. Providing information about race and ethnicity is completely voluntary, and staff should recognize when people feel uncomfortable or explicitly state that they do not want to respond to these questions.

We have designed this Toolkit to serve as a resource for hospitals and health care organizations. The primary components of race and ethnicity data collection that should be considered standard practice include the following:

  1. Collect data directly from the patient or from a designated representative.
  2. Provide a rationale or reason for why this information is being collected.
  3. Depending on your organization's capacity, decide whether to offer broad categories (the bare minimum, such as OMB) or more granular categories. (Information about both broad and granular categories is listed in the section "Which Categories to Use.")



Hospitals, Clinics, Group Practices

We recommend that this information be collected at the time of patient registration for hospitals, clinics, and medical group practices. This information can be collected face-to-face or over the telephone.

Health Plans

For health plans and insurers, we recommend that this information be collected at the time of enrollment, if possible. We realize that this may be challenging, because some employers prohibit asking their employees for this information. America's Health Insurance Plans (AHIP) has developed a toolkit, "Tools to Address Disparities in Health: Data as Building Blocks for Change—A Data Collection Toolkit for Health Insurance Plans/Health Care Organizations (PDF)."



Always provide a rationale for why you are asking patients/enrollees to provide information about their race/ethnicity. Research shows that patients are most comfortable providing this information when told why it is being collected and how it will be used. We recommend that health care organizations and health plans collect this information for quality monitoring purposes. Below is a sample rationale, which is easy to communicate and focuses on data collection for quality monitoring.


"We want to make sure that all our patients get the best care possible. We would like you to tell us your racial/ethnic background so that we can review the treatment that all patients receive and make sure that everyone gets the highest quality of care."

In addition, it is important to state that the information is confidential:

"The only people who see this information are registration staff, administrators for the hospital, and the people involved in quality improvement and oversight, and the confidentiality of what you say is protected by law."


Which Categories to Use

Provided below are the OMB categories (broad) and the CDC Race and Ethnicity Code Set (granular categories that can be rolled up into the OMB categories for reporting or research purposes). As indicated above, hospitals can present patients/enrollees with a list of either broad or granular categories, allowing patients/enrollees to self-identify their racial/ethnic background.


Broad Categories (OMB)

OMB Revised Standards (1997)

In 1997, the Office of Management and Budget (OMB) published revisions to the Standards for Classification of Federal Data on Race and Ethnicity. For detailed information, consult the full text of the OMB standards.

The revised OMB standards include separate questions for race and for ethnicity. The specific OMB categories are listed below.

First ask questions about ethnicity.

OMB Ethnicity

  • Hispanic or Latino: A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race. The term "Spanish origin" can be used in addition to "Hispanic or Latino."
  • Not Hispanic or Latino.

OMB Race

  • American Indian/Alaska Native: A person having origins in any of the original peoples of North and South America (including Central America), and who maintains tribal affiliation or community attachment.
  • Asian: A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
  • Black/African American: A person having origins in any of the black racial groups of Africa. Terms such as "Haitian," "Dominican," or "Somali" can be used in addition to "Black or African American."
  • Native Hawaiian/Other Pacific Islander: A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
  • White: A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.

Our recommended modifications to OMB include adding the following categories:

  • Some Other Race: (This category replaces the "Multiracial" category in the previous version of the toolkit. It provides a response option for Hispanics and others who do not identify with the current OMB race categories.)
  • Declined: (This category is an indication that the person did NOT want to respond to the question and should not be asked again during the same visit or during a subsequent visit.)
  • Unavailable: (This category is an indication that the person could not respond to the question and can be asked again during the same visit or during a subsequent visit.)
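Because "Declined" and "Unavailable" lead to different follow-up, a registration system can encode the distinction so that staff are prompted only when asking again is appropriate. Below is a minimal Python sketch. The enum and function names are hypothetical. One assumption is ours and not from the toolkit: an answered question returns False because periodic validation is treated as a separate process, not a routine re-ask.

```python
from enum import Enum


class ResponseStatus(Enum):
    ANSWERED = "answered"
    DECLINED = "declined"        # patient chose not to respond
    UNAVAILABLE = "unavailable"  # patient could not respond


def may_ask_again(status: ResponseStatus) -> bool:
    """Whether staff may ask the race/ethnicity question again,
    during this visit or a subsequent one."""
    if status is ResponseStatus.DECLINED:
        return False  # respect the refusal on this and later visits
    if status is ResponseStatus.UNAVAILABLE:
        return True   # the patient could not answer, so asking later is fine
    return False      # already answered; validation is handled separately (assumption)
```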

Collapsing Race and Ethnicity

Field research by HRET has shown that some health care organizations have only one field (for race) and do not have a separate field for ethnicity. Under these circumstances, we have collapsed race/ethnicity to facilitate recording both in one field. The recommended categories are:

  • African American/Black
  • Asian
  • Caucasian/White
  • Hispanic/Latino/White
  • Hispanic/Latino/Black
  • Hispanic/Latino/Declined
  • Native American
  • Native Hawaiian/Pacific Islander
  • Some Other Race 
  • Declined
  • Unavailable/Unknown
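When a system has only one field, separate ethnicity and race answers can be mapped onto the collapsed list mechanically. What follows is a hypothetical Python sketch, and it assumes the inputs use the OMB labels listed above. The collapsed list names only three Hispanic/Latino combinations. Any other combination (for example, Hispanic/Latino and Asian) therefore returns None for staff review rather than a guess. As a further assumption, when ethnicity is anything other than "Hispanic or Latino" (including Declined), the race answer alone decides the value.

```python
from typing import Optional

# Race answer -> collapsed value, used when the patient is not Hispanic/Latino
# (or ethnicity was not reported). Labels follow the lists in this toolkit.
RACE_ONLY = {
    "Black/African American": "African American/Black",
    "Asian": "Asian",
    "White": "Caucasian/White",
    "American Indian/Alaska Native": "Native American",
    "Native Hawaiian/Other Pacific Islander": "Native Hawaiian/Pacific Islander",
    "Some Other Race": "Some Other Race",
    "Declined": "Declined",
    "Unavailable": "Unavailable/Unknown",
}
# Only the Hispanic/Latino combinations that the collapsed list names.
HISPANIC_COMBINED = {
    "White": "Hispanic/Latino/White",
    "Black/African American": "Hispanic/Latino/Black",
    "Declined": "Hispanic/Latino/Declined",
}


def collapse(hispanic_ethnicity: str, race: str) -> Optional[str]:
    """Map separate ethnicity and race answers to one collapsed value.
    Returns None when the collapsed list has no matching category."""
    if hispanic_ethnicity == "Hispanic or Latino":
        return HISPANIC_COMBINED.get(race)
    return RACE_ONLY.get(race)
```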

Granular Categories

In addition to collecting data in the OMB race and ethnicity categories, organizations should also collect granular ethnicity data using categories that are representative of the population served. The IOM Subcommittee on Standardized Collection of Race/Ethnicity Data for Healthcare Quality Improvement recommends that granular ethnicity categories be selected from a national standard set based on ancestry (e.g., the Centers for Disease Control and Prevention [CDC]/Health Level 7 [HL7] Race and Ethnicity Code Set 1.0).

Not all organizations collecting granular ethnicity data will need to include the entire national standard set of categories in their databases or on their data collection instruments. Rather, organizations should select categories from the set that are applicable to their service population. Whenever a limited list of categories is offered to respondents, the list should include an open-ended response option of "Other, please specify:__" so that each individual who desires to do so can self-identify.

When respondents do not self-identify as one of the OMB race or Hispanic ethnicity categories and provide only a granular ethnicity response, a process for rolling the granular ethnicity categories up to the OMB categories should be used.  Ethnicities that do not correspond to a single OMB race category should be categorized as "no determinate OMB classification".
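The roll-up step can be sketched as a lookup table from granular ethnicity to OMB category, with a fallback of "no determinate OMB classification". What follows is a hypothetical Python illustration. The short table is an excerpt built from examples that appear elsewhere in this toolkit. It is not the CDC/HL7 or IOM mapping, which an organization would consult for its own service population. Hispanic-origin ethnicities roll up to the OMB Hispanic or Latino category but not to a single race. For other ethnicities, the sketch leaves the Hispanic field undetermined (None), which is an assumption. Off-list answers also fall back to no determinate classification and may merit manual review.

```python
from typing import Dict, Optional, Tuple

NO_DETERMINATE = "no determinate OMB classification"

# Illustrative excerpt only; a real table would come from the CDC/HL7 code set
# or the IOM template, limited to the organization's service population.
GRANULAR_TO_OMB_RACE: Dict[str, str] = {
    "Korean": "Asian",
    "Vietnamese": "Asian",
    "Samoan": "Native Hawaiian/Other Pacific Islander",
    "Somali": "Black/African American",
    "Polish": "White",
}
# Hispanic-origin ethnicities: OMB Hispanic or Latino, but no single OMB race.
HISPANIC_ORIGIN = {"Mexican", "Puerto Rican", "Cuban", "Guatemalan"}


def roll_up(granular: str) -> Tuple[Optional[str], str]:
    """Return (OMB Hispanic ethnicity or None, OMB race) for a patient who
    gave only a granular ethnicity response."""
    if granular in HISPANIC_ORIGIN:
        return "Hispanic or Latino", NO_DETERMINATE
    # Unlisted or ambiguous ethnicities fall back to "no determinate".
    return None, GRANULAR_TO_OMB_RACE.get(granular, NO_DETERMINATE)
```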

Centers for Disease Control Race and Ethnicity Code Set

The U.S. Centers for Disease Control and Prevention (CDC) have prepared a code set for use in coding race and ethnicity data. This code set is based on current federal standards for classifying data on race and ethnicity, specifically the minimum race and ethnicity categories defined by the OMB described above and a more detailed set of race and ethnicity categories maintained by the U.S. Bureau of the Census. The code set can be applied in both electronic and paper-based record systems.

Within the code set, each race and ethnicity concept is assigned a unique identifier, which can be used in the electronic interchange of race and ethnicity data. Each concept also has a hierarchical code: an alphanumeric code that places the concept in a hierarchical position relative to other related concepts. For example, Costa Rican, Guatemalan, and Honduran are ethnicity concepts at the same level, beneath the concept Central American. Central American in turn sits at the same level as Spaniard, beneath the broader concept Hispanic or Latino.

In contrast to the unique identifier, the hierarchical code can change over time to accommodate the insertion of new concepts.  For more information, see the two links below.

Granular Code Set I (PDF)
Granular Code Set II (PPT)
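The distinction between a stable unique identifier and a changeable hierarchical code suggests a storage pattern. Patient records hold the unique identifier, and hierarchy questions are answered from the current hierarchical code. What follows is a hypothetical Python sketch. Every code in it is invented for illustration and is NOT a CDC/HL7 value. It also assumes a dotted code format in which dropping the last segment gives the parent concept. The concepts themselves (Costa Rican, Guatemalan, Honduran, Central American, Spaniard, Hispanic or Latino) follow the example above.

```python
from typing import Dict, List, Optional

# Invented codes for illustration only; these are NOT the CDC/HL7 values.
# Stable unique identifier -> current hierarchical code.
HIERARCHY: Dict[str, str] = {
    "U-HISP": "E1",      # Hispanic or Latino
    "U-SPAN": "E1.1",    # Spaniard
    "U-CENT": "E1.2",    # Central American
    "U-COST": "E1.2.1",  # Costa Rican
    "U-GUAT": "E1.2.2",  # Guatemalan
    "U-HOND": "E1.2.3",  # Honduran
}


def parent_code(code: str) -> Optional[str]:
    """Drop the last dotted segment to get the parent concept's code."""
    return code.rsplit(".", 1)[0] if "." in code else None


def same_level(id_a: str, id_b: str) -> bool:
    """True if two concepts share a parent, e.g. Costa Rican and Guatemalan."""
    return parent_code(HIERARCHY[id_a]) == parent_code(HIERARCHY[id_b])


def ancestors(unique_id: str) -> List[str]:
    """Hierarchical codes of all ancestors of a concept, nearest first."""
    result = []
    code = parent_code(HIERARCHY[unique_id])
    while code is not None:
        result.append(code)
        code = parent_code(code)
    return result
```

If a new concept is inserted and codes are renumbered, only the HIERARCHY table changes. Records that store unique identifiers stay valid.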

IOM Subcommittee Proposed Template of Granular Ethnicity Categories

The IOM subcommittee has also created a template listing granular ethnicity categories from multiple sources including the CDC/HL7 list. Some of the granular ethnicities included in the template have already been assigned permanent five-digit unique numerical codes by CDC/HL7. Others still require permanent five-digit unique numerical codes.

IOM Subcommittee Template of Granular Ethnicity Categories (Table E-1 in Appendix E of IOM Report)

Language Categories

To simplify the collection of language data, most organizations should develop a list of common languages used by their service population, accompanied by an open-ended response option for those whose language does not appear on the list. 

Locally relevant language categories should be selected from a national standard set, such as the Census Bureau language list or the list in the IOM report. A sample list follows:

• African languages
• American Sign Language
• Arabic
• Armenian
• Chinese
• French
• French Creole
• German
• Greek
• Gujarathi
• Hebrew
• Hindi
• Hungarian
• Italian
• Japanese
• Korean
• Laotian
• Miao Hmong
• Mon-Khmer Cambodian
• Other native North American languages
• Persian
• Polish
• Portuguese
• Portuguese Creole
• Russian
• Scandinavian languages
• Serbo-Croatian
• Spanish
• Tagalog
• Thai
• Urdu
• Vietnamese
• Yiddish
• Availability of Sign Language or other auxiliary aids or services
• Other, please specify:___
• Do not know
• Unavailable/Unknown
• Declined

IOM Subcommittee Template of Spoken Language Categories and Coding (Table I-1 in Appendix I of IOM Report)
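A local language list with an open-ended option can be applied at registration by matching the answer against the list and otherwise storing it under "Other, please specify" with the patient's own words. Below is a minimal, hypothetical Python sketch. The particular local list is an assumed subset of the sample list above for one imagined service area. Case-insensitive matching is a design choice of ours, not a requirement from the toolkit.

```python
from typing import Optional, Tuple

# Hypothetical local list: a subset of the sample list chosen for one service area.
LOCAL_LANGUAGES = {"Spanish", "Chinese", "Vietnamese", "Tagalog", "Russian",
                   "American Sign Language"}
NON_RESPONSES = {"Do not know", "Unavailable/Unknown", "Declined"}


def record_language(answer: str) -> Tuple[str, Optional[str]]:
    """Return (coded value, free text) for a primary-language answer."""
    cleaned = answer.strip()
    for option in LOCAL_LANGUAGES | NON_RESPONSES:
        if cleaned.casefold() == option.casefold():
            return option, None
    # Off-list answers go to the open-ended option, keeping the patient's words.
    return "Other, please specify", cleaned
```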





HRET: Health Research & Educational Trust (in partnership with AHA)
© Copyright HRET - All rights reserved.
This Web site contains links to sites that are not owned or maintained by the Health Research and Educational Trust (HRET) or the American Hospital Association (AHA). HRET and AHA are not responsible for the content of linked sites, and the views expressed on non-HRET/AHA linked sites do not necessarily reflect the views of the Health Research and Educational Trust or the American Hospital Association.