Diversity in Social Research Methods: Embracing More Representative Understandings of Equity and Social Justice

Our research at the Futures Initiative in our inaugural and second years has explored questions of access and diversity that are central to our mission of promoting equity and innovation in higher education. Last year, with the CUNY Sociodemographics Map of NYC project, we explored the racial and ethnic characteristics of neighborhoods around CUNY’s 26 senior and community colleges as well as around CUNY’s professional and graduate schools throughout New York City. This year, in conjunction with one of the Futures Initiative’s Fall 2015 courses, Global Perspectives on Language and Education, we have been preparing a similar geospatial analysis of languages spoken in the home in communities located near each of CUNY’s 26 campuses, to see whether the language courses offered at those campuses match the language needs of the neighboring communities. We are also working on a longitudinal analysis of achievement by CUNY students and faculty to see how trends in achievement correspond with major policy changes at CUNY and in higher education more broadly. All of these research endeavors, like much social science research, have relied on available data sources such as those collected and made publicly available by the U.S. Census Bureau, the CUNY Office of Institutional Research and Assessment (CUNY OIRA), and the Integrated Postsecondary Education Data System (IPEDS).

Each of these research endeavors has been primarily quantitative in approach, relying in the first two cases on geospatial analysis of proportions: the proportions of communities that the U.S. Census categorizes as a particular race or ethnicity, or that fall into the 39 language groupings that the American Community Survey makes publicly available from the data it collects. Much of our research has also incorporated various data sets from CUNY OIRA, such as the race and ethnicity proportions of the student body throughout the CUNY system. Using data sets that rely on categories such as race and ethnicity, or on languages grouped together by the U.S. Census, means that we are inevitably dealing with simplifications and generalizations.

Much social science research relies heavily on these types of simplified and generalizable data to try to answer big questions about public policy, social justice, and equity, to name just a few examples. However, all quantitative social science research that relies on such data carries the caveat that the data have inherent limitations. Too often, I think, the problematic nature of these simplifications and generalizations is glossed over in the presentation of research products because the behavioral revolution that dominated the social sciences in the mid-twentieth century brought with it the supremacy of quantitative methods of research: research was not considered “scientific” in the social sciences unless it could be quantified (Bartels, 2004; McKeown, 2004).

In the research projects I’ve assisted with at the Futures Initiative over the last two years, the problematic nature of using data lumped into categories has appeared over and over again in the research results and in the discussions the research provokes, both within our organization and externally. For example, the CUNY OIRA data that we have been using to study achievement longitudinally throughout the CUNY system have demonstrated just how problematic race and ethnicity categories can be in data collection and analysis. The five major categories used to define the race and ethnicity of the student body at CUNY from 1990 to 2014 have been rather narrow in scope: “American Indian/Alaska Native,” “Asian/Pacific Islander,” “Black,” “Hispanic,” and “White” (CUNY OIRA 2014b). Last year, when we were looking at the racial and ethnic makeup of students in the CUNY system and comparing it with the student populations of other urban university systems, we often found it difficult to compare data because other institutions may use different categories for race and ethnicity. Even CUNY has changed several times how race and ethnicity are indicated in the student data books. For example, prior to 1972, the categories CUNY used for race and ethnicity were “Puerto Rican,” “Black,” “White,” and “Other” (CUNY OIRA 2014a). From 1972 to 1975, “Other Spanish” and “American Indian” were added along with “Oriental,” which was not changed to “Asian” until 1984 (CUNY OIRA 2014a). In 1976, CUNY also stopped counting “Puerto Rican” as a separate category and lumped “Puerto Rican” and “Other Spanish” into the broader category of “Hispanic,” which continues to be used today (CUNY OIRA 2014a). CUNY is not the only institution whose definitions for measuring race and ethnicity have changed over time. Other higher education institutions have similarly amorphous categories, and the U.S. Census has changed its definition of race and ethnicity several times as well (more on that below).
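For readers who work with this kind of data, the comparison problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CROSSWALK` mapping follows the label changes described above, but the `harmonize` helper and the enrollment counts are hypothetical and are not taken from the CUNY data books.

```python
from collections import Counter

# Label changes described in the text (CUNY OIRA 2014a): "Puerto Rican" and
# "Other Spanish" were merged into "Hispanic"; "Oriental" became "Asian".
CROSSWALK = {
    "Puerto Rican": "Hispanic",
    "Other Spanish": "Hispanic",
    "Oriental": "Asian",
}


def harmonize(counts):
    """Re-label historical categories so that two years can be compared."""
    merged = Counter()
    for label, n in counts.items():
        # Labels without a crosswalk entry are kept unchanged.
        merged[CROSSWALK.get(label, label)] += n
    return dict(merged)


# Hypothetical enrollment counts, invented purely for illustration.
counts_1974 = {
    "Puerto Rican": 120,
    "Other Spanish": 80,
    "Black": 300,
    "White": 450,
    "Oriental": 30,
    "American Indian": 5,
    "Other": 15,
}

harmonized_1974 = harmonize(counts_1974)
print(harmonized_1974)
```

Note what the harmonized version gives up: the total is preserved, but the distinction between “Puerto Rican” and “Other Spanish” disappears. Comparison across years is only possible at the level of the coarsest categories, which is exactly the limitation described above.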

The presence of a category of “other” already demonstrates that there is more to any given story than quantitative data analysis alone can tell. It illustrates how these categories are themselves socially constructed: they are not essential in nature but are created by social researchers, governments, institutions, and others in an attempt to simplify and understand the world. Recently, at the Futures Initiative, the socially constructed nature of categories was illustrated once again in our research on languages spoken in neighborhoods around CUNY campuses. The U.S. Census collects very detailed information on languages spoken in U.S. homes; in 2015, it reported that at least 350 different languages are spoken in U.S. homes (U.S. Census Bureau 2015). However, data from the U.S. Census Bureau’s American Community Survey on language spoken in the home have traditionally been available only in 39 language categories, which has resulted in the lumping together of many distinct sub-categories (U.S. Census Bureau 2015). Internal discussions at the Futures Initiative about these data have noted that the categories are not comprehensive. “Chinese,” for example, is used as a broad category rather than differentiating between Mandarin (the largest proportion of Chinese speakers) and the smaller classifications of Chinese varieties such as Jin, Wu, Huizhou, Gan, Xiang, Min, Hakka, Yue, and Pinghua (Lewis et al. 2015).

That categories are first created and then often lumped together for the purposes of quantitative data analysis is problematic for social research, given that the results are often used to inform public policy. Simplification and generalization are not innately bad; often, we wouldn’t be able to understand many things about the social world without them. The problem, of course, arises when the results of such analysis are used to make decisions without any consideration of the limitations of the data. Sometimes quantitative data analysis is given a kind of authority and supremacy in decision-making that assumes results are a mirror of reality. However, because the categories are socially constructed, quantitative data analyses offer only one perspective on, or representation of, reality. Other perspectives on reality are also available and should be consulted to build a better and fuller understanding of the world.

In the remainder of this blog post, I look first at some of the ways race and ethnicity as categories have been problematic for social research more generally, and not just in the research we’ve been doing at the Futures Initiative. Then I briefly discuss impacts on public policy and opportunities to build more well-rounded representations of reality by using mixed-methods approaches in social research that combine quantitative data analysis, which is often quite broad in its implications, with much deeper and more detail-rich ethnographic research.

Race and Ethnicity as Problematic Categories for Social Research

One of the most basic problems with, and critiques of, using the concept of race as a category in social research is that it is incredibly difficult to define and measure “race” as a system of classification (Hirschman, 2004). We’ve seen this firsthand in Futures Initiative research, where we have observed how the definition of race and ethnicity has varied at CUNY over time. As Hirschman (2004) asks, “who decides one’s racial origins—are they based on subjective identity or are there objective criteria that observers can use?” Most readers will probably agree that categories such as race and ethnicity are in many cases incredibly subjective and amorphous in nature: they change over time and are in a continuous cycle of contestation and redefinition. Take, for example, the U.S. Census and its ever-changing classification and definition of race and ethnicity (see Table 1 and Figure 1).

Table 1: Race and ethnicity classifications in the U.S. Census from 1890 to 2010.

Text box which provides the 2010 U.S. Census definition of race categories used.
Figure 1 (from U.S. Census, 2011)

Given that the classification system that the U.S. Census has used for race and ethnicity has been historically variable, the Census Bureau cautions against using these categories in longitudinal analyses of the racial and ethnic composition of the U.S. population over time:

“The question on race for Census 2000 was different from the one for the 1990 census in several ways. Most significantly, respondents were given the option of selecting one or more race categories to indicate their racial identities.

Because of these changes, the Census 2000 data on race are not directly comparable with data from the 1990 census or earlier censuses. Caution must be used when interpreting changes in the racial composition of the U.S. population over time” (U.S. Census, 2001).

In addition to being variable through time in well-known data collection instruments such as the U.S. Census, measures of race can also yield very different results depending on whether they come from self-reported surveys or from studies that rely on observation, interviewing, and researcher reporting. Saperstein (2006), for example, observed that data from the General Social Survey, a well-known data collection tool in the social sciences, showed different patterns between income inequality and race in the United States depending on whether race was a self-reported measure or an observed measure. Observations and self-reports were both found to vary with characteristics of the reporters, such as their age, whether they were immigrants or native-born, their level of educational attainment, and their perceived social status (Saperstein, 2006). These results further support the socially constructed nature of categories such as race and ethnicity. As Saperstein says in concluding, “few social scientists practice in their research what many have been preaching in their theories and classrooms for the past 20 years: that race is not a static, biological characteristic, but a socially constructed one, which can vary over time, place and social context” (Saperstein, 2006).

Additional evidence for how these categories are socially constructed and malleable across time, place, and context comes from the reported results themselves. The U.S. Census data we have used to inform Futures Initiative studies of race and ethnicity at CUNY and the language data we are using are both peppered with “other” categories for cases that don’t fit neatly into the classifications used to bucket people in society. These “other” categories often end up hiding many unique and variable cases in a residual bucket that yields very little in terms of data analysis results but is a meaningful category for many of the people classified as “other” (Brown et al., 2007). Another problem that Brown and colleagues highlight, and a critical limitation of social data that group people along lines of race and ethnicity, is that ethnicity and race are not concepts that necessarily map onto each other very well, yet they are often used in analysis and presented together in research results as though they were comparable concepts (Brown et al., 2007). While both race and ethnicity can reveal very important patterns of inequality in society, it is critical to remember that these are often not comparable categories, and interactions between them can be unpredictable.

Classifications such as race and ethnicity groupings that are imposed on society and defined by organizations of social power and dominance, such as the U.S. government or educational institutions, carry a certain sense of authority and perceived inevitability. For example, Hitlin and colleagues (2007) report how participants in social research or in the U.S. Census go through a psychological process of self-identification and self-categorization when determining how they will answer questions about race and ethnicity on surveys. If they are not given unlimited choices, they are forced to select from the choices presented and have to try to fit themselves into a group that they might not otherwise select. By requiring individuals to identify with a particular race or ethnicity, researchers are in effect helping to create and reinforce race and ethnicity as social categories. Even though we know that these types of categories are socially constructed, the fact that they continue to be defined and institutionalized in practice means that they continue to exist and to have real-world implications.

Hirschman (2004) provides a compelling argument for how the concept of race is a social categorization system that developed alongside modernity—reaching a peak in the late nineteenth and the first half of the twentieth century. Hirschman further contends that race, as a concept, has no conceptual basis at all in society more generally and in social research more specifically, except for the continued presence of racism in society (2004).

Many in the social sciences have contended that race is a social construct, but at the same time, researchers have continued to use “race as if it were a nonproblematic set of categorical divisions” (Martin & Yeung, 2003 [emphasis in original]). While the American Anthropological Association released a statement in 1998 acknowledging the anti-essentialist, socially constructed nature of race as a concept (AAA, 1998), a meta-analysis of mainstream sociology research published in the American Sociological Review between 1937 and 1999 found an ever-increasing tendency in social research to analyze race (see Figure 2) (Martin & Yeung, 2003).

Proportion of articles using U.S. data samples, published in the American Sociological Review in different time periods between 1937 and 1999, that analyze race. The percentage of articles that analyze race has increased over time.
Figure 2 (based on Martin & Yeung, 2003)

While race is well understood as a social construction and not a biological or genetic reality as is clear from the 1998 statement by the AAA, the increased social science focus on race as a concept through the twentieth century is indicative of the social reality and relevancy of the concept. In 2003, the American Sociological Association released a statement on the continued importance of collecting data and doing social scientific research on race. The ASA highlights sociological research that has illustrated continuing inequality between racial and ethnic groups of people in the job market, in neighborhood segregation, and in disparities in public health (ASA, 2003). They conclude that “as long as Americans routinely sort each other into racial categories and act on the basis of those attributions, research on the role of race and race relations in the United States falls squarely within this scientific agenda” (ASA, 2003).

Some social scientists have argued that by continuing to use race or ethnicity as a variable in causal social research models, we may be reinforcing the idea that race and ethnicity are causes in and of themselves of social inequities and disparities in public health, rather than reflecting how racial and ethnic patterns of inequity are actually the effects of other causal processes in society (Martin & Yeung, 2003). This is a very serious problem that must be avoided at all costs. It places significant responsibility on those who prepare and present research results to the public to be transparent and clear about what the results are and what their implications may or may not be. The crucial point is to recognize and state very explicitly the limits of the data being used and the limits of the methods of data analysis. It is also important to help guide people away from committing the ecological fallacy, a research bias or mistake whereby researchers or consumers of research draw conclusions about individuals based on aggregated group data (Trochim, 2006).
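To make the ecological fallacy concrete, here is a minimal Python sketch that uses entirely made-up numbers. The three groups, the two variables, and the `pearson` helper are illustrative assumptions, not data or methods from our projects. The sketch shows that a pattern in group averages can point in the opposite direction from the pattern among the individuals inside each group.

```python
import math


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical individuals, invented for illustration: each tuple is (x, y)
# for one person, and each key is a group such as a neighborhood.
groups = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Aggregated view: one (mean x, mean y) point per group.
group_means = [
    (sum(x for x, _ in people) / len(people),
     sum(y for _, y in people) / len(people))
    for people in groups.values()
]
between_groups = pearson(group_means)

# Individual view: the correlation among the people inside each group.
within_groups = {name: pearson(people) for name, people in groups.items()}

print("correlation of group averages:", between_groups)
print("correlation within each group:", within_groups)
```

In this toy data set, the group averages rise together, while within every group the individuals show the opposite relationship. Someone who saw only the aggregated figures and drew conclusions about individuals would get the direction of the relationship wrong.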

From Concept to Reality: The Impact of Categories on Public Policy and Education Policy

The problem with relying on simplified categories such as race, ethnicity, and even what language a particular family is said to speak in the home is that these categories often become the basis of public policy (Hirschman, 2004). In 1977, the Office of Management and Budget (OMB) set up cross-governmental standards aimed at providing consistent and comparable race and ethnicity data throughout the Federal government’s statistical and administrative programs (Bennett, 1997). At the time, OMB recommended a policy called Statistical Policy Directive 15, which created four “racial” groups for classification: American Indian or Alaskan Native, Asian or Pacific Islander, black, and white. The directive also included one “ethnic” category of Hispanic origin (Bennett, 1997). These buckets for self-identification and reporting were unsatisfactorily narrow and drew significant criticism and debate through the 1990s. While OMB had said that the categories were not supposed to determine eligibility for Federal programs and funding, and had also clarified from the beginning that these were not categories based on social science principles, public critique noted that Federal funding, Congressional districts, and enforcement of equal access provisions were all influenced by the OMB categorization system (Bennett, 1997). In the 1990s, OMB opened up public debate on the categorization practices in the Federal government with the aim of developing a data classification system for race and ethnicity that would better capture the diversity of race and ethnicity in the country while at the same time maintaining the ability to measure and track the social and economic characteristics of populations that have historically experienced racism and discrimination (Wallman et al., 2000).

While the OMB standards have since been broadened and reformed (for example, allowing individuals to choose more than one race and ethnicity from those available and allowing additional write-in options), the fact that race and ethnicity as categories continue to both indirectly and directly influence government programs is indisputable and provides a strong rationale for why we as researchers studying equity and diversity must continue to include these categories in our research. Furthermore, as the ASA makes clear in their 2003 statement, continuing to study race is crucial for public policy because while structural and institutionalized racism continues to exist in society, it is critical to observe how patterns of inequality exist.

In education policy, race and ethnicity continue to be central to understanding and evaluating access, funding and resources, attainment, achievement, and education policy and practice. As with other institutions of authority like the U.S. Census, the U.S. Department of Education has struggled with race and ethnic classifications and with data collection and analysis, to the point that in 2008 the Department had to release a 90-page report titled “Managing an Identity Crisis” to explain its process for classification (Hochschild & Shen, 2014). The former federal legislation known as No Child Left Behind mandated additional race and ethnicity classification and reporting, which has proven difficult in practice and fraught with the same kinds of problems that exist in any of these categories, given that they are social constructions. Across the board, there is now a greater focus on collecting richer data by allowing for additional categories and classifications in race and ethnicity data collection tools, including mechanisms that allow respondents to “write in” how they identify.

The research that we have done at the Futures Initiative would be so much richer if we could compare race and ethnicity across a broader spectrum than is currently available in the data CUNY OIRA makes public. Education institutions like the City University of New York should do a better job of collecting and making publicly available data across more than a handful of race and ethnicity categories. This is a critical step toward more fine-tuned quantitative data analysis of equity and diversity in education. As other institutions ranging from the U.S. Census to the Department of Education expand their range of possible categories for race and ethnicity, so too must individual institutions. In a public, urban system this is especially essential given the diverse population of students served by these institutions.

Alternative Approaches: Humanistic, Qualitative, and Mixed Research Methods

Categories and classification systems like race and ethnicity are going to continue to function as important concepts in social research methods so long as structural inequality and racism continue to exist in society and so long as institutions of power in society continue to define and reinforce the existence of these categories. Quantitative analysis, with its focus on statistics and aggregation, is partly to blame for this phenomenon because, by nature, quantification relies on simplifying the world. These simplified models tell us important things about patterns of social inequality, but as has been illustrated throughout this blog post, they can only ever give us one distinct and limited perspective on reality.

The way to counteract prevailing quantitative narratives and to strengthen and supplement the research we do on social inequality and diversity is to embrace diversity in the way we approach social research. We must engage in more humanistic, qualitative, and ethnographic methods of social inquiry and embrace the better and more comprehensive picture that mixed-methods research designs allow by combining aggregate data analysis of broad trends in society with deeper and more focused qualitative research with individuals. In race and ethnicity research, that means focusing not just on statistical patterns but on the “lived” experience of race and ethnicity for individuals (Brown et al., 2007). Tarrow refers to how we can use the mixed-method approach to put “qualitative flesh on quantitative bones” (2004). While quantitative research results often illustrate big-picture patterns in society, qualitative research focused on interviews, focus groups, and ethnography can put these quantitative results into their real-life context and make them less of an abstraction.

I have already mentioned the problem of the ecological fallacy in quantitative research. In qualitative research, there is a similar kind of inference problem called the exception fallacy, which researchers and readers of research commit when they draw a conclusion about a group of people based on one exceptional case (Trochim, 2006). This is just as harmful as the ecological fallacy in terms of contributing to stereotypes and unclear assessments of reality. Both of these problems can be kept in check using mixed methods, combining the strengths of quantitative research with the more focused attention on individuals and illustrative case studies that qualitative methods allow. Our methodology at the Futures Initiative has been to combine variable-oriented quantitative data analysis in many of our research projects with on-the-ground interaction, dialogue, and reporting of the lived experience of teachers, graduate students, and undergraduate students throughout the City University of New York system. We are building a mixed-methods approach to analyzing diversity and equity in higher education in this country. In the years to come, we will continue to expand on these projects with an enduring focus on diversity in research methods to match and achieve our mission.


American Anthropological Association. 1998. AAA Statement on Race. American Anthropologist. 100(3): 712-713. Available at:

American Sociological Association. 2003. Statement of the American Sociological Association on The Importance of Collecting Data and Doing Social Scientific Research on Race. Available at:

Bartels L. 2004. Some Unfulfilled Promises of Quantitative Imperialism. In: Rethinking Social Inquiry: Diverse Tools, Shared Standards. Brady H, Collier D (eds). Oxford, UK: Rowman & Littlefield Publishers, Inc.

Bennett T. 1997. “Racial” and Ethnic Classification: Two Steps Forward and One Step Back? Public Health Reports. Available at:

Brown J, Hitlin S, Elder G. 2007. The Importance of being “other”: A natural experiment about lived race over time. Social Science Research. Available at:

Cohn D. 2010. Census History: Counting Hispanics. Pew Research Center Social & Demographic Trends. Available at:

CUNY Office of Institutional Research and Assessment. 2014a. Student Data Book (Current & Historical). Available at:

CUNY Office of Institutional Research and Assessment. 2014b. Total Enrollment by Race/Ethnicity and College: Percentages. Available at:

Hirschman C. 2004. The Origins and Demise of the Concept of Race. Population and Development Review. Available at:

Hitlin S, Brown JS, Elder GH. 2007. Measuring Latinos: Racial vs. Ethnic Classification and Self-Understandings. Social Forces. Available at:

Hochschild JL, Shen FX. 2014. Race, Ethnicity, and Education Policy. In: Oxford Handbook of Racial and Ethnic Politics in America. New York: Oxford University Press.

Lee S. 1993. Racial classifications in the US census: 1890-1990. Ethnic & Racial Studies. Available at:

Lewis MP, Simons GF, Fennig CD (eds.). 2015. Ethnologue: Languages of the World, Eighteenth edition. Dallas, TX: SIL International. Available at:

Martin J, Yeung K-T. 2003. The Use of the Conceptual Category of Race in American Sociology, 1937-99. Sociological Forum. Available at:

McKeown T. 2004. Case Studies and the Limits of the Quantitative Worldview. In: Rethinking Social Inquiry: Diverse Tools, Shared Standards. Brady H, Collier D (eds). Oxford, UK: Rowman & Littlefield Publishers, Inc.

Saperstein A. 2006. Double-Checking the Race Box: Examining Inconsistency between Survey Measures of Observed and Self-Reported Race. Social Forces. Available at:

Snipp C. 2003. Racial Measurement in the American Census: Past Practices and Implications for the Future. Annual Review of Sociology. Available at:

Tarrow S. 2004. Bridging the Quantitative-Qualitative Divide. In: Rethinking Social Inquiry: Diverse Tools, Shared Standards. Brady H, Collier D (eds). Oxford, UK: Rowman & Littlefield Publishers, Inc.

Trochim WMK. 2006. Two Research Fallacies. Research Methods Knowledge Base Web site. Available at:

U.S. Census Bureau. 2001. Overview of Race and Hispanic Origin: 2000. Census 2000 Brief. Available at:

U.S. Census Bureau. 2011. Overview of Race and Hispanic Origin: 2010. 2010 Census Briefs. Available at:

U.S. Census Bureau. 2015. Census Bureau Reports at Least 350 Languages Spoken in U.S. Homes. U.S. Census Bureau Web site. Available at:

Wallman K, Evinger S, Schecter S. 2000. Measuring Our Nation’s Diversity: Developing a Common Language for Data on Race/Ethnicity. American Journal of Public Health. Available at:

