Introduction and Methodology


With this project, we set out to map the languages spoken across New York City (NYC) using U.S. Census data from the 2013 American Community Survey. We ran a proximity buffer analysis in a Geographic Information System (GIS) to aggregate, for the census tracts within a 1-mile buffer of each CUNY campus, the percentage of the population that speaks each language (as grouped by the U.S. Census Bureau). The analysis gives us a snapshot of the languages spoken in the communities closest to each CUNY campus. In addition, data on languages spoken by first-time freshmen at CUNY campuses and on language courses taught at different CUNY colleges provide points of comparison between the languages spoken in the communities around each campus and the language education opportunities offered there.

Data and Methodology

Data sources:
  • Languages spoken data come from the U.S. Census Bureau’s American Community Survey 2013 5-Year Estimates, Table B16001, “Language Spoken at Home by Ability to Speak English for the Population 5 Years and Over.”
  • Shapefiles used in the analysis and cartography are the 2013 TIGER/Line® census tract shapefiles for New York State, from the U.S. Census Bureau, Geography Division.
  • Data appearing on the online, interactive version of the map include languages spoken by first-time freshmen across CUNY’s community and senior colleges, from the Office of Institutional Research Assessment.
  • In addition, data on languages taught at CUNY come from Alberta Gatti and Alexander Funk and the Institute for Language Education in Transcultural Context.
Overview of Methodology:
  • The U.S. Census Bureau’s B16001 table on languages spoken was processed to calculate the percentage of the population in each New York City census tract that speaks each of the 39 language categories the U.S. Census classifies.
  • The B16001 table was joined with the U.S. Census Bureau’s 2013 TIGER/Line® shapefiles using QGIS.
  • Each CUNY campus was geocoded from its official address to obtain its spatial coordinates. These coordinates were then mapped in QGIS.
  • A one-mile buffer was generated around each CUNY campus. The buffers were used to select all of the census tracts within one mile of each campus. The percentages of the population for all selected tracts for each campus were then summed to get the total percentages of the population within one mile of each campus that speak each of the U.S. Census language categories.
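
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the QGIS workflow used for the project. It assumes that a tract counts as inside a buffer when its centroid is within one mile of the campus (a buffer that selects every tract it touches would pick up more tracts), and it pools speaker counts across the selected tracts before computing percentages, which is one possible way to aggregate and may differ from the aggregation described above. The campus location, tract centroids, and counts are hypothetical, and the function names are invented for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def percentages(counts, total):
    """Convert speaker counts into percentages of the total population."""
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


def tracts_within_buffer(campus, tracts, radius_miles=1.0):
    """Select tracts whose centroid lies within the buffer (an assumption)."""
    return [
        t for t in tracts
        if haversine_miles(campus["lat"], campus["lon"],
                           t["lat"], t["lon"]) <= radius_miles
    ]


def buffer_language_shares(campus, tracts, radius_miles=1.0):
    """Pool counts over the selected tracts, then compute percentages."""
    selected = tracts_within_buffer(campus, tracts, radius_miles)
    total = sum(t["total"] for t in selected)
    pooled = {}
    for t in selected:
        for category, n in t["counts"].items():
            pooled[category] = pooled.get(category, 0) + n
    return percentages(pooled, total)


# Hypothetical campus and tract records (not real Census values).
campus = {"name": "Example campus", "lat": 40.750, "lon": -73.980}
tracts = [
    {"id": "A", "lat": 40.755, "lon": -73.980, "total": 1000,
     "counts": {"Speak only English": 600, "Spanish": 300, "Chinese": 100}},
    {"id": "B", "lat": 40.750, "lon": -73.990, "total": 3000,
     "counts": {"Speak only English": 1800, "Spanish": 300, "Chinese": 900}},
    {"id": "C", "lat": 40.800, "lon": -73.980, "total": 2000,
     "counts": {"Speak only English": 500, "Spanish": 1500, "Chinese": 0}},
]

shares = buffer_language_shares(campus, tracts)
for category, pct in sorted(shares.items()):
    print(f"{category}: {pct:.1f}%")
```

In this toy data, tracts A and B fall inside the buffer and tract C (more than three miles away) does not. Pooling counts weights each tract by its population, so a small tract does not count as much as a large one.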


A Note on Limitations

  • The largest limitation of this analysis is the limited language data available from the U.S. Census. Our understanding of language is often at odds with the way the U.S. Census classifies and groups languages. For example, categories like “African Languages”, “Chinese”, and the several “Other” categories obscure the fact that each is made up of many distinct languages.
    • NYC is said to be home to hundreds of languages, and the U.S. Census data used in this analysis, with fewer than 40 categories, does not reflect that diversity.
  • The buffer analysis also assumes a one-mile distance from campus based on walkability. However, many CUNY students use NYC’s vast and well-connected public transit system of buses and trains to commute longer distances to attend other campuses in the CUNY system. This analysis has no way of capturing that kind of movement.
    • That being said, there may still be a very good rationale for using the walkability distance to infer community characteristics: as public resources, the community colleges and senior colleges of CUNY have a mandate to serve their communities. So, the language education offerings should reflect the diverse community characteristics near CUNY’s different campuses.