Code for MCVL Spanish Social Security Data

Replication files for 'Learning by working in big cities'

The site distributes and documents computer programs to replicate the results obtained by Jorge De la Roca and Diego Puga in their article 'Learning by working in big cities', published in The Review of Economic Studies.

This research uses anonymized administrative data from the Muestra Continua de Vidas Laborales con Datos Fiscales (MCVL) with the permission of Spain's Dirección General de Ordenación de la Seguridad Social. We are NOT allowed to make the MCVL data available. Thus, in addition to downloading the replication files available here, interested researchers will need to request access to the MCVL data from Spain's Dirección General de Ordenación de la Seguridad Social by following the application process described on the site.

The MCVL data are extremely rich, containing matched anonymized social security, income tax and census records for a 4% random sample of Spanish workers, pensioners and unemployment benefit recipients. The application process required to obtain the MCVL data is simple, and approved users are allowed to work with the data on their own computers. By providing this replication code, we hope both to enable easy replication of our results and to substantially reduce entry costs for users of the MCVL data. Users of these computer programs are kindly asked to cite The Review of Economic Studies article as the source.

Instruments for Latino-white and black-white residential segregation

Replication data set (data_seglatinos.dta) for 'Does segregation matter for Latinos?'

This metropolitan area (CBSA) data set provides the instrumental variables to replicate the results obtained by Jorge De la Roca, Ingrid Gould Ellen and Justin P. Steil in their article 'Does segregation matter for Latinos?', published in Journal of Housing Economics.

The main instrument for the Latino-white dissimilarity index is the dissimilarity index between single-family detached housing and other housing types in 1970 (e.g. multi-family or single-family attached housing). We use data from the 1970 Neighborhood Change Database. The source units are census tracts as defined in 1970 for Standard Metropolitan Statistical Areas (SMSA). We crosswalk 1970 SMSAs to 2008 CBSAs.
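For readers unfamiliar with the measure, the standard dissimilarity index between two groups is half the sum, over tracts, of the absolute difference between each tract's share of the first group and its share of the second group. The Python sketch below illustrates that textbook formula on made-up tract counts. The function name and the numbers are hypothetical and are not taken from the replication files, and the article may apply additional sample restrictions.

```python
from typing import Sequence


def dissimilarity_index(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standard dissimilarity index between two groups across tracts.

    group_a[i] and group_b[i] are counts in tract i, for example
    single-family detached units and all other housing units.
    Returns a value between 0 (identical distribution) and 1 (complete separation).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per tract")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each group needs a positive total count")
    # Half the sum of absolute differences in tract shares.
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for a metropolitan area with three tracts.
detached_units = [900, 400, 100]
other_units = [100, 300, 600]
print(dissimilarity_index(detached_units, other_units))
```

An index of 0 means both housing types are spread across tracts in the same proportions, while an index of 1 means no tract contains both types.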

To instrument for the black-white dissimilarity index, we use the measures proposed by David M. Cutler and Edward L. Glaeser in 'Are ghettos good or bad?' (Quarterly Journal of Economics, 1997). These measures are the number of local governments and the share of local revenue from federal or state transfers. We use data from the 1962 Census of Governments Survey, which are made available by the Inter-university Consortium for Political and Social Research at the University of Michigan.

The data set also includes metropolitan area characteristics in 1970, such as the Latino-white and black-white dissimilarity indices, and the shares of population that are Latino, black, unemployed, working in manufacturing, in poverty status and with a bachelor's degree. We also provide our crosswalk between Consistent Public Use Microdata Areas (PUMAs) and 2008 CBSAs (data_xwalk_conspuma_cbsa.dta). See the article for specific details. Users of these data sets are kindly asked to cite the Journal of Housing Economics article as the source.

Neighborhood exposure measures by race and metropolitan area 1980–2010

Replication data set (data_race21st.dta) for 'Race and neighborhoods in the 21st century'

This metropolitan area (CBSA) data set provides neighborhood exposure measures by race/ethnicity in 1980, 1990, 2000 and 2010 to replicate the results obtained by Jorge De la Roca, Ingrid Gould Ellen and Katherine M. O'Regan in their article 'Race and neighborhoods in the 21st century', published in Regional Science and Urban Economics.

Measures of exposure to poverty, to neighbors with a bachelor's degree, and to employed neighbors are obtained using census data for 1980, 1990 and 2000, and ACS data for 2006–2010. Measures of exposure to proficiency in standardized test scores and free/reduced-price lunch eligibility are obtained from the Department of Education for 2008–2009 and expressed in percentiles to control for differences in standardized tests and poverty rates across metropolitan areas. Measures of exposure to violent, property and total crime are only available for 60 CBSAs in 2000 and are obtained from the National Neighborhood Crime Study (Peterson and Krivo, 2000). These measures are also expressed in percentiles.
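As an illustration of what an exposure measure captures, a common construction is the average of a neighborhood characteristic across tracts, weighted by the number of group members living in each tract. The Python sketch below assumes that construction. The function name and the figures are hypothetical, and the article should be consulted for the exact definitions used in the data set.

```python
from typing import Sequence


def exposure(group_pop: Sequence[float], tract_value: Sequence[float]) -> float:
    """Average tract characteristic experienced by members of a group.

    group_pop[i] is the number of group members living in tract i and
    tract_value[i] is the characteristic of tract i (e.g. its poverty rate).
    This is a group-population-weighted mean, assumed here for illustration.
    """
    if len(group_pop) != len(tract_value):
        raise ValueError("need one population count and one value per tract")
    total = sum(group_pop)
    if total <= 0:
        raise ValueError("the group needs a positive total population")
    return sum(p * v for p, v in zip(group_pop, tract_value)) / total


# Hypothetical CBSA with two tracts: poverty rates of 10% and 30%.
poverty_rate = [0.10, 0.30]
group_residents = [100, 300]
print(exposure(group_residents, poverty_rate))
```

In this toy case three quarters of the group live in the higher-poverty tract, so the group's exposure to poverty is closer to 30% than to 10%.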

The data set also provides CBSA-level measures of residential segregation (dissimilarity and isolation indices) obtained from US2010, a joint project between the Russell Sage Foundation and Brown University, together with CBSA characteristics such as total population, shares of population that are black, Latino, Asian, over 65 years, under 15 years, foreign-born, unemployed, working in manufacturing, working in professional occupations and in poverty status. Users of this data set are kindly asked to cite the Regional Science and Urban Economics article as the source.