Code for MCVL Spanish social security data

Replication files for 'Learning by working in big cities'

The site distributes and documents computer programs to replicate the results obtained by Jorge De la Roca and Diego Puga in their article 'Learning by working in big cities', published in The Review of Economic Studies.

This research uses anonymized administrative data from the Muestra Continua de Vidas Laborales con Datos Fiscales (MCVL) with the permission of Spain's Dirección General de Ordenación de la Seguridad Social. We are NOT allowed to make the MCVL data available. In addition to the replication files available here, interested researchers will therefore need to request access to the MCVL data from Spain's Dirección General de Ordenación de la Seguridad Social by following the application process described on the site.

The MCVL data are extremely rich, containing matched anonymized social security, income tax and census records for a 4% random sample of Spanish workers, pensioners and unemployment benefit recipients. The application process required to obtain the MCVL data is simple, and approved users are allowed to work with the data on their own computers. By providing this replication code, we hope both to enable easy replication of our results and to substantially reduce entry costs for users of the MCVL data. Users of these computer programs are kindly asked to cite The Review of Economic Studies article as the source.

Simulating censored earnings in MCVL Spanish social security data

Files used to simulate earnings for MCVL censored observations (code_selectionmig.zip) in 'Selection in initial and return migration: Evidence from moves across Spanish cities', published in the Journal of Urban Economics.

The folder contains three files. The Stata do file censoring_simulation.do simulates earnings for censored observations in MCVL as described in the article. The folder setup and the structure of the code follow the MCVL code provided above for the article 'Learning by working in big cities'. For convenience, the user should save this do file in a new directory code/censoring/.

The Stata data file ss_bounds.dta contains upper and lower earnings censoring bounds collected from Spain's Boletín Oficial del Estado (BOE) for the period 1980-2013. Some bounds are also identified from plotted monthly earnings densities and added in separate columns. Bounds vary by year and by type of occupation within the social security system. Figures are expressed in nominal euro cents.
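To illustrate how such bounds are typically used, here is a minimal Python sketch that flags a monthly earnings record as censored at the lower or upper bound. It is not the procedure in censoring_simulation.do. The function names, the bound values and the earnings records are all hypothetical, and the rule "at or beyond the bound counts as censored" is an assumption made for the example.

```python
def flag_censoring(earnings_cents, lower_cents, upper_cents):
    """Classify one monthly earnings record against its censoring bounds.

    All amounts are in nominal euro cents. Treating values at or beyond
    a bound as censored is an assumption of this sketch.
    """
    if earnings_cents >= upper_cents:
        return "upper"
    if earnings_cents <= lower_cents:
        return "lower"
    return "none"


def cents_to_euros(cents):
    """Convert an amount in euro cents to euros."""
    return cents / 100.0


# Hypothetical bounds for one occupation type and year (not real BOE values).
bounds = {"lower": 75_000, "upper": 300_000}

# Hypothetical monthly earnings records, in nominal euro cents.
records = [60_000, 150_000, 300_000]

flags = [flag_censoring(e, bounds["lower"], bounds["upper"]) for e in records]
print(flags)
print([cents_to_euros(e) for e in records])
```

In practice the bounds would be looked up by year and occupation type before applying a check like this one.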

The Stata data file cpi_monthly.dta provides monthly consumer price index data for the period 1980-2014, obtained from Spain's Instituto Nacional de Estadística. For convenience, the user should save both data files in the directory otherdata/. Users of these files are kindly asked to cite the Journal of Urban Economics article as the source.

Instruments for Latino-white and Black-white residential segregation

Replication data set (data_seglatinos.dta) for 'Does segregation matter for Latinos?'

This metropolitan area (CBSA) data set provides the instrumental variables to replicate the results obtained by Jorge De la Roca, Ingrid Gould Ellen and Justin P. Steil in their article 'Does segregation matter for Latinos?', published in Journal of Housing Economics.

The main instrument for the Latino-white dissimilarity index is the dissimilarity index between single-family detached housing and other housing types in 1970 (e.g. multi-family or single-family attached housing). We use data from the 1970 Neighborhood Change Database. The source units are census tracts as defined in 1970 for Standard Metropolitan Statistical Areas (SMSA). We crosswalk 1970 SMSAs to 2008 CBSAs.
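As a reminder of how a dissimilarity index is computed, the following Python sketch applies the standard formula, D = 0.5 * sum over tracts of |a_i/A - b_i/B|, to two counts per tract. The tract counts are made up for illustration and the function name is hypothetical. The sketch shows the general formula only and does not reproduce the construction of the instrument in the article.

```python
def dissimilarity_index(group_a, group_b):
    """Standard dissimilarity index between two groups across areal units.

    group_a[i] and group_b[i] are the counts of each group (for example,
    single-family detached units and all other housing units) in tract i.
    The result is 0 for identical distributions and 1 for complete separation.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per tract")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a positive total")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Made-up counts for four tracts in one metropolitan area.
detached_units = [100, 0, 50, 50]
other_units = [0, 100, 50, 50]

d_index = dissimilarity_index(detached_units, other_units)
print(d_index)
```

The same function applies unchanged to population counts, for instance Latino and white residents by tract.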

To instrument for the black-white dissimilarity index, we use the measures proposed by David M. Cutler and Edward L. Glaeser in 'Are ghettos good or bad?' (Quarterly Journal of Economics, 1997). These measures are the number of local governments and the share of local revenue from federal or state transfers. We use data from the 1962 Census of Governments Survey, which are made available by the Inter-university Consortium for Political and Social Research at the University of Michigan.

The data set also includes metropolitan area characteristics in 1970, such as the Latino-white and black-white dissimilarity indices, and the shares of population that are Latino, black, unemployed, working in manufacturing, in poverty status and with a bachelor's degree. We also provide our crosswalk between Consistent Public Use Microdata Areas (PUMAs) and 2008 CBSAs (data_xwalk_conspuma_cbsa.dta). See the article for specific details. Users of these data sets are kindly asked to cite the Journal of Housing Economics article as the source.

Neighborhood exposure measures by race and metropolitan area 1980–2010

Replication data set (data_race21st.dta) for 'Race and neighborhoods in the 21st century'

This metropolitan area (CBSA) data set provides neighborhood exposure measures by race/ethnicity in 1980, 1990, 2000 and 2010 to replicate the results obtained by Jorge De la Roca, Ingrid Gould Ellen and Katherine M. O'Regan in their article 'Race and neighborhoods in the 21st century', published in Regional Science and Urban Economics.

Measures of exposure to poverty, to neighbors with a bachelor's degree, and to employed neighbors are obtained using census data for 1980, 1990 and 2000, and ACS data for 2006–2010. Measures of exposure to proficiency in standardized test scores and free/reduced-price lunch eligibility are obtained from the Department of Education for 2008–2009 and expressed in percentiles to control for differences in standardized tests and poverty rates across metropolitan areas. Measures of exposure to violent, property and total crime are only available for 60 CBSAs in 2000 and are obtained from the National Neighborhood Crime Study (Peterson and Krivo, 2000). These measures are also expressed in percentiles.
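A common way to define such an exposure measure is as the average neighborhood characteristic faced by members of a group, weighting each neighborhood by the number of group members living there. The Python sketch below implements that weighted average. It assumes this standard formulation (see the article for the exact definitions), and the function name and tract figures are hypothetical.

```python
def exposure(group_pop, tract_value):
    """Average tract characteristic experienced by members of one group.

    group_pop[i] is the group's population in tract i and tract_value[i]
    is the tract's characteristic (for example, its poverty rate). Each
    tract is weighted by the group's population living in it.
    """
    if len(group_pop) != len(tract_value):
        raise ValueError("need one population and one value per tract")
    total = sum(group_pop)
    if total == 0:
        raise ValueError("group population must be positive")
    return sum(p * v for p, v in zip(group_pop, tract_value)) / total


# Made-up figures for a metropolitan area with two tracts.
black_pop = [300, 100]
white_pop = [100, 300]
poverty_rate = [0.30, 0.10]

black_exposure = exposure(black_pop, poverty_rate)
white_exposure = exposure(white_pop, poverty_rate)
print(black_exposure, white_exposure)
```

With these figures the two groups live in the same metropolitan area but face different average neighborhood poverty rates, which is the kind of gap the exposure measures are meant to capture.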

The data set also provides CBSA-level measures of residential segregation (dissimilarity and isolation indices) obtained from US2010, a joint project between the Russell Sage Foundation and Brown University, together with CBSA characteristics such as total population, shares of population that are black, Latino, Asian, over 65 years, under 15 years, foreign-born, unemployed, working in manufacturing, working in professional occupations and in poverty status. Users of this data set are kindly asked to cite the Regional Science and Urban Economics article as the source.
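For readers unfamiliar with the isolation index, the Python sketch below computes its standard form: the share of own-group residents in the tract of the average group member, sum over tracts of (x_i/X) * (x_i/t_i). The function name and tract counts are hypothetical. The indices in the data set come from US2010 and are not recomputed by this code.

```python
def isolation_index(group_pop, total_pop):
    """Standard isolation index for one group across tracts.

    group_pop[i] is the group's population in tract i and total_pop[i]
    is the tract's total population. Tracts with no residents are skipped.
    """
    if len(group_pop) != len(total_pop):
        raise ValueError("need one group count and one total per tract")
    group_total = sum(group_pop)
    if group_total == 0:
        raise ValueError("group population must be positive")
    return sum(
        (x / group_total) * (x / t)
        for x, t in zip(group_pop, total_pop)
        if t > 0
    )


# Made-up counts for a metropolitan area with two tracts.
latino_pop = [300, 100]
tract_pop = [400, 400]

iso_index = isolation_index(latino_pop, tract_pop)
print(iso_index)
```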