We Need Better Data on Benefit Incomes

Bruce Meyer and Nikolas Mittag have a new paper showing that the receipt of SNAP, TANF, and Section 8 benefits is underreported in the Current Population Survey (CPS) and that, when you use administrative data to adjust for the underreporting, you find that these programs are far more effective at reducing poverty than the official statistics say (see Dylan Matthews' coverage at Vox). This is not a new observation, but it does give us an opportunity to underscore an important point: the federal government needs to get better at estimating household receipt of social benefits.

Those who work deep in the weeds of poverty and income data have known for a long time that the conventional income surveys significantly underreport the receipt of certain social benefits. Because of this, a whole cottage industry of papers and methods has formed around trying to correct for this underreporting.

You have the approach favored by Meyer and Mittag, which involves using the Census Bureau's Person Identification Validation System (PVS) to directly match (where possible) CPS survey participants to their Social Security Numbers in welfare administrative data. You have the approach favored by Dube and Jacobs, which uses the administrative data to calculate the degree of underreporting and then reweights the survey records so that the new weighted population has benefit receipt characteristics that match the administrative data. Then you have microsimulation models like TRIM, which apply benefit eligibility rules and administrative data regarding benefit participation levels to income survey records (see Arloc Sherman and Danilo Trisi of CBPP for the difference these models make).
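To make the reweighting idea concrete, here is a minimal sketch in Python. It is not the procedure from any particular paper: it assumes a single benefit, a hypothetical record layout (`weight`, `reports_benefit`), and a made-up administrative recipient count. It scales up the weights of records that report receipt until they match the administrative total, and scales down everyone else so the overall population stays the same.

```python
def reweight_to_admin_total(records, admin_recipients):
    """Two-cell reweighting: match weighted recipients to an administrative count.

    `records` is a list of dicts with hypothetical keys "weight" and
    "reports_benefit". The total weighted population is left unchanged.
    """
    total = sum(r["weight"] for r in records)
    reported = sum(r["weight"] for r in records if r["reports_benefit"])
    if not 0 < reported < total:
        raise ValueError("need both reporters and non-reporters in the survey")
    if not 0 < admin_recipients < total:
        raise ValueError("administrative count must be within the population")

    # Reporters are scaled up by the underreporting factor; non-reporters
    # are scaled down so the weights still sum to the same population.
    up = admin_recipients / reported
    down = (total - admin_recipients) / (total - reported)
    return [
        {**r, "weight": r["weight"] * (up if r["reports_benefit"] else down)}
        for r in records
    ]


# Illustrative numbers only: the survey shows 100 weighted recipients,
# while the (made-up) administrative count is 150.
survey = [
    {"weight": 100.0, "reports_benefit": True},
    {"weight": 200.0, "reports_benefit": False},
    {"weight": 700.0, "reports_benefit": False},
]
adjusted = reweight_to_admin_total(survey, admin_recipients=150.0)
new_recipients = sum(r["weight"] for r in adjusted if r["reports_benefit"])
new_total = sum(r["weight"] for r in adjusted)
```

Note what the sketch does not do: no record changes its reported status. The households that said they received the benefit simply stand in for more people, which is where this approach leans on the assumption that reporters and non-reporters look alike.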

All of these efforts have a commendable aim: to properly impute social benefits into income data so that we can have a more accurate estimate of their effects. But they are all also flawed in various ways. The PVS is not perfect, and for at least some percentage of survey records it is impossible to match them to their administrative data records. The reweighting approach assumes that the households that report receipt of social benefits share the same characteristics as the households that fail to report receipt, which is almost certainly not true. The microsimulation models are mainly useful for their specific purpose of estimating aggregate budgetary effects of new tax and benefit rules.

One could toy around with how best to do second-hand social benefit imputations, but really this is something the Census Bureau and other data agencies should be doing in the first instance. It is ridiculous that the federal government cannot produce accurate microdata on household income characteristics when it has administrative data on basically all major income streams (income tax records and benefit receipt records). It may cost more money and require a lot more work (and possibly even less internal anonymization) to put out correct information, but it's increasingly hard to see the social value of putting out information that we know is inaccurate.