The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. To create the Census Tree, we begin with a collection of high-quality links contributed by the users of a free online genealogy platform, many of which would be difficult or impossible to find using currently available linking technologies. We then use these links as training data for a machine learning algorithm to make new matches, and incorporate other recent efforts to link the historical U.S. censuses. Finally, we introduce a procedure for filtering the links and adjudicating disagreements. Our complete Census Tree achieves match rates across adjacent censuses that are between 69 and 86% for men and between 58 and 79% for women, a major breakthrough compared to previous linking efforts. The size of the Census Tree allows researchers in the social sciences and other disciplines to construct longitudinal datasets that are highly representative of the population. We validate the accuracy of these links and provide researchers with a simple tool for choosing their preferred tradeoff between sample size and accuracy. To demonstrate the advantages of the Census Tree, we extend the work of Abramitzky, Boustan, Jácome, and Pérez (2021) to include intergenerational mobility estimates for additional immigrant nationalities and for women.