Tax data are invaluable for research, but privacy concerns severely limit access. Although the US Internal Revenue Service produces a public-use file (PUF), improved technology and the proliferation of individual data have made that file increasingly difficult to protect. Synthetic data are an alternative that reproduces the statistical properties of administrative data without revealing individual taxpayer information. This paper evaluates the quality and safety of the first fully synthetic PUF and demonstrates its performance in tax model microsimulations. The synthetic PUF could also be used to develop and debug statistical programs that could then be safely run on confidential data via a validation server.