This paper discusses the challenges researchers face when pre-registering experimental studies that incorporate machine learning methods for data analysis, in particular text mining. Compared to standard behavioral data, text data (e.g., free-form chat content) is less predictable in form and meaning, and it is often unclear which representation techniques will yield the most meaningful results. Drawing on experience from multiple experimental studies, we propose best practices and offer guidelines to assist researchers working in this growing area.