We propose an approach to constructing text-based time-series indices optimally: typically, indices that maximize the contemporaneous relation with, or the predictive performance for, a target variable such as inflation. Our methodology relies on binary selection matrices that, applied to the vocabulary of tokens, select the relevant texts in the corpus. Several widely used text-based indices, such as the Economic Policy Uncertainty (EPU) index, can be formulated in terms of selection matrices. To perform the resulting complex optimization, we design a genetic algorithm that incorporates domain-specific knowledge through tailor-made crossover and mutation operations. We illustrate our methodology on a corpus of Wall Street Journal news articles by optimizing text-based indices that forecast inflation at various horizons.
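The abstract does not spell out the exact form of the selection matrices or of the genetic operators, so the following is only a minimal, hypothetical sketch under stated assumptions. Each row of the selection matrix `S` is a token group and each column a vocabulary token. A document is selected when it contains at least one marked token from every group, which is the AND-of-ORs rule behind the EPU index (an article must mention terms from the economy, policy, and uncertainty groups). The index is the share of selected documents per period, and fitness is the absolute Pearson correlation between the index and the target `h` periods ahead. The bit-flip mutation and row-wise crossover are generic genetic-algorithm operators, not the paper's tailor-made ones, and every function name here is illustrative.

```python
import random

# Hypothetical toy representation: a document is (period, set_of_tokens);
# S is a list of rows (token groups), each a 0/1 list aligned with `vocab`.

def is_selected(doc_tokens, S, vocab):
    """Select a document if every group (row of S) has at least one of its
    marked tokens present in the document (EPU-style AND of ORs)."""
    for row in S:
        if not any(bit and tok in doc_tokens for bit, tok in zip(row, vocab)):
            return False
    return True

def build_index(docs, S, vocab, periods):
    """Share of documents in each period that the selection matrix selects."""
    index = []
    for t in periods:
        in_t = [toks for p, toks in docs if p == t]
        hits = sum(is_selected(toks, S, vocab) for toks in in_t)
        index.append(hits / len(in_t) if in_t else 0.0)
    return index

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def fitness(S, docs, vocab, periods, target, h=0):
    """|corr(index_t, target_{t+h})|; h = 0 is the contemporaneous case."""
    idx = build_index(docs, S, vocab, periods)
    if h > 0:
        idx, target = idx[:-h], target[h:]
    return abs(pearson(idx, target))

def mutate(S, rate, rng):
    """Generic operator: flip each bit independently with probability `rate`."""
    return [[1 - b if rng.random() < rate else b for b in row] for row in S]

def crossover(A, B, rng):
    """Generic row-wise crossover: each token group comes from one parent."""
    return [list(ra) if rng.random() < 0.5 else list(rb) for ra, rb in zip(A, B)]

def genetic_search(docs, vocab, periods, target, n_groups=2, pop_size=20,
                   generations=30, rate=0.1, h=0, seed=0):
    """Elitist GA over binary selection matrices (needs pop_size >= 4)."""
    rng = random.Random(seed)
    pop = [[[rng.randint(0, 1) for _ in vocab] for _ in range(n_groups)]
           for _ in range(pop_size)]

    def score(S):
        return fitness(S, docs, vocab, periods, target, h)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)
```

Keeping the better half unchanged each generation (elitism) means the best fitness found never decreases; a realistic version would also score out-of-sample forecasts rather than in-sample correlation to avoid overfitting the vocabulary.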