Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) consisting of 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing elsewhere in other diseases. © 2014 American Chemical Society.