Aims: We sought to compare the generalizability and prognostic implications of two heart failure with preserved ejection fraction (HFpEF) scores (HFA-PEFF and H2FPEF) in participants of the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT) and Phosphodiesterase-5 Inhibition to Improve Clinical Status and Exercise Capacity in Heart Failure with Preserved Ejection Fraction (RELAX) trials and in matched controls from the Atherosclerosis Risk in Communities (ARIC) study.
Methods and results: Based on the respective scores, participants from the TOPCAT (N = 356), RELAX (N = 216), and ARIC (N = 379) studies were categorized as having a low, intermediate, or high likelihood of HFpEF. Age-, sex-, and race-matched controls who were free of cardiovascular disease and had unexplained dyspnoea were used to evaluate diagnostic performance. The prognostic value of the scores was assessed using multivariable-adjusted Cox regression analyses. The median HFA-PEFF scores in the TOPCAT, RELAX, and ARIC studies were 5.0 [interquartile range (IQR): 5.0–6.0], 4.0 (IQR: 2.0–4.0), and 3.0 (IQR: 2.0–4.0), respectively. The median H2FPEF scores in the three studies were 5.5 (IQR: 4.0–7.0), 6.0 (IQR: 4.0–7.0), and 3.0 (IQR: 2.0–5.0), respectively. Low HFA-PEFF and H2FPEF scores can rule out HFpEF with high sensitivity (99.5% and 99.6%, respectively) and negative predictive value (95.7% and 98.3%, respectively). High HFA-PEFF and H2FPEF scores can rule in HFpEF with good specificity (82.8% and 95.6%, respectively) and positive predictive value (79.9% and 90.4%, respectively). Among TOPCAT participants, the hazard ratio for adverse cardiovascular events per 1-point increase in the HFA-PEFF and H2FPEF scores was 1.26 (95% confidence interval: 0.98–1.63) and 1.01 (95% confidence interval: 0.88–1.15), respectively. A higher H2FPEF score was associated with lower peak oxygen consumption in RELAX trial participants (adjusted P = 0.01).
Conclusions: The HFA-PEFF and H2FPEF scores are reliable diagnostic tools for HFpEF. Their prognostic utility requires further validation in larger, rigorously phenotyped populations.
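
For readers less familiar with the diagnostic measures reported above, the following minimal sketch shows how sensitivity, specificity, and the positive and negative predictive values are derived from a 2×2 table of test result against disease status. The class and function names and the counts are hypothetical and chosen only for illustration; they are not data from TOPCAT, RELAX, or ARIC. The sketch also assumes the usual framing in which a "positive" test means a score above the low category for the rule-out analysis and a score in the high category for the rule-in analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 table of test result versus true disease status."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the four standard diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # diseased who test positive
        "specificity": c.tn / (c.tn + c.fp),  # non-diseased who test negative
        "ppv": c.tp / (c.tp + c.fp),          # test positives who are diseased
        "npv": c.tn / (c.tn + c.fn),          # test negatives who are non-diseased
    }


# Hypothetical counts for illustration only (NOT study data).
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

A score category is useful for ruling out disease when sensitivity and negative predictive value are both high, and for ruling in disease when specificity and positive predictive value are both high. Predictive values also depend on the proportion of cases in the sample, so they may not carry over to populations with a different prevalence of HFpEF.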