Little has been done to examine the relative merit of measures used to assess the impact of diagnostic decision support systems (DDSS) on physician performance. In this study, 10 different single measures of diagnostic performance were compared empirically. The measures were of three types: rank-order, all-or-none, and appropriateness. The responsiveness (RESP) of each measure, defined as the degree to which the measure can detect differences between conditions of low and high performance, was estimated under two repeated-measures experimental conditions. The diagnostic performance of 108 physicians was compared on medical cases of varying diagnostic difficulty, both with and without a high level of assistance from a DDSS. The results showed that RESP varied nearly tenfold among the measures. The rank-order measures tended to provide the highest RESP values (maximum = 2.14), while the appropriateness measures provided the lowest (maximum = 1.41). The most responsive measures were rank-orders of the correct diagnosis within the top 5 to 10 listed diagnoses.
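As an illustration only, the sketch below shows how a rank-order measure and an all-or-none measure might be scored from a physician's listed diagnoses, and how a responsiveness index might be computed from paired low- and high-performance scores. The abstract does not give the scoring rules or the RESP formula, so everything here is an assumption: the function names, the linear rank weighting, and the use of a standardized response mean (mean paired difference divided by the standard deviation of the differences) as a stand-in for RESP are all hypothetical.

```python
from statistics import mean, stdev


def rank_order_score(listed, correct, top_n=5):
    """Hypothetical rank-order measure (assumed, not from the study).

    Gives top_n points when the correct diagnosis is listed first,
    one point less for each lower position, and 0 if it is not
    among the first top_n listed diagnoses.
    """
    top = listed[:top_n]
    if correct not in top:
        return 0
    return top_n - top.index(correct)


def all_or_none_score(listed, correct, top_n=5):
    """Hypothetical all-or-none measure: 1 if the correct diagnosis
    appears anywhere in the first top_n listed diagnoses, else 0."""
    return 1 if correct in listed[:top_n] else 0


def responsiveness(low_scores, high_scores):
    """Assumed stand-in for RESP: standardized response mean.

    low_scores and high_scores are paired per physician (repeated
    measures). Returns mean(high - low) / sample SD of the differences.
    """
    if len(low_scores) != len(high_scores):
        raise ValueError("scores must be paired per physician")
    diffs = [h - l for l, h in zip(low_scores, high_scores)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / sd


if __name__ == "__main__":
    # Made-up diagnosis lists for one physician, for illustration only.
    listed = ["pneumonia", "pulmonary embolism", "bronchitis"]
    print(rank_order_score(listed, "pulmonary embolism", top_n=5))
    print(all_or_none_score(listed, "pulmonary embolism", top_n=5))
    # Made-up paired scores for four physicians under two conditions.
    print(responsiveness([1, 2, 3, 4], [2, 4, 4, 6]))
```

Under these assumptions, a rank-order measure keeps information about where the correct diagnosis falls in the list, whereas an all-or-none measure collapses that information to a single bit. That difference in granularity is one plausible reason a rank-order measure could separate low- and high-performance conditions more sharply.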