OBJECTIVE: To examine the relationships among different performance scores for each of four diagnostic decision support systems (DDSSs). DESIGN: Intercorrelations among seven performance scores on a set of 105 cases were computed for each of four DDSSs (DXplain, Iliad, Meditel, QMR). METHODS: The performance scores for each case reflected: 1) presence or absence of the case diagnosis in the DDSS knowledge base; 2) presence or absence of the correct diagnosis anywhere on the DDSS diagnosis list; 3) presence or absence of the correct diagnosis in the top ten diagnoses; 4) relevance of the DDSS diagnosis list; 5) comprehensiveness of the DDSS diagnosis list; 6) whether the DDSS suggested diagnoses in addition to those on the experts' list; and 7) the length of the DDSS diagnosis list. RESULTS: For all DDSSs, the two Correct Diagnosis scores (top ten and total list) were significantly related: 1) to the presence of the correct diagnosis in the knowledge base; 2) to the Comprehensiveness score; and 3) to each other. There were significant differences among the four DDSSs in the magnitude and/or direction of the relationships between: 1) the two Correct Diagnosis scores; 2) the Relevance and Length scores; and 3) the Relevance and Additional Diagnoses scores. CONCLUSION: The production of a correct diagnosis for a given case is not related to the number of diagnoses suggested by the DDSS and, across different DDSSs, is not consistently related to other measures of performance. These data indicate that multiple measures are needed to fully describe the performance of a DDSS.
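The intercorrelation step described above can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here (on two 0/1 scores it equals the phi coefficient, and on one 0/1 score and one continuous score it equals the point-biserial correlation). The score names, the choice of which scores are coded 0/1, and the placeholder values are all hypothetical. The numbers are random and carry no information about any real DDSS.

```python
import math
import random

# Hypothetical labels for the seven per-case performance scores.
SCORE_NAMES = [
    "in_knowledge_base", "correct_anywhere", "correct_top_ten",
    "relevance", "comprehensiveness", "additional_diagnoses", "list_length",
]

# Assumption: scores described as presence/absence or "whether" are coded 0/1.
BINARY_SCORES = {
    "in_knowledge_base", "correct_anywhere", "correct_top_ten",
    "additional_diagnoses",
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists; nan if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(scores):
    """Map each unordered pair of score names to its correlation across cases."""
    names = list(scores)
    return {
        (a, b): pearson(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def placeholder_scores(n_cases=105, seed=0):
    """Random placeholder scores for one DDSS (not real data)."""
    rng = random.Random(seed)
    return {
        name: [
            rng.randint(0, 1) if name in BINARY_SCORES else rng.random()
            for _ in range(n_cases)
        ]
        for name in SCORE_NAMES
    }


if __name__ == "__main__":
    # One correlation table per DDSS would be computed the same way.
    table = intercorrelations(placeholder_scores())
    for (a, b), r in table.items():
        print(f"{a} vs {b}: r = {r:.2f}")
```

Seven scores give 21 distinct pairs per system. Comparing a given pair's correlation across the four systems would require an additional test that this sketch does not attempt.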