Background: Electronic health record (EHR) databases are a promising platform for clinical research using real-world data. However, information on potential limitations of these data sources is lacking. We sought to understand how data visualization might be used to identify data inconsistencies and to assess the applicability of previously validated claims-based algorithms for identifying patients with metastatic breast cancer (MBC).
Methods: This retrospective study used ASCO’s CancerLinQ Discovery database, which is derived from EHR data. Subjects were women ≥18 years of age treated for MBC diagnosed in 1980 or later. Subjects with MBC were identified by two billing codes for metastasis recorded on separate dates following the primary breast cancer diagnosis. Treatment course sequences were visualized: each patient was represented by a horizontal bar along the Y-axis, and treatments were displayed as colored bars (blue: chemotherapy, red: endocrine therapy, green: HER2-targeted therapy, orange: novel therapy) with time of treatment on the X-axis. Visualizations were qualitatively evaluated, and treatment patterns inconsistent with clinical practice were identified.
Results: We identified 4,760 women treated for MBC using billing codes for primary breast cancer diagnosis and distant metastasis. Most patients (96%) had a primary breast cancer diagnosed in 2000 or later. Treatment patterns inconsistent with clinical practice that were identified using the visualization technique included the following: 1% of patients received adjuvant chemotherapy continuously for ≥1.5 years, suggesting missed coding for metastatic disease; 5% of patients did not receive any treatment in the year following metastasis, suggesting the billing code may have been used during workup and not for confirmed metastatic disease. Among patients with MBC, 50% of those identified as HR+ across all records had not received hormone therapy, while 39% of those identified as HR- across all records had received hormone therapy.
Conclusions: Because previously validated algorithms may not translate well to EHR databases, quality auditing should always be performed. The proposed data visualization can be used for improving algorithms, qualitatively identifying errors, and avoiding biased or inaccurate results.
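The cohort rule described in Methods (two metastasis billing codes on separate dates following the primary diagnosis) and one of the audit checks from Results (no treatment in the year following metastasis) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's actual implementation: the function names, the record layout, the use of the first qualifying code date as the metastasis date, the strict "after diagnosis" comparison, and the 365-day window are all hypothetical choices made for the example, and the patient records are invented.

```python
from datetime import date, timedelta


def first_confirmed_metastasis(primary_dx, metastasis_code_dates):
    """Return the earliest metastasis code date if codes appear on at least
    two distinct dates after the primary diagnosis; otherwise return None.

    Assumption: "following" is read as strictly after the diagnosis date,
    and the first qualifying code date is taken as the metastasis date.
    """
    distinct = sorted({d for d in metastasis_code_dates if d > primary_dx})
    return distinct[0] if len(distinct) >= 2 else None


def untreated_in_year_after(metastasis_date, treatment_starts, window_days=365):
    """Return True when no treatment starts within the window after metastasis."""
    window_end = metastasis_date + timedelta(days=window_days)
    return not any(metastasis_date <= t <= window_end for t in treatment_starts)


def audit(patients):
    """Label each patient as 'not_mbc', 'untreated_flag', or 'treated'."""
    labels = {}
    for patient_id, rec in patients.items():
        met_date = first_confirmed_metastasis(rec["primary_dx"], rec["met_codes"])
        if met_date is None:
            labels[patient_id] = "not_mbc"
        elif untreated_in_year_after(met_date, rec["treatment_starts"]):
            labels[patient_id] = "untreated_flag"
        else:
            labels[patient_id] = "treated"
    return labels


# Invented records for illustration only.
patients = {
    "A": {  # two codes on separate dates, treated soon after
        "primary_dx": date(2005, 3, 1),
        "met_codes": [date(2007, 6, 1), date(2007, 6, 15)],
        "treatment_starts": [date(2007, 7, 1)],
    },
    "B": {  # both codes on the same date, so the rule is not met
        "primary_dx": date(2008, 5, 20),
        "met_codes": [date(2010, 1, 10), date(2010, 1, 10)],
        "treatment_starts": [date(2010, 2, 1)],
    },
    "C": {  # rule met, but no treatment recorded in the following year
        "primary_dx": date(2011, 9, 5),
        "met_codes": [date(2013, 4, 2), date(2013, 5, 9)],
        "treatment_starts": [],
    },
}

print(audit(patients))
```

Patients labeled `untreated_flag` correspond to the pattern in which the metastasis code may reflect workup and not confirmed disease; in practice such records would be reviewed alongside the treatment timeline visualization before being excluded.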