Objectives: This study examines the extent of selection bias identified when Medicaid claims containing evidence of pregnancy are linked to vital records. Methods: Two years of Medicaid claims were scanned to identify pregnancy-related diagnoses and procedures. Information on 55,764 Medicaid recipients was provided to the Division of Health Statistics, which linked it to vital records data using a range of identifying characteristics. Claims were then clustered by date and grouped into episodes of care surrounding the infant's birth date. We identified 38,222 pregnancy episodes matched to vital records; 8,474 episodes unmatched to vital records that appeared to terminate before a delivery; and 5,278 episodes that appeared to include a delivery but did not match to vital records. We compare the characteristics of matched and unmatched episodes, and of matched episodes with and without delivery claims. Results: Unmatched episodes spanned fewer weeks than matched episodes, included more diagnostic indicators of elevated risk, and occurred more frequently in more impoverished populations. Among the matched records, 13% did not include claims for delivery services. These episodes occurred more frequently among Hispanic women, women delivering outside hospitals, and women with preterm births and infant deaths. Conclusions: Consistent with other studies, the results show that matching Medicaid claims to vital records data is feasible. However, the matched analytic data set tends to under-represent the outcomes of high-risk pregnancies. An additional source of selection bias can be avoided by using evidence of pregnancy, rather than only evidence of delivery, as the Medicaid index for matching against vital records. © Springer Science+Business Media, LLC 2008.