The Mantel-Haenszel (MH) chi-square test has been shown to have desirable statistical properties for detecting biased items. Small sample sizes remain a concern, however, especially when the focal and reference groups differ greatly in ability. The accuracy and power of the MH test depend on how far the raw-score distributions of the two groups overlap, and on the total sample size available at each raw score. In this study, the MH procedure is compared with (a) a randomization test and (b) a jackknife test, both of which make weaker distributional assumptions. The MH chi-square significance levels were found to be extremely robust.
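To make the dependence on overlap and per-score sample size concrete, the following is a minimal sketch of the standard continuity-corrected MH chi-square, computed over one 2x2 table per raw-score level. It is an illustration under stated assumptions, not the code used in the study: the function name `mh_chi_square`, the `(a, b, c, d)` table layout, and the example counts are all hypothetical.

```python
import math


def mh_chi_square(tables):
    """Continuity-corrected Mantel-Haenszel chi-square (1 df).

    `tables` holds one 2x2 table per raw-score level, given as a tuple
    (a, b, c, d) in this assumed layout:
        a = reference group, item correct    b = reference group, item incorrect
        c = focal group, item correct        d = focal group, item incorrect
    Returns (chi_square, p_value).
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n_ref, n_foc = a + b, c + d          # group sizes at this score level
        m_right, m_wrong = a + c, b + d      # correct / incorrect totals
        total = n_ref + n_foc
        if total < 2:
            continue                         # variance is undefined here
        sum_a += a
        # Expected value and variance of `a` under the null hypothesis.
        sum_e += n_ref * m_right / total
        sum_v += n_ref * n_foc * m_right * m_wrong / (total ** 2 * (total - 1))
    if sum_v == 0:
        raise ValueError("no score level contains both groups and both responses")
    # Continuity correction, floored at zero so a tiny deviation cannot inflate.
    chi2 = max(0.0, abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for two raw-score levels.
example_tables = [(10, 5, 6, 9), (15, 5, 10, 10)]
chi2, p_value = mh_chi_square(example_tables)
print(f"MH chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

A score level at which only one group is present has zero variance and an observed count equal to its expected count, so it contributes nothing to the statistic. This is why limited overlap between the focal and reference groups reduces the information available to the test.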