Attention to the scientific replicability crisis has recently turned to bone surface modification (BSM) analysis, which underlies zooarchaeological and anthropological conclusions about the ecology and evolution of tool-assisted carcass consumption behavior. We review a recent blind test of inter-analyst correspondence in the morphometric analysis of experimentally generated butchery marks; the study advocates algorithmic methods for diagnosing and measuring BSM in an effort to standardize methodology and minimize inter-analyst error (Domínguez-Rodrigo et al., 2017. Use and abuse of cut mark analyses: The Rorschach effect. Journal of Archaeological Science, 86, 14–23. https://doi.org/10.1016/j.jas.2017.08.001). That study overstates concern about the inaccuracy of BSM measurement and interpretation, concluding that BSM analysis is a subjective, non-scientific endeavor. Working from a minimally described sample of cut marks, it measures variables that involve inherent inaccuracy and subjectivity, and it overlooks how the contexts of experimental sample generation – particularly the difference between immanent and configurational processes – differentially affect cut mark morphometrics. We illustrate this discussion with experimental taphonomic examples focused on analytical context, including sample construction and control over factors that affect cut mark cross-sectional size. Our analysis suggests that the relationship between tool attributes and cut mark morphology is not generalizable to all experimental and archaeological butchery contexts. We show that our experimental samples capture the metric variability observed in archaeological cut marks, but that intentionally incised marks and realistic defleshing marks differ in width and depth.
Further, when we control for factors that affect cut mark size, including animal size class, tool type, butcher experience, and density across bone portions, the overlapping widths and depths of cut marks produced by phonolite and ignimbrite flakes lead to poor classification of marks by causal flake material, which casts doubt on the ability to discriminate cut marks made by different materials. We build datasets that include diverse experimental contexts and suggest that meta-analysis can disentangle how multiple configurational factors contribute to cut mark morphometric attributes. Ultimately, progress in BSM analysis rests on inter-analyst replicability, which must be preceded by clear discussion of all parts of the inferential loop – from the design of experiments that generate actualistic analogues to their use in supporting archaeological arguments. Otherwise, problematic expert knowledge traditions may mask arguments from authority behind sophisticated methodological language and under-reported experimental context.