BACKGROUND: Machine learning can elucidate complex relationships and provide insight into important variables in large datasets. This study aimed to develop an accurate model to predict neonatal surgical site infections (SSI) using different statistical methods. METHODS: The 2012-2015 National Surgical Quality Improvement Program-Pediatric dataset for neonates was used for model development and validation. The primary outcome was any SSI. Four models were compared: full multiple logistic regression (LR), a priori clinical LR, random forest classification (RFC), and a hybrid LR model that combined clinical knowledge with significant variables from RFC to maximize predictive power. RESULTS: A total of 16,842 patients (median age 18 days, IQR 3-58) were included. 542 SSIs (4%) were identified. The models agreed on multiple covariates among their significant variables. Area under the curve was similar across models (full LR 0.65, clinical LR 0.67, RFC 0.68, hybrid LR 0.67); however, the hybrid model used the fewest variables (18). CONCLUSIONS: The hybrid model had predictive performance similar to that of the other models while using fewer and more clinically relevant variables. Machine-learning algorithms can identify important novel characteristics that enhance clinical prediction models.
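
The models above are compared by area under the receiver operating characteristic curve (AUC). The following is a minimal sketch of how that metric can be computed, not the authors' code: the function name `roc_auc` and the toy labels and scores are illustrative assumptions, and the pairwise formulation is one standard way to obtain the AUC.

```python
def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case.
    Tied scores count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive case with every negative case.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = SSI, 0 = no SSI,
# and each score is a model's predicted risk for that patient.
toy_labels = [0, 0, 1, 0, 1, 0]
toy_scores = [0.02, 0.10, 0.08, 0.03, 0.20, 0.01]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases, so the reported values of 0.65 to 0.68 indicate modest discrimination for all four models. The pairwise loop is quadratic in the number of patients; it is written this way for readability rather than speed.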