The analysis of large and complex databases poses many challenges. Such databases arise in health services, electronic medical records, insurance, and other commercial data sources, where both the number of observations and the number of variables can be enormous. The problems are particularly acute in genomics and proteomics, where the number of variables is typically much higher than the number of observations. Extant methods seek to balance efficient use of the data against the flexibility required to detect complex relationships and interactions. To overcome some limitations of current methods, a novel analytical tool, Multivariate Neighborhood Sample Entropy (MN-SampEn), is introduced. It is a generalization of Sample Entropy to multivariate data that inherits many of Sample Entropy's desirable properties. In principle, it selects significant covariates without reference to an underlying model and provides predictions similar to those of k-Nearest-Neighbor methods while requiring fewer covariates. However, adapting it to multivariate data requires that several additional optimization issues be addressed. Several optimization strategies are discussed and tested on a set of MALDI mass spectra. With some of these strategies, MN-SampEn identified a reduced set of covariates and exhibited lower predictive error rates than k-Nearest Neighbors. © 2011 Elsevier Inc.