Information entropy is introduced as a penalty term added to the back-propagation cost function. After training, the hidden-unit activation patterns become more organized, with only a few hidden units responding to each input sample. Pruning the remaining inactive units reduces the scale of the neural network while improving its generalization performance and computational efficiency at the same time.
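The idea of penalizing high-entropy hidden activations can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the use of mean squared error as the base cost, and the weighting factor `beta` are assumptions made for the example. Each sample's hidden activations are normalized into a distribution, and their Shannon entropy is added to the cost so that training favors patterns where only a few units respond.

```python
import numpy as np

def entropy_penalty(hidden_activations, eps=1e-12):
    """Shannon entropy of each sample's hidden-activation pattern.

    Activations are normalized per sample into a probability
    distribution; low entropy means few units respond strongly.
    """
    a = np.abs(hidden_activations) + eps  # eps avoids log(0)
    p = a / a.sum(axis=1, keepdims=True)
    return -(p * np.log(p)).sum(axis=1)

def total_cost(y_true, y_pred, hidden_activations, beta=0.1):
    """Base cost (MSE here, as an assumed choice) plus the
    entropy penalty, weighted by a hypothetical factor beta."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + beta * np.mean(entropy_penalty(hidden_activations))

# A concentrated activation pattern has lower entropy (is
# penalized less) than a uniform one of the same size.
h_sparse = np.array([[1.0, 0.0, 0.0, 0.0]])
h_uniform = np.array([[0.25, 0.25, 0.25, 0.25]])
print(entropy_penalty(h_sparse)[0] < entropy_penalty(h_uniform)[0])
```

Minimizing this combined cost drives each input to activate only a few hidden units; units that stay near zero across the whole training set are then candidates for pruning.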