This paper proposes a human-centered interactive framework for automatically mining and retrieving semantic events in videos. After preprocessing, object trajectories and event models are fed into the core components of the framework for learning and retrieval. Because trajectories are spatiotemporal in nature, the learning component is designed to analyze time-series data. Human feedback on the retrieval results provides progressive guidance for the retrieval component of the framework. For user convenience, retrieval results are returned as video sequences rather than the trajectories they contain; consequently, the feedback labels sequences rather than the individual trajectories required by the training algorithm. To resolve this mismatch, a mapping between semantic video retrieval and multiple instance learning (MIL) is established. The effectiveness of the algorithm is demonstrated by experiments on real-life transportation surveillance videos. © 2009 IEEE.
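The abstract's MIL mapping can be illustrated with a minimal sketch (not the paper's algorithm; all names here are hypothetical): each retrieved video sequence acts as a "bag" and its contained trajectories as "instances". A negative bag label implies every trajectory in it is irrelevant, while a positive bag guarantees only that at least one relevant trajectory is present.

```python
# Sketch of propagating sequence-level (bag) feedback to trajectories.
# A bag is (trajectory_ids, label) with label +1 (relevant) or -1 (irrelevant).

def propagate_feedback(bags):
    """Return (certain_negatives, candidate_positives) recoverable from
    sequence-level feedback under the standard MIL assumption."""
    certain_negatives = set()
    candidate_positives = []  # each set contains at least one true positive
    for trajectories, label in bags:
        if label == -1:
            # negative bag: every contained trajectory is irrelevant
            certain_negatives.update(trajectories)
        else:
            candidate_positives.append(set(trajectories))
    # trajectories proven negative elsewhere cannot be the positive witness
    candidate_positives = [b - certain_negatives for b in candidate_positives]
    return certain_negatives, candidate_positives

feedback = [(["t1", "t2"], +1), (["t2", "t3"], -1)]
neg, pos = propagate_feedback(feedback)
# t2 and t3 are certainly irrelevant, so t1 must be the relevant trajectory
```

This ambiguity-resolution step is what makes MIL a natural fit: the learner trains on trajectory features even though users only ever label whole sequences.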