Abstract: Conditional maximum entropy models provide a unified framework for integrating arbitrary features from different knowledge sources and have been successfully applied to many natural language processing tasks. Feature selection methods are often used to distinguish good features from bad ones and thereby improve model performance. In traditional methods, feature selection is typically performed according to various strategies before or alongside feature weight estimation; however, the weights themselves should be the sole measure of a feature's importance. This study proposes a new selection method based on a divide-and-conquer strategy and well-trained feature spaces of small size. Features are divided into small subsets; a sub-model is built on each subset, and its features are judged according to their weights. The final model is constructed on the feature space merged from all sub-models. Experiments on part-of-speech tagging show that this method is both feasible and efficient.
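The divide-and-conquer selection procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sub-model trainer is stubbed with a simple covariance-based weight proxy (a real conditional maxent model would estimate weights via GIS/IIS or gradient-based optimization), and the function names, subset size, and threshold are all hypothetical choices.

```python
import random

def train_submodel(X, y, features):
    """Stand-in for training a sub-model on a feature subset.
    Here each feature's 'weight' is just its sample covariance with the
    label; a real implementation would train a conditional maxent model
    and read off the learned feature weights."""
    n = len(y)
    mean_y = sum(y) / n
    weights = {}
    for f in features:
        col = [x[f] for x in X]
        mean_x = sum(col) / n
        weights[f] = sum((a - mean_x) * (b - mean_y)
                         for a, b in zip(col, y)) / n
    return weights

def divide_and_conquer_select(X, y, n_features, subset_size, threshold):
    """Divide features into small subsets, train a sub-model on each,
    judge each feature by the magnitude of its weight, and merge the
    surviving features into the final feature space."""
    features = list(range(n_features))
    random.shuffle(features)          # arbitrary partition into subsets
    selected = []
    for i in range(0, n_features, subset_size):
        subset = features[i:i + subset_size]
        w = train_submodel(X, y, subset)
        selected.extend(f for f in subset if abs(w[f]) > threshold)
    return sorted(selected)

# Toy usage: feature 0 copies the label, features 1-3 are pure noise,
# so only feature 0 should survive selection.
random.seed(0)
y = [random.randint(0, 1) for _ in range(200)]
X = [{0: yi, 1: random.random(), 2: random.random(), 3: random.random()}
     for yi in y]
sel = divide_and_conquer_select(X, y, n_features=4, subset_size=2,
                                threshold=0.1)
```

Because every feature is judged only within its own small subset, each sub-model stays cheap to train, and the final merge reconstructs the full selected feature space.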