Pneumococcal surface protein A (PspA) is a serologically variable protein of Streptococcus pneumoniae. Twenty-four diverse alleles of the pspA gene were sequenced to investigate the genetic basis for serologic diversity and to evaluate how this diversity might affect the use of PspA in human vaccination. The 24 pspA gene sequences from unrelated strains revealed two major allelic types, termed "families," which are subdivided into clades. A highly mosaic gene structure was observed in which individual mosaic sequence blocks in PspAs diverged from each other by over 20% in many cases. This level of divergence exceeds that observed for blocks in the penicillin-binding proteins of S. pneumoniae or in many cross-species comparisons of gene loci. Conversely, because the mosaic pattern is so complex, each pair of pspA genes also has numerous shared blocks, but the position of conserved blocks differs from gene pair to gene pair. A central region of pspA, important for eliciting protective antibodies, was found in six clades, each of which diverges from the other clades by >20%. Sequence relationships among the 24 alleles analyzed over three windows were discordant, indicating that intragenic recombination has occurred within this locus. The extensive recombination that generated the mosaic pattern seen in the pspA locus suggests that natural selection has operated in the history of this gene locus and underscores the likelihood that PspA may be important in the interaction between the pneumococcus and its human host.
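The window-based comparison described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study: it assumes the alleles are already aligned to equal length, measures divergence as the simple percentage of differing sites (an uncorrected p-distance, chosen here only for illustration), and uses hypothetical names (`percent_divergence`, `windowed_divergence`) and toy sequences rather than real pspA data.

```python
from itertools import combinations


def percent_divergence(seq_a, seq_b):
    """Percent of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    Returns None if no comparable columns remain.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        return None
    return 100.0 * differing / compared


def windowed_divergence(alignment, windows):
    """Pairwise percent divergence for every allele pair in each window.

    alignment: dict mapping allele name -> aligned sequence (hypothetical input).
    windows:   list of (start, end) half-open column ranges.
    Returns a dict mapping (name_a, name_b) -> list of per-window divergences.
    """
    result = {}
    for name_a, name_b in combinations(sorted(alignment), 2):
        result[(name_a, name_b)] = [
            percent_divergence(alignment[name_a][start:end],
                               alignment[name_b][start:end])
            for start, end in windows
        ]
    return result


# Toy alignment (invented sequences, not real pspA alleles).
toy_alignment = {
    "allele_1": "ATGCATGCATGC",
    "allele_2": "ATGCTTGCATGA",
    "allele_3": "TTGGATGCATGC",
}
toy_windows = [(0, 4), (4, 8), (8, 12)]

for pair, values in windowed_divergence(toy_alignment, toy_windows).items():
    print(pair, values)
```

In this toy case allele_1 is identical to allele_2 in the first window but identical to allele_3 in the other two, so the nearest relative changes from window to window. That kind of discordance between windows is the pattern the abstract interprets as evidence of intragenic recombination.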