Prolyl 4-hydroxylase, an α2β2 tetramer, plays a central role in collagen synthesis by catalyzing the formation of 4-hydroxyproline residues through the hydroxylation of proline in X-Pro-Gly sequences. We report here that the human gene for the catalytically important α subunit spans more than 69 kilobase pairs and consists of 16 exons. The exons that encode solely protein sequences vary from 54 to 240 base pairs (bp) in length, and the introns vary from 750 to more than 16,000 bp. The 133 bp of 5'-untranslated sequence of the mRNA is encoded by two exons and contains inverted repeats with a potential for stem-loop formation, which may be involved in translational control of the expression of this gene. The 5'-flanking region contains a TATA motif at position -29 relative to the major transcription start site but no CCAAT motif. The 5'-flanking region and the downstream sequences contain several motifs that may act as binding sites for various transcription factors. Evidence has previously been reported for mutually exclusive alternative splicing of RNA transcripts of this gene. The present data indicate that the mutually exclusive sequences found in the mRNAs are encoded by two consecutive, homologous 71-bp exons, exons 9 and 10. These exons are identical in their first 5 bp, and the overall identity between them is 61% at the nucleotide level and 58% at the level of the encoded amino acids. Both types of mRNA were found to be expressed in all of the tissues studied, but in some tissues one type, containing either exon 9 or exon 10 sequences, was more abundant than the other.