Forensic author profiling plays an important role in suggesting likely
profiles of suspects. Among the many automated solutions recently proposed for
author profiling, transfer learning outperforms many other state-of-the-art
techniques in natural language processing. Nevertheless, this sophisticated
technique has yet to be fully exploited for author profiling. At the same time,
whereas current author-profiling methods, largely based on feature
engineering, vary considerably from one model to the next, transfer
learning usually requires only that preprocessed text be fed into the model. We
reviewed multiple references in the literature and identified the most common
preprocessing techniques associated with profiling authors' gender.
Given the variation in potential preprocessing techniques, we conducted
an experimental study in which we applied five such techniques and measured
each technique's effect on the BERT model, chosen as one of the most widely
used stock pretrained models. We used the Hugging Face Transformers library
to implement the code for each preprocessing case. In our five experiments, we
found that BERT achieves its best accuracy in predicting an author's gender
when no preprocessing technique is applied. Our best case reached 86.67%
accuracy.
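As an illustration of the setup described above, the following minimal sketch shows how a BERT-based gender classifier might be fine-tuned with the Hugging Face Transformers library on raw, unpreprocessed text (matching the best-performing case). The checkpoint name (bert-base-uncased), the toy texts, the label encoding, and the hyperparameters are illustrative assumptions, not the actual experimental configuration.

```python
# Minimal sketch: fine-tuning BERT for binary gender classification with
# Hugging Face Transformers. Checkpoint, texts, labels, and hyperparameters
# are illustrative assumptions only.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed pretrained checkpoint; the text above only says "BERT".
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = female, 1 = male (hypothetical)
)

# Toy training texts standing in for the real corpus, left raw to match
# the "no preprocessing" case.
texts = ["I loved the sunset over the bay!!", "Check out my new gaming rig."]
labels = torch.tensor([0, 1])

# Tokenize the raw text directly; BERT's WordPiece tokenizer handles
# casing, punctuation, and subwords without manual preprocessing.
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

# One illustrative training step.
model.train()
optimizer = AdamW(model.parameters(), lr=2e-5)  # assumed learning rate
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: predict the gender label of an unseen text.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("What a lovely day at the park :)",
                               return_tensors="pt")).logits
print("predicted label:", logits.argmax(dim=-1).item())
```

Each preprocessing condition in the study would then amount to transforming the input strings (e.g., lowercasing or stripping punctuation) before the tokenizer call, leaving the rest of the pipeline unchanged.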