The messages have been annotated with part-of-speech (POS) tags using TreeTagger and are stored in a basic XML format.
For copyright reasons, we cannot publish the full corpus on our website. If you are interested in working with the corpus, please contact us directly by email.
A small preview can be viewed here.