The corpus has been annotated for various features.
The tags and their meanings are as follows:
- ann: abbreviations or ‘for sale’ equivalents (tbe, je vends, vds)
- bon: use of an evaluative attribute at the very beginning of the listing
- ego: use of je
- stn/sty: non-standard (stn) or standard (sty) use of past participle agreement or negation
- pre: presentatives (il y a, c’est)
- vst: vraiment as a stance marker ("it’s really nice")
- emo: emoticons
- enc: use of bonnes enchères (happy bidding)
- imp: most frequent imperative forms (hésitez, consulter, regardez)
- att: evaluative attributes (not at the beginning of the listing)
In addition to these tags, which are used consistently throughout all four subcorpora, the first subcorpus (2005) contains extra tags:
- acc: accents which are missing or are non-standard
- ang: anglicisms
- con: contact details
- inf: information
- lex: informal lexical items
- ort: orthographical ‘mistakes’
- pub: marketing language
- slo: use of slogans
- syn: syntax, topicalisation
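The tag inventory above can be written down as a small lookup table, which may help when checking that a tag found in a given subcorpus belongs to the scheme. The following is a minimal sketch in Python, not part of the corpus tooling: the names CORE_TAGS, EXTRA_TAGS_2005, allowed_tags and unknown_tags are hypothetical, and reading stn as non-standard and sty as standard is an assumption based on the order in which the two are listed.

```python
# Tags used in all four subcorpora (descriptions shortened from the list above).
CORE_TAGS = {
    "ann": "abbreviations or 'for sale' equivalents",
    "bon": "evaluative attribute at the very beginning of the listing",
    "ego": "use of je",
    "stn": "non-standard past participle agreement or negation (assumed reading)",
    "sty": "standard past participle agreement or negation (assumed reading)",
    "pre": "presentatives",
    "vst": "vraiment as a stance marker",
    "emo": "emoticons",
    "enc": "use of bonnes enchères",
    "imp": "most frequent imperative forms",
    "att": "evaluative attributes (not at the beginning of the listing)",
}

# Extra tags found only in the first subcorpus (2005).
EXTRA_TAGS_2005 = {
    "acc": "missing or non-standard accents",
    "ang": "anglicisms",
    "con": "contact details",
    "inf": "information",
    "lex": "informal lexical items",
    "ort": "orthographical 'mistakes'",
    "pub": "marketing language",
    "slo": "use of slogans",
    "syn": "syntax, topicalisation",
}


def allowed_tags(first_subcorpus):
    """Return the set of tags expected in a subcorpus."""
    tags = set(CORE_TAGS)
    if first_subcorpus:
        tags |= set(EXTRA_TAGS_2005)
    return tags


def unknown_tags(tags, first_subcorpus):
    """Return, sorted, the tags that are not expected in the subcorpus."""
    return sorted(set(tags) - allowed_tags(first_subcorpus))
```

For example, unknown_tags(["ego", "ang"], first_subcorpus=False) flags ang, because that tag is only defined for the 2005 subcorpus.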