
LEWC
LEWC stands for Luxembourgish Email Word-Corpus; it is a freely available and searchable online word corpus of private emails written in Luxembourgish. Thanks to Amir Zeldes’ help, the corpus can be found on the ANNIS platform.
Selection process
Initially, 56 successful users of Luxembourgish (L1 and L2) were contacted. Preselection was based on their having sent me at least one email in Luxembourgish. All potential participants were contacted on Monday, 3 April 2006; of these, 20 agreed to have their emails anonymised and the data made available online to other researchers.
Participants and their emails
LEWC consists of 269 emails (31,469 tokens) written by 8 male and 12 female participants. The male participants’ ages ranged from 26 to 59, with a median age of 28.5; the female participants’ ages ranged from 23 to 58, with a median age of 28. The median age across both genders is 28. The oldest email dates from 21 June 2003; the most recent is from 3 April 2006.
Anonymisation
All participants were guaranteed anonymity, which means that the emails contain no information that links directly to the participants’ identities, including their gender. No email header information (from, to, date, subject) is included; only the email bodies are.
Within emails, all names have been anonymised by keeping the initial letter and replacing the rest with "--". For example, "Adam and Eve" becomes "A-- and E--". With locations, the initial letter is kept and the rest is replaced with "==". For example, "Tokyo, Japan" becomes "T==, J==". Sensitive numbers, such as telephone numbers, passwords, and opening times, have been replaced by "∗∗∗". Finally, workplaces and other sensitive material have been replaced by basic information in square brackets. When an email has several people or locations with the same initial letter, they have been indexed with numbers. For instance: J1-- saw J2-- in A1== on his way to A2==.
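The masking conventions above can be illustrated with a short Python sketch. It is not the procedure used to build LEWC; it only reproduces the notation. The function names (anonymise, _build_masks) are hypothetical, the lists of names and locations are assumed to be supplied by hand, and numbering names and locations separately is an assumption based on the example above.

```python
import re
from collections import defaultdict


def _build_masks(entities, suffix):
    """Map each entity to its mask; entities sharing an initial get numbered."""
    by_initial = defaultdict(list)
    for entity in entities:
        group = by_initial[entity[0]]
        if entity not in group:  # ignore repeated entries
            group.append(entity)
    masks = {}
    for initial, group in by_initial.items():
        for index, entity in enumerate(group, start=1):
            # Only add a number when several entities share this initial.
            number = str(index) if len(group) > 1 else ""
            masks[entity] = f"{initial}{number}{suffix}"
    return masks


def anonymise(text, names, locations):
    """Replace listed names with 'X--' and listed locations with 'X=='."""
    masks = {}
    masks.update(_build_masks(names, "--"))
    masks.update(_build_masks(locations, "=="))
    if not masks:
        return text
    # Try longer entities first so they win over shorter ones they contain.
    ordered = sorted(masks, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(e) for e in ordered) + r")\b")
    # A single pass ensures that already inserted masks are never re-matched.
    return pattern.sub(lambda match: masks[match.group(1)], text)


print(anonymise("Adam and Eve", ["Adam", "Eve"], []))
print(anonymise("Tokyo, Japan", [], ["Tokyo", "Japan"]))
```

Numbers, workplaces and other sensitive material are not covered by this sketch, since they cannot be recognised from a simple list of words.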
Searching in LEWC
Go to the ANNIS platform. In the left-hand menu, tick the box next to LEWC to ensure that you are searching in that corpus. In the top part of that menu, type in your query (queries are case sensitive).
1-word search:
"ech"
2-word search:
"well"&"ech"&#1.#2
3-word search:
"merci"&"fir"&"deng"&#1.#2&#2.#3
At the bottom of the left-hand menu, use "Context Left" to select how many words are displayed before your queried item. Similarly, "Context Right" sets the number of words displayed after the queried item. You can either display the results in ANNIS by pressing the "Show Result" button, or press the "Export" button and select a more printer-friendly format.
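Longer word sequences follow the same pattern as the searches above: each word is quoted, and each pair of neighbouring words is linked with the precedence operator of the ANNIS Query Language (#1.#2 means that the first item directly precedes the second). As a minimal sketch, the following Python helper builds such a query string; the function name build_sequence_query is hypothetical and not part of ANNIS, and words containing a double quote are simply rejected because their escaping is not covered here.

```python
def build_sequence_query(words):
    """Build an ANNIS query matching the given words as a consecutive sequence."""
    if not words:
        raise ValueError("at least one word is required")
    for word in words:
        if '"' in word:
            raise ValueError(f"double quotes are not supported: {word!r}")
    # One quoted search term per word, e.g. "merci"
    terms = [f'"{word}"' for word in words]
    # One precedence constraint per neighbouring pair, e.g. #1.#2
    links = [f"#{i}.#{i + 1}" for i in range(1, len(words))]
    return "&".join(terms + links)


print(build_sequence_query(["merci", "fir", "deng"]))
```

The resulting string can be pasted into the query box; since queries are case sensitive, the words must be spelled exactly as they should be matched.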
Quoting from LEWC
When quoting from LEWC, please include the line numbers (e.g. 4594-4595). This is best achieved by exporting your results using the "GridExporter" exporter. For each search result, ANNIS will display the start line and the end line.
If you display your search results by pressing "Show Results", you will find a "structure (grid)" expand-collapse button under each search result. When you expand it, ANNIS displays the line number of every word shown under "line".
© 2009 All Rights Reserved - Dr Cédric Krummes
Designed and Developed by Rowan Hotham-Gough
