TALC 2016

We’ll be presenting the following panel at TALC in Giessen, Germany, in July 2016.

Developing a First Year Composition L2 Writing Corpus and Repository

A number of student academic writing corpora (e.g., ICLE, MICUSP, BAWE) have been developed in the past few decades, showing the interest in and importance of representing this domain of language use. These corpora have been used for countless research studies, as illustrated by the extensive bibliography on the CECL and LCA websites.

Our project, the Purdue Second Language Writing corpus (PSLW), builds on this base but aims to represent the writing produced by first year international students in the U.S. in composition courses. Such courses are provided at virtually every university in the U.S., but to date no large-scale corpus projects of this writing have been completed. Our corpus currently includes 4,012 texts (3,472,260 words) representing 5 different genres (literacy narrative, proposal, annotated bibliography, interview report and argumentative essay), and we are currently processing a comparable number of texts to be available by Summer 2016. The corpus contains three drafts of each assignment. The samples are annotated with writers’ TOEFL scores, nationality, and gender, among other characteristics.

Importantly, the corpus is part of a larger interdisciplinary project called CROW (Corpus and Repository of Writing), a collaboration among students and faculty from both applied/corpus linguistics and composition studies. Two main features of this larger project are the development of an online interface where scholars can eventually submit their own texts, and the inclusion of pedagogical artifacts that accompany the production of the texts, including syllabi, assignment sheets, pre-writing readings, and schema building activities. Providing these additional materials sheds light on how the texts in the corpus are developed and shaped by these instructor-designed texts. We believe that such efforts are an important way to advance corpus linguistic and language teaching research.

Our presentation will focus on two strands: the methodology for developing this new kind of corpus project, and research that has been conducted using our corpus. In terms of methodology, we will briefly cover our corpus compilation process, but focus more on the interdisciplinary practices used to guide the development of the online platform and the integration of corpus texts and artifacts. We will discuss several best practices from usability design, including 1) the development of persona scenarios (e.g., novice international graduate student instructor) and 2) environmental scans of corpus and repository websites (e.g., MICUSP, COCA and Pedagogy Toolkit).

A number of research projects have been conducted using the PSLW corpus. We will report on the findings of one of these studies, which investigated the use of reporting verbs in students’ literature reviews. Using a framework drawing on the work of Francis, Hunston, and Manning (1996), Charles (2006), and Friginal (2013), the study showed that although L2 writers in the corpus used many verbs in the semantic categories of argue and show, mostly for textual attribution, they also employed more think verbs than advanced L1 student writers, particularly to make general statements or to express their own opinions. After discussing our research findings, we will end the presentation by offering implications of our project for corpus development and research in general.


Swatek, A., Banat, H., & Staples, S. (2016, July). Developing First Year Composition L2 Writing Corpus: Research, Pedagogy and Teacher Training. Presentation at the 12th Teaching and Language Corpora Conference. Giessen, Germany.


Charles, M. (2006). Phraseological patterns in reporting clauses used in citation: A corpus-based study of theses in two disciplines. English for Specific Purposes 25(3). 310–331. doi:10.1016/j.esp.2005.05.003. Retrieved from http://www.sciencedirect.com/science/article/pii/S0889490605000529 

Francis, G., Hunston, S., & Manning, E. (Eds.). (1996). Collins COBUILD Grammar Patterns 1: Verbs. Amsterdam: John Benjamins Publishing Company.

Friginal, E. (2013). Developing research report writing skills using corpora. English for Specific Purposes 32(4). 208–220. doi:10.1016/j.esp.2013.06.001. Retrieved from http://www.sciencedirect.com/science/article/pii/S0889490613000392 



