Corpus and Repository of Writing

We’ll be presenting the following panel at TALC in Giessen, Germany, in July 2016.

Developing a First Year Composition L2 Writing Corpus and Repository

A number of student academic writing corpora (e.g., ICLE, MICUSP, BAWE) have been developed in the past few decades, showing the interest in and importance of representing this domain of language use. These corpora have been used for countless research studies, as illustrated by the extensive bibliography on the CECL and LCA websites.

Our project, the Purdue Second Language Writing corpus (PSLW), builds on this base but aims to represent the writing produced by first-year international students in composition courses in the U.S. Such courses are offered at virtually every U.S. university, but to date no large-scale corpus project representing them has been completed. Our corpus currently includes 4,012 texts (3,472,260 words) representing five genres (literacy narrative, proposal, annotated bibliography, interview report, and argumentative essay), and we are processing a comparable number of texts to be available by Summer 2016. The corpus contains three drafts of each assignment. The samples are annotated with writers’ TOEFL scores, nationality, and gender, among other characteristics.

Importantly, the corpus is part of a larger interdisciplinary project called CROW (Corpus and Repository of Writing), a collaboration among students and faculty from both applied/corpus linguistics and composition studies. Two main features of this larger project are the development of an online interface where scholars can eventually submit their own texts, and the inclusion of pedagogical artifacts that accompany the production of the texts, including syllabi, assignment sheets, pre-writing readings, and schema-building activities. Providing these additional materials sheds light on how the texts in the corpus are developed and shaped by these instructor-designed materials. We believe that such efforts are an important way to advance corpus linguistic and language teaching research.

Our presentation will focus on two strands: the methodology for developing this new kind of corpus project, and research that has been conducted using our corpus. In terms of methodology, we will briefly cover our corpus compilation process, but focus more on the interdisciplinary practices used to guide the development of the online platform and the integration of corpus texts and artifacts. We will discuss two best practices from usability design: 1) the development of persona scenarios (e.g., novice international graduate student instructor); and 2) environmental scans of corpus and repository websites (e.g., MICUSP, COCA, and Pedagogy Toolkit).

A number of research projects have been conducted using the PSLW corpus. We will report on the findings of one of these studies, which investigated the use of reporting verbs in students’ literature reviews. Using a framework drawing on the work of Francis, Hunston, and Manning (1996), Charles (2006), and Friginal (2013), the study showed that although L2 writers in the corpus used many verbs in the semantic categories of argue and show, mostly for textual attribution, they also employed more think verbs than advanced L1 student writers, particularly for making general statements or expressing their own opinions. After discussing our research findings, we will end the presentation by offering implications of our project for corpus development and research in general.


Swatek, A., Banat, H., & Staples, S. (2016, July). Developing First Year Composition L2 Writing Corpus: Research, Pedagogy and Teacher Training. Presentation at the 12th Teaching and Language Corpora Conference. Giessen, Germany.


Charles, M. (2006). Phraseological patterns in reporting clauses used in citation: A corpus-based study of theses in two disciplines. English for Specific Purposes 25(3). 310–331. doi:10.1016/j.esp.2005.05.003.

Francis, G., Hunston, S., & Manning, E. (Eds.). (1996). Collins COBUILD Grammar Patterns 1: Verbs. Amsterdam: John Benjamins Publishing Company.

Friginal, E. (2013). Developing research report writing skills using corpora. English for Specific Purposes 32(4). 208–220. doi:10.1016/j.esp.2013.06.001.



In March 2017, three conferences Crow researchers are very interested in will be held consecutively in the Pacific Northwest. (Four if you count ATTW!) We’re excited about the opportunity to attend, present (we hope), and participate in workshops and in other ways. Earlier this week, we submitted two proposals for CCCC 2017. We’ve included summaries below.

Hope to see you in Portland and Seattle!

Cultivating Writing Research via Corpus and Computational Collaboration

Bill Hart-Davidson & Ryan Omizo will join Shelley Staples and Lindsey Macdonald for this panel. Here’s the opening statement:

In March 2017, CCCC will be joined in Portland by AAAL, the conference of the American Association for Applied Linguistics. We take this opportunity to highlight the value of collaboration between researchers who will be attending one, but likely not both, of these conferences, and who, unfortunately, cross paths in few ways. The corpus linguistics methods common in applied linguistics can bring quantitative elements to empirical research in rhetoric and composition, including attention to demographic issues and diverse genres. Rhetorical research, conversely, offers corpus researchers valuable insights into extra-textual features and contextual influences. This panel explores possibilities for collaborative writing research by demonstrating the value of this interdisciplinary work. We offer an overview of the benefits of corpus and computational methods, then present case studies of two projects that integrate computational methods and corpus linguistics with rhetoric and composition. We conclude with a brief panel discussion of takeaways for interdisciplinary collaboration, then invite conversation.

Promoting RAD Writing Research through Inter-Institutional Collaboration

Michelle McMullin, Terrence Wang, and Bradley Dilger proposed this session. Here are some excerpts from the proposal:

Empirical research in composition and rhetoric has become more common. Diverse research projects investigate all areas of the field, including writing transfer, undergraduate writing majors, and the literacies of working class and underrepresented minorities. But scholar-teachers at all levels still struggle to implement lessons from published research at their own institutions, and to explain the relevance of research to administrators…. In this presentation, we describe how research designed as inter-institutional from its inception has embedded attention to diverse research outcomes, the development of sustainable infrastructures, and the lifecycle model of scalable user-centered development. Our project brings the methods of corpus linguistics to rhetoric and composition, and vice-versa, creating a web-based archive for research and professional development. By embedding an interdisciplinary approach to collaboration from the start, we have developed a project that considers the strengths and contributions of each partner for an effective collaboration model that best serves the needs of all stakeholders.


At the end of our first academic year, the Crowbirds got together at Bradley’s house for a picnic, barbeque, and conversation. Madelyn and Amelia decorated, everyone brought wonderful food, and we had a great time — as you can see!


We are very proud of the progress we’ve made this year. A lot of our team members are traveling, and we’ll miss them. We look forward to a productive summer.
