Corpus and Repository of Writing

The Crow team has gathered a smaller group of members to focus on the various grants we will be applying for this semester. During the second week of classes here at Purdue, the grants team met to establish our grant strategy and distribute grant work.

We decided we would meet as a large group every two weeks to ensure everyone is on the same page and to review any issues that come up during our two weeks apart. Other than that, individual grant teams will meet as needed to get the work done.

We have started working on four major grants this semester: a CLA Humanities Grant, Humanities Without Walls, a CLA non-laboratory grant, and an SBS Faculty Small Grant from the University of Arizona.

CLA Humanities

This is an internal grant funded directly by Purdue. The funds, if awarded, will help support our research efforts. Lindsey and Michelle will be working on this grant. Eventually, I will join their team as well to assist as needed.

Humanities Without Walls Changing Climate Initiative

Humanities Without Walls spans humanities institutes throughout the Midwest, Purdue’s among them. Its main goal is to increase the visibility of the humanities. Shelley and Bradley will focus on this grant for its Changing Climate initiative. Eventually, Hadi will join to assist as needed. We’ll be partnering with Michigan State on this grant, too, which is exciting!

CLA Non-Laboratory

This is another internal grant from Purdue. Its goal is to help research groups upgrade their equipment, software, and database access, among other things. If awarded, we will use this money to purchase upgraded monitors, webcams, and headphones for our resource room. Hadi and I will be focusing on this grant.

University of Arizona Faculty Small Grant

This is an internal grant through the University of Arizona. This grant will be used to aid in recruiting participants for our study and to help fund graduate student wages for processing and de-identifying files and for checking the part-of-speech tags that will be automatically added to the corpus. Shelley Staples will be focusing on this grant.

It is now the fourth week of classes, and our grants team is off to a great start. Teams are meeting, drafts are being compiled, and budgets are being adjusted. All of the grants listed above have different deadlines, so we expect to continue working at our current pace and continue distributing tasks as needed to the grants team. As these grants come to completion, we will be on the lookout for more to apply for.

Many research teams at large universities slow down their progress in the summer to accommodate travel schedules and personal research agendas. With our ten-person team filled with graduate and undergraduate students, the risk of slowing down could be even higher, but our Crow team has maintained the same stamina we had all year. Here are some of the highlights from a busy and rewarding three months.

Academic Conferences

In May, several members of our team (Michelle, Zhaozhe, Bradley, Shelley, and Lindsey) piled into a minivan and travelled to Rochester, NY to attend the Computers and Writing 2016 conference hosted by St. John Fisher College. The nine-hour drive was filled with snacks, coffee, and bad jokes from Dr. Bradley Dilger. Their roundtable, “Boundary Work: Designing a Composition Archive for Research and Mentoring Across Disciplines,” had great attendance and participation, which helped Crow discuss the implications of the research we’re doing. For more information, check out our blog entry on the C&W conference.

July included international travel to the 12th Teaching and Language Corpora Conference in Giessen, Germany, for team members Hadi, Ola, and Shelley. These three are seasoned international travellers and made it to Germany without a hitch for their panel, “Developing a First Year Composition L2 Writing Corpus and Repository.” Despite being in the last time slot on the last day of the conference, the team members had strong attendance at their session and a lively conversation with attendees about the Purdue Second Language Writing Corpus (PSLW) and its ties with Crow. More information on their time in Germany will be available on the site soon.

Professional development and continuing education are core foundations for Crow, so we actively write proposals and look for opportunities to share our research and team management practices with audiences around the globe. We’ll keep you updated on our appearances at conferences around the country and other upcoming opportunities for the Crow team on our conference page.

Individual members also found their way to conferences, institutes, and employment opportunities this summer in California, US; Ann Arbor, MI, US; Atlanta, GA, US; and Hanover, NH, US to share personal research projects and to network.

Advancing Our Research

With fewer demands from teaching and learning in the summer, the Crow team took the opportunity to develop more “behind the scenes” components for the project. We were able to launch this site this summer thanks to the diligence of Ola and Bradley. With help from discussions at academic conferences and in team meetings, we narrowed down our research goals and developed our first research project direction: an examination of citation practices in L2 writers based on our growing corpus. The arduous task of completing a large-scale, multi-institution IRB application for the Crow project began (we are very hopeful that this will be approved soon!**). And several team members spent more than 40 hours de-identifying student texts from the 2015-2016 academic year in preparation for their inclusion in the corpus later this year.

Research projects require funding, so we also invested quite a bit of time this summer searching for internal and external grants we can apply for in the next year. We’ve had pretty good success with College of Liberal Arts research grants so far, and we’re grateful for their support. We’re also happy to say we’ve identified a partner for an inter-institutional grant we’ve started working on and will submit this fall.

Crow Changes and Evolutions

During the summer the Crow team experienced a few changes.

Our dear Dr. Shelley Staples said farewell to the Midwest and Purdue University and went to the Southwest to start a new Assistant Professor position at the University of Arizona in the English Applied Linguistics program. In Tucson, Shelley will continue to develop her fascinating research in corpus linguistics, mentor and guide students, and create a Crow branch for the first inter-institutional link. Purdue Crow looks forward to lots of Skype sessions and expanding our team.

Louis Wyatt graduated from Purdue with a Bachelor of Arts degree in Professional Writing this May. He will begin an internship with Bleacher Report and hopes to find a more permanent position soon. His contributions to Crow’s usability potential were greatly appreciated and we look forward to watching his bright future.

Samantha Pate joined our Crow team to help with development, web work, and grant writing. As a rising Professional Writing star, she is a great help in developing the backend of the project and the overall design and interface of the site.

And Dr. Bradley Dilger began a new administrative appointment as the Director of Introductory Composition for Purdue University. In this new position he will coordinate and direct more than 100 sections of first-year composition with the help of two assistant directors and several other faculty members, graduate students, and staff who make up the Introductory Composition at Purdue (ICaP) writing program. Although his email box and coffee consumption will grow exponentially, his time and dedication to Crow will not falter. We all look forward to watching him bring innovation and energy to ICaP.

Coming Soon

As Crow enters the fall, we will continue to finesse our research project on citation practices and begin prototyping our site. Grant writing will be a big focus as well. The transitions in the project will continue, but because of the shared leadership model and emphasis on professional development, Crow looks forward to more growth and challenges.


**Update: Crow received the excellent news of IRB approval on Friday, August 19th. Congrats to the research team members in charge of this awesome accomplishment. Another milestone reached!


Terrence, Michelle, Shelley, and Bradley had an excellent week at Computers & Writing 2016 in Rochester. We got to explore the interdisciplinary nature of the Crow project through the workshop we attended, our presentation, and several other interesting panels. Lots of good thinking about the relationship between corpus linguistics, pedagogy, mentoring, and building a sustainable archive.

Our conference began with the Ride2CW celebration at the Tap and Mallet — great food, good beer, and smart conversations already starting. The next morning, Bradley and Bill Hart-Davidson rode along the Erie Canal, which was just two miles from host St. John Fisher College. Yay, Ride2CW!

The four of us attended Ryan Omizo and Hart-Davidson’s workshop on computational rhetoric, where we could start imagining what data representations might look like for Crow. We developed some great questions about our data structure and the multiple users for whom we are designing.

We attended a variety of sessions which were interesting and relevant to our project. In A5, we heard Naomi Silver and others from the Sweetland Center for Writing talk about their collaborative processes. We liked seeing what Erin Trauth, Joe Moxley, and Norbert Elliot were doing with MyReviewers data on an NSF-funded project, and we’ll definitely be following up with them. We’re hoping to make it to Writing Analytics, Data Mining and Student Success in January 2017.

Session G3, which featured Ben Miller, Jason Palmeri, and Ben McCorkle, offered an in-depth look at two projects: Palmeri and McCorkle’s ongoing investigation of English Journal, which goes back 100 years, and Miller’s work with rhetcomp dissertations. Excellent as presented and in the Twitter backchannel.

Our talk was session D2. We were pleased by the attendance and the conversation which followed. Michelle built a Storify which features Nick Carbone’s live-tweeting (thanks, Nick!) and some of the questions, too:

  • Hart-Davidson asked what our minimum value proposition will be: what will provide short term results as we build Crow from the ground up? We agreed it’s PSLW, which is already helping us publish results in journals and at conferences.
  • Elliot suggested working with N-grams, or strings of words that may perform certain rhetorical functions (e.g., according to the; the first article).
  • Cheryl Ball asked to hear more about our “deidentification parties” and our methods for digital collaboration. Yay, Basecamp!
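For readers new to N-grams, here is a minimal sketch in Python of how such word strings could be counted. It is only an illustration, not Crow's actual tooling: the function names `ngrams` and `count_ngrams` are hypothetical, and splitting on whitespace is a simplifying assumption that leaves punctuation attached to words.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n):
    """Count n-grams in a text (hypothetical helper).

    Assumption: lowercasing and whitespace splitting stand in for real
    tokenization, so punctuation stays attached to neighboring words.
    """
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


sample = "According to the author, according to the study"
counts = count_ngrams(sample, 3)
print(counts[("according", "to", "the")])
```

A frequency table like this could be a starting point for spotting recurring strings such as "according to the" across many student texts.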

From the repeated names here, we realized there aren’t too many people working in computational rhetorics, nerd data crunching, or whatever you want to call the corner of the field we’re working in. That’s probably why, in the panels we attended, we heard at least as many references to scholars in digital humanities but outside rhetoric and composition. There are just not enough voices inside the field. We’re particularly happy to note that Crow will add a few more women to the mix.

Driving back, we debriefed and finalized our summer plans. Shelley, Terrence, and Michelle worked in Basecamp and Google Docs while Bradley drove, and it took almost seven hours for the four of us to talk through our conference experiences. With that work done, and about two hours of driving left, we started getting a little chirpy. Then we saw on Twitter that some conference-goers were still in the airport. And we realized there were strong positives to driving!

Next year, the conference will be June 1–4 at the University of Findlay in Ohio, less than four hours away. So we’ll probably have a Crow team there again. If Bradley trains enough, it’s only a two-day bike ride…


“The Design and Research Potential of Crow for Language Research and Teaching”

by Sherri Craig and Jie (Wendy) Gao

The 2016 Purdue Languages and Cultures Conference (PLCC) was the first time the School of Languages and Cultures partnered with the Second Language Studies graduate program to host an interdisciplinary three-day conference. This unique structure offered a perfect opportunity for Crow to have its inaugural presentation, titled “The Design and Research Potential of Crow for Language Research and Teaching” and presented by Sherri Craig and Jie (Wendy) Gao.
Listed in the conference program as part of a corpus linguistics panel, the presentation focused on answering a few questions: What is a corpus? What is Crow? What are the previous research projects and future research opportunities related to Crow? How is the Crow project progressing?

At the time of the presentation, March 6, 2016, Crow was still in its beginning stages. Therefore, much of the presentation offered a preliminary introduction to the whole project and reported all the work the team had completed so far, including the environmental scans and persona and scenario design work. During the PLCC presentation, Sherri and Wendy described Crow’s ties to three previous projects rooted in the Purdue Second Language Studies program and Rhetoric and Composition program: COIN, PSLW, and the 2014-15 ICaP Assessment. Each of these previous projects contained elements of Crow’s new goals. COIN, now a defunct program, attempted to gather pedagogical materials for an online repository. PSLW is an active corpus of texts from second language writers containing over 3.4 million words. And the 2014-15 ICaP Assessment, led by Dr. Jennifer Bay, began to evaluate the pedagogical needs of writing instructors by gathering student texts and teaching materials. Building on the strengths of these previous programs, Crow was designed to bring the interests of the SLS program and RC program together to develop an online repository and corpus for a broader audience.

After discussing the overview and related projects, Sherri and Wendy discussed the environmental scans performed on MICUSP and Sketch Engine before discussing the four personas that inspire the user design.

Overall, the PLCC presentation went off without a hitch. During Q&A, the audience members were very interested in how to make better use of corpora in the future. One listener even asked if they could use Crow for their own work and courses. Others asked quite a lot of technical questions about the design of the future site and the development of the project and corpus. With the help of Dr. Staples and Dr. Dilger in the audience, all the questions were answered and excitement for Crow spread. Everyone in attendance, including Sherri and Wendy, was strongly motivated to see how this project will develop as progress continues.


We’re in Rochester, NY for Computers & Writing 2016. We attended the computational rhetorics workshop facilitated by Ryan Omizo and Bill Hart-Davidson, and presented in session D2, “Boundary Work: Designing a Composition Archive for Research and Mentoring Across Disciplines.” That’s Friday, 5/20, 4:30 to 5:45pm, in Nursing 102.

We described our approach to developing Crow in five short talks:

  • Shelley Staples introduced our team and shared our project goals.
  • For those C&W attendees not familiar with corpus linguistics, Terrence Wang offered an introduction.
  • Ashley Velázquez, reading for Lindsey Macdonald, outlined some of the pedagogical rationale for Crow and described some possibilities.
  • Michelle McMullin described how our approach to infrastructure draws on scholarship in professional communication.
  • Finally, Bradley Dilger concluded our panel by saying more about our approach to sustainable collaboration.

Here’s our session handout and slide deck. Thanks to those who attended!

We have more to say about the conference in another post.


Finals week had just begun here at Purdue when the Crow team gathered in Heavilon Hall to kick off our summer projects. We met for some early morning sweets and some much-needed coffee to get our brains working before diving into our work. The team was assigned various tasks and dispersed. After we touched base with other team members to ensure that everyone was on the same page, the bulk of the work was dedicated to de-identifying previously collected data.

Crow team members de-identifying texts. Though some of us probably could have used a bit more coffee.

Crow is built on the Purdue Corpus of Second Language Writing (PSLW), which is a collection of student-produced documents from the ENGL 106i courses here at Purdue. Before these documents are uploaded into the corpus, they must be de-identified. So, we split up into groups and each tackled a set of documents. We reviewed each document and redacted any information that could lead to the identification of the writer, including any names, locations such as hometowns or dorm halls, specific course names, and specific professor names. Rather than just deleting the identifying word or words, we replaced each one with the name of its category in angle brackets. For instance, a name such as “Jordan” is replaced with “<name>”. This prevents any confusion that missing words may cause.
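The replacement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process the team used (the redaction was done by hand): the function name `deidentify`, the category labels, and the sample sentence, including "Maple Hall", are all hypothetical.

```python
import re


def deidentify(text, redactions):
    """Replace each identifying string with its category in angle brackets.

    `redactions` is a hypothetical mapping from an identifying string
    (e.g. "Jordan") to a category label (e.g. "name").
    """
    if not redactions:
        return text
    # Try longer strings first so "Jordan Smith" wins over "Jordan".
    terms = sorted(redactions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    )
    # A single pass means inserted placeholders are never matched again.
    return pattern.sub(lambda m: "<" + redactions[m.group(0)] + ">", text)


# Hypothetical sample sentence and redaction list.
sample = "Jordan lives in Maple Hall and takes ENGL 106i."
labels = {"Jordan": "name", "Maple Hall": "location", "ENGL 106i": "course"}
print(deidentify(sample, labels))
```

A script like this only helps once a reviewer has already identified what needs redacting; finding names, places, and courses in student writing still takes careful human reading.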

De-identifying the documents, though tedious and mind-numbing, is an important step in our process. At this point, we want to look for themes that spread across multiple documents, not focus on certain documents individually. As a result, the specific, identifying details that writers may have included in their assignments become irrelevant. We also want to ensure that we are not creating any biases based on preexisting knowledge of who wrote any of the documents we are examining.

Even though we have a lot left on our to-do list, we are excited to dive in and get to work on our summer projects, and we are looking forward to the progress Crow will make during the upcoming months! We’ll be presenting at Computers & Writing 2016, and we have a lot of prototyping and design work planned. Time to get some more coffee!


We’ll be presenting the following panel at TALC in Giessen, Germany, in July 2016.

Developing a First Year Composition L2 Writing Corpus and Repository

A number of student academic writing corpora (e.g., ICLE, MICUSP, BAWE) have been developed in the past few decades, showing the interest in and importance of representing this domain of language use. These corpora have been used for countless research studies, as illustrated by the extensive bibliography on the CECL and LCA websites.

Our project, the Purdue Second Language Writing corpus (PSLW), builds on this base but aims to represent the writing produced by first-year international students in the U.S. in composition courses. Such courses are provided at virtually every university in the U.S., but to date no large-scale projects have been completed. Our corpus currently includes 4,012 texts (3,472,260 words) representing 5 different genres (literacy narrative, proposal, annotated bibliography, interview report, and argumentative essay), and we are currently processing a comparable number of texts to be available by Summer 2016. The corpus contains three drafts of each assignment. The samples are annotated with writers’ TOEFL scores, nationality, and gender, among other characteristics.

Importantly, the corpus is part of a larger interdisciplinary project that represents a collaboration among students and faculty from both applied/corpus linguistics and composition studies, called CROW (Corpus and Repository of Writing). Two main features of this larger project are the development of an online interface where scholars can eventually submit their own texts, and the inclusion of pedagogical artifacts that accompany the production of the texts, including syllabi, assignment sheets, pre-writing readings, and schema building activities. Providing these additional materials sheds light on how the texts in the corpus are developed and shaped by these instructor-designed texts. We believe that such efforts are an important way to advance corpus linguistic and language teaching research.

Our presentation will focus on two strands: the methodology for developing this new kind of corpus project, and research that has been conducted using our corpus. In terms of methodology, we will briefly cover our corpus compilation process, but focus more on the interdisciplinary practices used to guide the development of the online platform and integration of corpus texts and artifacts. We will provide a discussion of two best practices from usability design: 1) the development of persona scenarios (e.g., novice international graduate student instructor); and 2) environmental scans of corpus and repository websites (e.g., MICUSP, COCA, and Pedagogy Toolkit).

A number of research projects have been conducted using the PSLW corpus. We will report on the findings of one of these studies, which investigated the use of reporting verbs in students’ literature reviews. Using a framework drawing on the work of Francis, Hunston, and Manning (1996), Charles (2006), and Friginal (2013), the study showed that although L2 writers in the corpus used many verbs in the semantic categories of argue and show, mostly for textual attribution, they also employed more think verbs than advanced L1 student writers, particularly for making general statements or to express their own opinions. After discussing our research findings, we will end the presentation by offering implications of our project for corpus development and research in general.


Swatek, A., Banat, H., & Staples, S. (2016, July). Developing First Year Composition L2 Writing Corpus: Research, Pedagogy and Teacher Training. Presentation at the 12th Teaching and Language Corpora Conference. Giessen, Germany.


Charles, M. (2006). Phraseological patterns in reporting clauses used in citation: A corpus-based study of theses in two disciplines. English for Specific Purposes 25(3). 310–331. doi:10.1016/j.esp.2005.05.003.

Francis, G., Hunston, S., & Manning, E. (Eds.). (1996). Collins COBUILD Grammar Patterns 1: Verbs. Amsterdam: John Benjamins Publishing Company.

Friginal, E. (2013). Developing research report writing skills using corpora. English for Specific Purposes 32(4). 208–220. doi:10.1016/j.esp.2013.06.001.



In March 2017, three conferences Crow researchers are very interested in will be held consecutively in the Pacific Northwest. (Four if you count ATTW!) We’re excited about the opportunity to attend, present (we hope), and participate in workshops and in other ways. Earlier this week, we submitted two proposals for CCCC 2017. We’ve included summaries below.

Hope to see you in Portland and Seattle!

Cultivating Writing Research via Corpus and Computational Collaboration

Bill Hart-Davidson & Ryan Omizo will join Shelley Staples and Lindsey Macdonald for this panel. Here’s the opening statement:

In March 2017, CCCC will be joined in Portland by AAAL, the conference of the American Association for Applied Linguistics. We take this opportunity to highlight the value of collaboration between researchers who will be attending one, but likely not both, of these conferences, and unfortunately, crossing paths in few ways. The corpus linguistics methods common in applied linguistics can bring quantitative elements to empirical research in rhetoric and composition, including attention to demographic issues and diverse genres. Rhetorical research, conversely, offers corpus researchers valuable insights into extra-textual features and contextual influences. This panel explores possibilities for collaborative writing research by demonstrating the value of this interdisciplinary work. We offer an overview of the benefits of corpus and computational methods, then present case studies of two projects which integrate computational methods and corpus linguistics with rhetoric and composition. We conclude with a brief panel discussion of takeaways for interdisciplinary collaboration, then invite conversation.

Promoting RAD Writing Research through Inter-Institutional Collaboration

Michelle McMullin, Terrence Wang, and Bradley Dilger proposed this session. Here are some excerpts from the proposal:

Empirical research in composition and rhetoric has become more common. Diverse research projects investigate all areas of the field, including writing transfer, undergraduate writing majors, and the literacies of working class and underrepresented minorities. But scholar-teachers at all levels still struggle to implement lessons from published research at their own institutions, and to explain the relevance of research to administrators…. In this presentation, we describe how research designed as inter-institutional from its inception has embedded attention to diverse research outcomes, the development of sustainable infrastructures, and the lifecycle model of scalable user-centered development. Our project brings the methods of corpus linguistics to rhetoric and composition, and vice-versa, creating a web-based archive for research and professional development. By embedding an interdisciplinary approach to collaboration from the start, we have developed a project that considers the strengths and contributions of each partner for an effective collaboration model that best serves the needs of all stakeholders.


At the end of our first academic year, the Crowbirds got together at Bradley’s house for a picnic, barbeque, and conversation. Madelyn and Amelia decorated, everyone brought wonderful food, and we had a great time — as you can see!


We are very proud of the progress we’ve made this year. A lot of our team members are traveling, and we’ll miss them. We look forward to a productive summer.
