Corpus and Repository of Writing

The Crow Lab at Purdue University has seen significant growth over the last few months. As of February 2022, five interns are working on our project, thanks to support from an EVPRP grant. Studio Hours, or time to work in the lab in Heavilon Hall or over Zoom, occur several times over the course of the week. Fridays are a reflective studio hour time, hosted by Bradley Dilger, for all interns to share their progress with each other and ask questions. The Crow lab is open for interns to use at any time, which is especially useful for one-on-one conferencing.

Though almost all of the work for Crow is collaborative, everyone contributes to a specific set of projects.

I’m Hannah Brostrom, an undergraduate at Purdue. I have been a part of several projects; recently, my focus has been on de-identifying documents and finishing the environmental scans we’ll use to improve the Crow website and platform. I have also collaborated with a fellow intern, Vivek, to improve the de-identification tool (a program we use to help us anonymize the documents we use to build our corpus). I am working on this blog post for the Crow website, and I’ll be doing more writing after Spring Break.

Bradley Dilger, an Associate Professor at Purdue, guides all Crow projects and assists everyone by answering questions and sharing his experience with writing research. He has been leading the grant writing projects, helping Vivek and me with our coding for the De-ID tool, and working with the Distributed Work Team. Outside of Crow, Bradley teaches professional writing and mentors student researchers.

Abby Elkin, an undergraduate researcher at Purdue, has been helping with the Innovation Grant. Additionally, she has been de-identifying documents and helping me conduct the environmental scans to improve the Crow website. Abby is also writing a blog post about the improvements Vivek and I are planning for the De-ID tool, once that work is completed.

Shelton Weech, a graduate teaching assistant at Purdue, has contributed heavily to the Distributed Work team through various writing projects, workshop projects, and the development of future grants. Outside of Crow, Shelton has been preparing for his dissertation by conducting interviews with scientific communicators who use social media, all of which he transcribes and analyzes.

Vivek Natarajan, an undergraduate researcher at Purdue, has focused on all aspects of de-identification. He has been doing the de-identification work itself, as well as improving the tool we use for de-identification. Recently, he has fixed an issue where the edited version of a file was saving incorrectly, and he is building some code adjustments to improve user experience while using the tool. 

Anna Shura, an undergraduate researcher at Purdue, has been focused on a variety of projects. She has written several Spotlights and blog posts, and she leads Crow’s social media presence. Anna is creating and documenting a Twitter and Web content strategy that includes a blog posting schedule, writing articles, and developing a stronger Twitter profile. She has also worked on organizing Crow’s Google Drive, de-identification, and reviewing the processes behind developing the Innovation Grant.

Having lab space in Heavilon allows us to interact with one another in person and get help from the people around us when needed. This has been hugely beneficial, especially considering how collaborative Crow’s work is. The EVPRP grant has given us the tools we need to work with each other efficiently, and it has allowed the Crow interns not only to move forward with the Crow project, but also to learn and grow in their positions.

Dr. Aleksey Novikov has always had a passion for learning languages and reflects that, as a young child, “[I liked] that I was able to understand things that others didn’t.” Despite growing up and attending college in Russia, Aleksey knew “English was going to be a part of my career” and switched from math, business, and economics to pursue a new degree in translation and interpretation. As a translator, Aleksey was able to apply his love of foreign language, but he wanted “something more creative.” In working with a psycholinguist professor as an undergraduate, Aleksey was inspired to study mechanisms associated with speech perception and investigate “mechanisms at work when people hear and understand spoken language.”

Aleksey Novikov

After gaining his undergraduate degree, Aleksey sought several different career experiences as he jumped between working at an IT company and an evening teaching job because, as a true academic, Dr. Novikov knew, “I need intellectual fulfillment.” He began teaching English to foreign language students who had day jobs and could not take regular classes, and it was through his late-night teaching that he learned about the Fulbright Scholarship Program.

The Fulbright Program awards grants to support English Teaching Assistant Programs. Fulbrighters are English teachers who may teach students in their native languages, and Aleksey taught Russian in a self-directed language program at California State University in 2009 on a Fulbright Scholarship. Throughout his Fulbright Program, he took two classes as a non-degree student: Foundations of TESOL and American Studies, which he was able to transfer to graduate credit four years later at the University of Arizona.

Dr. Novikov earned both his MA in Russian and Slavic Studies and his PhD in Second Language Acquisition and Teaching at the University of Arizona, while teaching Russian and linguistics and working as a research assistant for Crow, too. He was ahead of the game by designing a mostly asynchronous Russian language course with synchronous conversation components pre-pandemic. In May 2021, Dr. Novikov earned his PhD, defending his dissertation: Syntactic and Morphological Complexity Measures As Markers of L2 Development in Russian.

Here at Crow, Aleksey has contributed his diverse background experience to a number of projects, including Python and corpus building workshops and the recruitment and organization of student work for the repository. He also serves as a Primary Investigator on the multilingual academic corpus of assignments (MACAWS). Most recently, Dr. Novikov piloted Crow’s program CIABATTA in Russian and built a significant amount of the CIABATTA infrastructure.

Aleksey is incredibly passionate about educating students in linguistic studies and mentions, “I appreciate teaching and introducing people to linguistics. That is just the best moment of my life.” As of 2022, Dr. Novikov is a Visiting Assistant Professor at the Oxford College of Emory University, where he continues to teach several linguistics courses as well as Quantitative Theory and Methods, an introduction to data and statistics.

As a Crowbird, Dr. Novikov is looking forward to planning a CIABATTA workshop in Fall 2022 and continuing to work as a PI on MACAWS, including updating the IRB. Needless to say, Aleksey likes to stay busy and intellectually curious, and Crow is excited to celebrate his work and wish him all the best in his upcoming projects!

The Crow team is excited to continue to grow with the addition of new interns across three different institutions. Vivek Natarajan is an Undergraduate Research Assistant at Purdue University, Anuj Gupta is a Graduate Intern at the University of Arizona, and Faisa Aden is an Undergraduate Research Assistant at North Carolina State University. The three interns are pursuing a diverse array of degrees, from English to Computer Science, and Crow looks forward to their involvement across all of our projects. Below, each new researcher shares their scholarly background and goals for their work as a new Crowbird.

Vivek Natarajan is a junior at Purdue University studying Computer Science with minors in Math and Linguistics. His current interests include Natural Language Processing and Machine Learning. In addition to English, he also speaks Spanish, Hindi, Tamil and some Sanskrit and Telugu, due to living in India for six years. Here at Crow, Vivek will be working on de-identification, the development of the corpus and repository, and improving the tools we use for that important work.

Anuj Gupta is a graduate student at the University of Arizona in the Rhetoric, Composition & the Teaching of English program. He works at the intersection of composition studies, applied linguistics and digital humanities. Using computational & corpus approaches, he wishes to explore how media texts use emotions to rhetorically persuade the public. Through Crow, he is excited to learn text mining techniques and build resources that will enable students to experiment with these methods for their research. He is also looking forward to getting involved with publications and presentations about Constructive Distributed Work.

Faisa Aden

Faisa Aden is a Sophomore at NC State, studying Psychology and English with a concentration in Linguistics. In the future, she hopes to study abroad and either teach English as a Second Language, or work in the publishing industry as a technical writer or journalist. On campus, she is involved with Social Innovations Fellows, McNair Scholars, and University Scholars.

Outside of the Crow lab, all three of our accomplished interns explore unique hobbies. Vivek is an avid boulderer and likes trying to find the hardest possible way to climb up steep rock faces outside. Anuj plays guitar, reads sci-fi novels, and passes his time thinking about stuff that happens in outer space. Faisa enjoys reading, watching Turkish TV shows, and spending time with her family. Congratulations to all three Crowbirds. Welcome to our team!

The Crow team is glad to be presenting at the NC State University Equity Research Symposium on Tuesday, February 8. Crow researcher Dr. Michelle McMullin, assistant professor of English at NC State, will be joined by Crow co-PIs Dr. Hadi Banat (U of Massachusetts, Boston) and Dr. Aleksandra Swatek (Adam Mickiewicz University, Poznań, Poland).

We’ll be presenting “Constructive Distributed Work: How Crow builds ethics, equity and access in research teams,” which introduces our team and our corpus and repository platform, describes the “Constructive Distributed Work” heuristic we use to guide mentoring and professional development on our team, and shows how user experience design helps us design more sustainable, ethically robust tools for teaching and research.

If you’re in Raleigh, our talk will be in Talley 4280 from 3:00 to 4:00pm Eastern time on Tuesday, February 8. NCSU will be live-streaming as well. (Find this time in your time zone.)

Emily Palese has been a leader of the Crow repository since 2019 and earned her PhD in Second Language Acquisition and Teaching from the University of Arizona in May of 2021. Emily previously studied Spanish and Anthropology at the University of Wisconsin-Madison as an undergraduate and earned her MA in Teaching English as a Second Language from the University of Arizona. As a member of the Peace Corps for two years, Dr. Palese taught English in the Philippines at a rural high school, and she also facilitated training workshops for elementary and high school teachers from other parts of the country. Here at Crow, Emily has contributed to a diverse array of projects. Collaborative work has been important to Emily for a long time, and she gains team experience from her work on the Crow repository, AZTESOL, the Second Language Writing collaboratives, and WriPACA.

Emily’s dissertation, Prompting students to write: Designing and using second language writing assignment prompts, investigates how assignment prompts function in first-year writing courses at the University of Arizona. Dr. Palese’s motivation for this project came from her own experience: “When I was new to the university, I struggled to understand as a young professional, what are the expectations?” Emily immediately realized, “having a framework for analyzing prompts and [being able to] compare what you’re doing is really helpful when you’re designing new materials,” and she began her research of prompt interaction by collecting and reviewing prompts for Crow’s repository.

When Dr. Palese brought her research to her 18 student participants, she studied “how students are interacting with the materials, what they’re skipping when they’re reading, [and] what they think is important.” Emily conducted “think aloud” interviews and described her process, “As the students interacted with the prompts for the first time, I screen recorded with audio to see how they navigated [the prompts and] what their thoughts and reactions were as they looked at them. Immediately after, I had a semi-structured interview with each of the students to follow up on what they valued and how they used the prompts.” Additionally, Dr. Palese studied the rhetorical moves that occur in assignment prompts to understand how instructors give directions. Her analysis of writing is complemented by interviews of six instructors and observations of their courses. 

After earning her PhD in 2021, Emily became the Assistant Director of Global Foundations Writing at the University of Arizona, where she “provides instructional support for global micro-campuses, including onboarding and supporting instructors, developing materials, and assessing and adapting curricula.” Here at Crow, Dr. Palese finalized her work on the repository team and began preparing to transition to her new leadership position. Currently, Emily is enjoying exploring her new role and reflects, “I’m happy that the repository has new leadership and members so our original ideas and protocol can get refined with new perspectives.” 

We wish Dr. Palese well with all of her 2022 endeavors!

Crow receives significant interest from students and faculty in building their own corpora. Many people interested in corpus building are unsure where to start. How can data be organized effectively? How can participants be contacted and treated ethically? The Crow team hopes to answer these questions by providing a Corpus in a Box: Automated Tools, Tutorials and Advising, or CIABATTA. The December 6th, 2021 CIABATTA launch introduced the “Corpus in a Box” to an international audience with participants from Lebanon, Colombia, Hong Kong, Italy, Greece, Saudi Arabia, the United Kingdom, Canada, Ghana, Brazil, the United States, and Poland.

The launch event began with Dr. Shelley Staples describing the content included in CIABATTA and the motivation behind the development of the corpus building process. While we created CIABATTA to help scholars begin their own corpora, Staples pointed out it is important to recognize that if you need to build a corpus, “It’s a lot of work!” If you decide building a corpus would be helpful to your research, CIABATTA offers a start-up process for anyone looking to build their own corpus.

Building CIABATTA has allowed the Crow team to pool our experiences and contribute programming, automated tools, and user experience guidelines. However, coding experience and research experience are not necessary to use CIABATTA. As Staples described it, CIABATTA is designed for students and faculty around the world. The CIABATTA web page notes that it serves everyone “from novice users looking to begin conducting data analysis through their corpus to experienced programmers ready to streamline their own processes for corpus building.”

CIABATTA includes nine main sections of content:

  1. best practices for corpus building
  2. CIABATTA overview
  3. ethical issues in corpus building
  4. checking consents and collecting data
  5. organizing your data
  6. converting, encoding, and standardizing your data
  7. organizing, preparing & processing metadata
  8. adding headers and changing filenames
  9. deidentifying your data

Attendees of the launch presented a variety of motivations for using CIABATTA, with several participants asking about using CIABATTA in academic courses and piloting CIABATTA in different languages. We encouraged these uses and supported these goals in the Q&A section of the launch.

Screenshot of CIABATTA launch, showing "CIABATTA content" with list of the nine sections of content: (1) best practices for corpus building; (2) CIABATTA overview; (3) ethical issues in corpus building; (4) checking consents and collecting data; (5) organizing your data; (6) converting, encoding, and standardizing your data; (7) organizing, preparing & processing metadata; (8) adding headers and changing filenames; and (9) deidentifying your data.
The nine sections of CIABATTA content (also in the list above)

In building CIABATTA, we chose GitHub as the presentation platform because of its ability to integrate code and text from the GitHub wiki. Through GitHub, users are directly linked to the most recent data code and automated tools. In response to one participant’s question, “Could you convert CIABATTA into a textbook?” Staples and Dr. Adriana Picoral encouraged using CIABATTA or other Crow information to share with a class. 

One attendee asked if CIABATTA could help build corpora in languages other than English. The answer is yes! The Crow team has successfully piloted the Corpus in a Box in Portuguese and Russian through the Multilingual Academic Corpus of Assignments: Writing & Speech (MACAWS), and encouraged attendees interested in piloting other languages to work with Crow to offer feedback.

Another important question in the Q&A section asked how CIABATTA compares to other programs, such as Lancsbox. Crow Team member Dr. Aleksandra Swatek addressed the comparison by noting, “Lancsbox is more to analyze the corpus … CIABATTA helps to compile the corpus and all the other steps you need to prepare your files.”

In the CIABATTA Open House on December 7, 2021, ACLS program officer Dr. John Paul Christy asked about ethical concerns in corpus building, pointing out that the public turn in ACLS work has highlighted issues about the co-creation of knowledge. We shared some experiences across Crow. Dr. Bradley Dilger described his decision to defer recruiting corpus participants while he was an administrator at Purdue. Dr. Staples described our original plans for building the repository, which included posting identified materials as a way to recognize and potentially reward instructors for their participation. However, we realized doing so could result in identification of students through triangulation. This is one reason we sponsored the Crow Writing Contest at Arizona — to recognize our students’ good work without identifying their contributions to the corpus. 

Our next steps for CIABATTA include user experience testing with targeted groups such as the Crow Fellows, users of Crow, and developers and researchers using the Crow code on GitHub. If you use CIABATTA, we’d love to hear from you! Join our mailing list to stay up to date and offer your feedback, if you wish.

If you are interested in CIABATTA and were unable to attend the Launch or Open House, additional CIABATTA information can be found on CIABATTA’s GitHub and the Crow YouTube channel. We welcome your questions about CIABATTA. Just send us a note to

The fall leaves are almost done changing colors and the Crow team is hard at work! This semester, Crow was awarded the Covid-19 Research Disruption Grant. This funding was provided by the Offices of the Executive Vice President for Research and Partnerships (EVPRP) and the Provost in light of the Covid-19 pandemic’s impacts on the Crow lab. We are grateful to receive this funding, which will go a long way towards supporting our project outcomes. To get back to work and back on track, the Crow lab is hitting the ground running and welcoming three new undergraduate researchers!

Crow’s undergraduate team: from left, Professor Bradley Dilger, Ryan Day, Anna Shura, Abby Elkin, and Hannah Brostrom

Hannah Brostrom is a sophomore at Purdue University studying Professional Writing and Computer Science. She is interested in consumer technology and user experience research, and hopes to be a technology journalist in the future. She is involved with several creative writing publications, and in her spare time likes to play guitar and cook with her roommates.

Abby Elkin is a senior at Purdue University studying Professional Writing with a minor in Women’s, Gender and Sexuality Studies, and she has an Associate of Science degree from Ivy Tech. In the future she would love to write books while traveling. In her spare time, she enjoys playing with her dogs, playing video games, and crocheting.

Anna Shura is a sophomore at Purdue University studying Professional Writing and Creative Writing with a minor in Global Liberal Arts Studies. In the future, she hopes to study abroad in the United Kingdom and work in the publishing industry or creative marketing. On campus, she is involved in several English organizations including serving as the President of the Professional Writing Association. In her free time, she enjoys playing violin in Purdue’s Symphony Orchestra, cooking and baking, and crafting.

My name is Ryan Day, and I am a senior in Civil Engineering and Political Science with a minor in Spanish. I have served in cross-disciplinary research groups while at Purdue, including the Building Water Systems group and the Transculturation group. I am also currently a writing tutor with the Purdue Writing Lab and President of Purdue Science Olympiad. In the future, I plan to attend law school and continue combining my humanities and hard science experiences in a legal profession. I spend my spare time skiing, scuba diving, and traveling with friends and family.

As a three year veteran of the Crow team, I will serve as the mentoring undergraduate researcher, supervising and training the new team members. This is a big step forward for me as a member of the team, and I’m excited to take on this new challenge. I hope to translate my previous extracurricular and co-curricular leadership experience into a new research setting. Having been through the onboarding process before, 

Together, the new undergraduate researchers and I aim to pursue a number of the original project’s outcomes:

  • First, we intend to recruit instructor participants from high schools and community colleges by increasing outreach activities and extending the Crow platform. As undergraduate members of the team, we have a unique and particularly valuable role to play in outreach by increasing Crow’s visibility. 
  • We also intend to gather user experience data and direct feedback from Crow platform users. By analyzing this feedback, we will be able to better shape user interfaces, documentation, and supporting content. 
  • As our team continues to grow, we also plan to hold team meetings across all Crow institutions, to shape project direction, especially outreach and community engagement. Our role will be helping to organize and facilitate these “Crow Summits.”

We are also well underway with updating the Crow web site to document our work and attract a larger community of Crow users who use the platform for both research and professional development. Articles like this one are a big part of advertising and detailing the activities of the Crow team and developing this community relationship. Soon, we’ll start publishing some work from Hannah, Abby, and Anna, too.

We are grateful to EVPRP for their support of the Crow team and look forward to sharing a summary of our work with them in April 2022.

The Crow team is thrilled to announce the release of Corpus in a Box: Automated Tools, Tutorials, & Advising (CIABATTA). We invite you to attend our launch event and open house!

At these events, we will introduce our CIABATTA toolkit for building corpora, which has been developed collaboratively by Crow researchers across our network of partner institutions. We will briefly introduce how and why we built this toolkit, and then take questions from the audience about corpus building.

All are welcome, both individuals who can program using Python, and others who have little or no programming knowledge but are interested in building and working with corpora.

We invite you to join us at our release event and/or open house. Please note that registration is required. Use the links below to register. If you have any problems, please contact us.

Launch: December 6, 2021, 10am to 11am (Arizona Time/MST)

Find the launch time in your time zone: when-is-ciabatta-launch
Register to attend the launch

Open House: December 7, 2021, 8am to 9am (Arizona Time/MST)

Find the open house in your time zone: when-is-ciabatta-open-house
Register to attend the open house

The development of CIABATTA has been supported by an ACLS Digital Extension Grant from the American Council of Learned Societies (ACLS). We are grateful to ACLS, Humanities Without Walls, and our other funders for their support.

We hope you can attend. Questions? Contact

Downloadable flyer for event

On October 23rd, 2021, we were excited to host an online workshop at the Arizona Teachers of English to Speakers of Other Languages (AZTESOL) 2021 conference. The goal of our workshop, “Exploring tense-agreement issues in L2 writing using a learner corpus,” was to introduce the Crow platform and show how to use concordance lines to help students identify and understand tense-agreement patterns. Our team consisted of Ph.D. student Anh Dang, Ph.D. student Hui Wang, and Ph.D. candidate Ali Yaylali. 

Screenshot of slide containing the text: "Exploring tense-agreement issues in L2 writing using a learner corpus. Crow, the Corpus & Repository of Writing. AZTESOL State Conference 2021. October 22–23, 2021."
First slide from AZTESOL 2021 workshop 

What we shared at AZTESOL 2021

During the workshop, attendees were introduced to Crow learner corpora and Data-Driven Learning (DDL) by reviewing authentic sentence samples and grammatical forms from students’ texts. After the introduction, we guided attendees through an interactive corpus-based activity that contained three parts: 

1) Noticing verb tenses in learner writing.

In this section, participants read a list of sentences selected from the Crow corpus and identified the tense-agreement patterns by answering the guiding questions.

2) Searching the concordance lines. 

In this activity, participants looked at some concordance lines from the Crow corpus and answered questions about the patterns and different tenses.

3) Independent practice.

We provided two options in the last part. Participants could either revise tense-agreement issues in an excerpt from the Crow corpus or revise the issues in their own paper, deciding based on their own teaching context.

During the activity, attendees were invited to use the embedded scrollable concordance lines to observe keywords and tense pattern variations. We then guided workshop participants to try the independent practice: finding and revising the tense agreement issues in the authentic excerpt. 

Screen shot of concordance lines showing a query for "what," with about 20 lines of text showing that key word in context.
Example of concordance lines used for the corpus-based activity
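
For readers curious about how concordance lines like these are produced, here is a minimal Python sketch of a keyword-in-context (KWIC) search. It is an illustration only, not the code behind the Crow platform: the `concordance` function, its `width` parameter, and the sample sentences are all made up for this example.

```python
import re


def concordance(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of keyword.

    Hypothetical helper for illustration; not part of the Crow platform.
    """
    # Collapse line breaks and repeated spaces so the context reads as one line.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword lines up in one column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample sentences, not drawn from the Crow corpus.
sample = (
    "Yesterday I go to the library. She was happy when she sees the results. "
    "I was tired, so I went home."
)

for line in concordance(sample, "was", width=20):
    print(line)
```

Lining the keyword up in a single column is what lets students scan down the page and compare the verb forms that appear on either side of it.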

After sharing the activity demo, we provided some questions for the participants to discuss how they could adapt and implement this activity to fit their own instructional context and student needs. Some of the participants mentioned that they would need more scaffolding activities in K-12 contexts. We were excited to hear their feedback on the activity design and their valuable ideas on applying the activity.


After our workshop, participants were invited to:

  1. Download and print out our activity handout to implement this activity in their future teaching;
  2. Use the Crow platform to explore the linguistic features of students’ writing;
  3. Develop effective activities based on available data and information from our platform;
  4. Guide students to raise awareness and accuracy by using authentic language samples. 

Workshop materials  

We’ve included the materials we presented here. 

Thank you for your interest! We also thank all participants and organizers for their support. We look forward to attending AZTESOL next year.

This week Dr. Michelle McMullin and I were invited to speak in Dr. Beth Towle’s professional writing class at Salisbury University. One of the good things about more video-conferencing: easy to be a guest speaker in a class!

Our talk shared the Crow model for grant writing, described how we use it for professional development, and proposed three practices for lo-fi team building. Here’s the slide deck we shared.

This talk was based in part on the materials we shared with the Arizona Women’s Hackathon, especially the “consecutive agenda” Crow uses for agenda-setting and note-taking in meetings.

This post will be updated.