Corpus and Repository of Writing

Building on the success of her previous Crow workshops engaging the R environment, Dr. Adriana Picoral will host “Quantitative Language Data Analysis and Visualization in R” on April 17, 2021, from 9:00am to 11:00am (Arizona/US/MST). (See this in your time zone.)

In this workshop, we will work with the variable “that” data from Tagliamonte’s book “Variationist Sociolinguistics” (2012). We will visualize the frequency of omission of the complementizer “that” across speaker groups and run both linear and logistic regression. Concepts such as correlation, interaction, and contrasts will be addressed.
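As a rough illustration of the kind of summary that comes before plotting or modeling, here is a minimal sketch of computing omission rates by speaker group. It is written in Python rather than R, and the observations, group labels, and function names are invented for illustration; they are not Tagliamonte's data or the workshop's code.

```python
import math
from collections import defaultdict

# Hypothetical toy observations: (speaker_group, complementizer_omitted).
# These values are made up for illustration only.
tokens = [
    ("younger", True), ("younger", True), ("younger", False),
    ("older", True), ("older", False), ("older", False), ("older", False),
]

def omission_rates(observations):
    """Return the proportion of omitted complementizers per speaker group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [omitted, total]
    for group, omitted in observations:
        counts[group][0] += int(omitted)
        counts[group][1] += 1
    return {group: omitted / total for group, (omitted, total) in counts.items()}

def log_odds(p):
    """Convert a proportion to log odds, the scale logistic regression works on."""
    return math.log(p / (1 - p))

rates = omission_rates(tokens)
for group, rate in sorted(rates.items()):
    print(f"{group}: rate={rate:.2f}, log odds={log_odds(rate):.2f}")
```

The per-group proportions are what a bar chart would show, and the log odds give a sense of the scale on which a logistic regression expresses group differences.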


  1. Please register for the workshop.
  2. Download and install the latest R version from
  3. Download and install the latest RStudio from 

If you are unable to attend, watch for a video on the Crow YouTube channel the week following the workshop.

Questions? Please contact Dr. Picoral.

We’re happy to share some recent good news from across the Crow team.

Nina Conrad

Congratulations to Crow researcher and University of Arizona doctoral candidate Nina Conrad, who was awarded a Bilinski Fellowship for her dissertation project, “Literacy brokering among students in higher education.” The fellowship will fund three semesters of writing and includes professional development opportunities as well. 

Crow researcher Hannah Gill was admitted to the Mandel School of Applied Social Sciences at Case Western Reserve University with a scholarship and funding to support her field work. In May, Hannah will graduate from the University of Arizona with a double major in English and Philosophy, Politics, Economics, and Law (PPEL).

Thank you to everyone who attended our third Crow Workshop Series event, focusing on grant writing. If you were not able to attend, please see the video on our YouTube channel. Our slides and handout are also available. 

We were so pleased by the turnout. Our workshop team (Dr. Adriana Picoral, Dr. Aleksandra Swatek, Dr. Ashley Velázquez, and Dr. Hadi Banat) is reviewing the feedback we got and planning our next event. Stay tuned! 

Dr. Adriana Picoral

Dr. Picoral was awarded a mini-grant for a series of professional development workshops designed to increase the gender inclusivity of the data science programs at the University of Arizona. The workshops will be hosted by Dr. Picoral in cooperation with two invited speakers. 

Ali Yaylali, Aleksey Novikov, and Dr. Banat wrote about Crow’s approach to data-driven learning (DDL) in “Using corpus-based materials to teach English in K-12 settings,” published in TESOL’s SLW News for March 2021. This is our second piece for SLW News, following “Applying learner corpus data in second language writing courses,” written by Dr. Velázquez, Nina Conrad, Dr. Shelley Staples, and Kevin Sanchez in October 2020.

Finally, Dr. Picoral, Dr. Staples, and Dr. Randi Reppen published “Automated annotation of learner English: An evaluation of software tools” in the March 2021 International Journal of Learner Corpus Research. Here’s the abstract:

This paper explores the use of natural language processing (NLP) tools and their utility for learner language analyses through a comparison of automatic linguistic annotation against a gold standard produced by humans. While there are a number of automated annotation tools for English currently available, little research is available on the accuracy of these tools when annotating learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. Our results report both precision and recall of annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.

Picoral, A., Staples, S., & Reppen, R. (2021). Automated annotation of learner English: An evaluation of software tools. International Journal of Learner Corpus Research, 7(1), 17–52.

We thank all of the Crow researchers and Crow friends who supported this good work, and the editorial teams, reviewers, and funders who made it possible. 

The Crow leadership team would like to express its condemnation of anti-Asian and gender-based violence and to communicate its support for Asian Americans, Asians, and Pacific Islanders. We are saddened and angered by the murders of Delaina Ashley Yaun, Paul Andre Michels, Xiaojie Tan, Daoyou Feng, Hyun Jung Grant, Soon Chung Park, Suncha Kim, and Yong Ae Yue. We condemn the increased violence this past year against Asian and AAPI students, faculty, and individuals at our own institutions, both in the U.S. and abroad.

Our team closely collaborates with our Asian and AAPI team members, and greatly values the contributions of Asian and AAPI students and teachers as participants in our research. We want to express our solidarity with those individuals and others in our own professional networks, and invite others to do the same for their students and colleagues.

Please visit Stop AAPI Hate for more information and resources.

Ashley Velázquez
Shelley Staples
Michelle McMullin
Adriana Picoral
Bradley Dilger
Randi Reppen

The Crow workshop series continues! 

Workshop flyer. PDF version linked, text in main post.
Before you start “writing” flyer. PDF version available.

Fellowship and Grant Writing for PhD Students & Early Career Scholars, Part I: Before You Start “Writing”
In this workshop, we will discuss why you should apply for grants and fellowships (and the difference between these). We will also address how to find grants and fellowships, as well as how to prepare for applying. The workshop is designed for early career scholars from around the world who conduct writing research, broadly construed.

Saturday, March 13, 2021, 9:00 to 10:30 am Pacific/USA 
(UTC: Sat Mar 13, 17:00 to 18:30)

Presenters are Aleksandra Swatek, PhD; Hadi Riad Banat, PhD; Adriana Picoral, PhD; and Ashley J Velázquez, PhD.

Please register for the workshop and share any questions you have beforehand. We hope to see you there! 

The Crow team is excited to be a part of the University of Arizona’s Women’s Hackathon for 2021. We’ll be offering a workshop, “Collaborating online: Lessons from a Successful Team,” on Saturday, March 6, at 1:00pm Mountain time. Michelle McMullin, Shelton Weech, and Bradley Dilger will be facilitating.

Collaborating online: Lessons from a Successful Team
Based on the experiences of an interdisciplinary software design and research team working at multiple sites, we share three principles for collaborative teams who prioritize inclusivity and mutual respect. Examples and practical techniques will help your team work together more effectively both asynchronously and when working together in person.

We offer three best practices you can adapt to your team:

  1. Build visible infrastructure: For online teams, digital infrastructure is the documents and communication that facilitate work. We share the consecutive agenda, our approach to keeping notes and agendas for meetings, and principles for using a team communication platform like Slack, Basecamp, or Microsoft Teams.
  2. Practice active listening: Krista Ratcliffe describes active listening as actively seeking to hear what is different about the ideas of other people. We offer several concrete approaches for listening actively to others on your team.
  3. Coordinate work purposefully: Distributed teamwork requires connecting people, tools, and documents that are separated geographically, sometimes in different time zones. Scholars call this work coordination. We describe ways to coordinate work across documents and digital infrastructure.

We’ve created a template for the consecutive agenda Crow teams use to combine meeting agendas, notes, and links to our team communication platform. Examples of other techniques appear in the video presentation.

Our materials:
A video of our presentation, for those unable to attend synchronously.

The slide deck for our presentation is also available.

Our second Crow workshop will be held on December 19, 2020 from 9:00 to 11:00am (Arizona time/MST).

“Corpus Searches in R: Regular Expressions and Concordance Lines” will be hosted by Adriana Picoral, PhD, assistant professor of data science at the University of Arizona. 

Workshop flyer. PDF also available.

Corpus Searches in R: Regular Expressions and Concordance Lines
Saturday, December 19, 9am to 11am (Arizona time/MST).

In this workshop, we will work with a tagged corpus. We will go over the steps of reading in a corpus (organized as multiple text files) in R, doing searches in the corpus using regular expressions, and producing concordance lines. We welcome to this workshop corpus linguists who are not yet familiar with R but are interested in expanding their coding skills.
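To give a sense of the three steps named above (reading in text files, searching with a regular expression, and producing concordance lines), here is a minimal sketch. It uses Python rather than R, and it assumes a simple word_TAG tagging format; the function names, the sample sentence, and the tag scheme are all hypothetical and are not taken from the workshop materials.

```python
import re
from pathlib import Path

def read_corpus(folder):
    """Read every .txt file in a folder into a {filename: text} dictionary."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }

def concordance(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match in text."""
    lines = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample in an assumed word_TAG format.
sample = (
    "I_PRP think_VBP that_IN research_NN matters_VBZ ._. "
    "Research_NN is_VBZ hard_JJ ._."
)

# Find tokens tagged NN whose word form starts with "research", ignoring case.
hits = concordance(sample, r"(?i)\bresearch\w*_NN\b")
for line in hits:
    print(line)
```

With a real corpus, the same `concordance` call could be applied to each text returned by `read_corpus`, with the pattern adjusted to whatever tag format the corpus uses.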

Register through Zoom. For more information, please contact Dr. Picoral.

Did you miss our first workshop? Watch a video on our YouTube channel.

Workshops in Spring 2021 will be announced soon. Got a workshop suggestion? Let us know!

In May of 2020, Crow members Ashley Velázquez, Hadi Banat, and Shelley Staples hosted a workshop with Metropolitan State University of Denver (MSU Denver) faculty and students. Originally, our workshop was intended to be held during TESOL’s International Conference in Denver, Colorado. Unfortunately, due to COVID-19, we were not able to attend TESOL this year, but we were able to continue our outreach efforts by advertising our workshop to interested parties. To our delight, several folks at MSU Denver were excited to participate in a virtual workshop with us to learn more about Crow’s online corpus and how it can be used for innovative teaching and research, teacher training, and Writing Center work.

Slide from our presentation, reading "Using the Corpus and Repository of Writing for Teaching and Research." Two images: concordance lines showing a query for "research," and a cartoon of people of diverse ages, genders, and races saying "Hello" in multiple languages.
“Using Crow for Teaching and Research,” our slide deck

In alignment with our goals for our ACLS Digital Extension Grant, outreach efforts this year have primarily focused on expanding our corpus’s representation of multilingual writers to include a new population, heritage Spanish writers at the University of Arizona, while also reaching out to other institutions that serve this population of students. MSU Denver is a newly designated HSI, or Hispanic Serving Institution, so it was fitting that we were able to introduce Crow to this particular audience.

Our workshop with MSU Denver focused on both teaching and research. Unlike past workshops, we focused on building an explicit relationship between teaching and research that was accessible to those who have little to no experience with corpus linguistics. Additionally, unlike other workshops, our audience members, except for one, were all teachers in training and writing center tutors in training, enrolled in the RIDES program. Finally, we were invited to conduct this workshop as part of a mentoring course for the RIDES program. Until now, the majority of our workshops have been held at, or alongside, conferences (excluding our workshops at Wright State University and Universidad de Sonora).

We introduced our online corpus with a few simple searches and the various filtering options. We asked participants to examine the different information available during these searches, and we demonstrated the connection between our corpus and our repository. After demonstrating a few searches, we asked our audience to think of how we might use such searches (e.g., transitions and synonyms) for developing classroom and tutoring-specific activities. For example, for synonyms, we may want to help students develop their vocabulary by noticing nuanced differences between near synonyms like “important” and “significant.” Teachers can help students discover and notice these differences by providing authentic examples of these synonyms in use and guiding them with questions and corpus-based activities.

Finally, we introduced the audience to the repository interface features and the metadata pertaining to the pedagogical materials we are collecting. For example, workshop participants explored the repository search tool and filters to look for specific pedagogical materials pertaining to certain assignment genres of interest. By going through metadata filters such as institution, year, semester, course type, modality, and length, they got a better sense of the variety in pedagogical materials across Crow sites. We then demonstrated some searches with the repository, focusing specifically on how assignment handouts, syllabi, rubrics, and classroom materials may be used during a tutoring session in the writing center and for the purposes of tutor training.

What did we learn from hosting the workshop?

The writing center tutors in training at MSU Denver will be part of the RIDES program, a writing center intervention that supports culturally and linguistically diverse students with practical language skill instruction, sometimes not prioritized in a writing center consultation. The audience of tutors was not familiar with corpus-driven methods as pedagogical interventions. The time we spent introducing data-driven learning (DDL) pedagogical activities helped them consider nontraditional activities they can use in writing center consultations. One such activity is our “Transition words” activity. This activity introduces students to a variety of transition words and walks them through the process of noticing the types of transition words used, where they’re located in sentences, and the structures used with each transition word.

Our main takeaways as Crow researchers and teachers keen on sustaining outreach to diverse audiences at Hispanic Serving Institutions are the following:

  • Novice Corpus Users: Continue expanding our reach to audiences who do not have prior experience with corpus linguistics and make corpus-based pedagogical approaches accessible and approachable to nontraditional audiences like writing center tutors, teachers in training, and under-represented minorities.
  • Scaffolded Workshops: Develop a series of workshops, specific to the needs of novices in corpus linguistics, that scaffolds corpus-based teaching and research. This may be a beneficial step towards unpacking threshold concepts and making corpus linguistic methods less intimidating.
  • Undergraduate Audiences: Strategically reach out to undergraduate students and make our workshops accessible to this population. This is especially relevant since Crow has experience with working with undergraduates on our research team.
  • Teachers in Training: Explore opportunities to work with teachers in training who may not have sufficient TESOL or TESL background and training. Sometimes lack of training is due to lack of resources, and this realization further helps us address the ACLS-funded outreach goals for Crow.
  • Writing Center Directors: Build relationships with writing center directors who are keen on introducing new pedagogical interventions for writing center consultations and in tutor training programs. Writing center tutorials usually focus on tutee writing, so shifting the paradigm towards mentor texts could be a beneficial intervention with tutees who need more language instruction support. This paradigm shift honors descriptive vs. prescriptive approaches and defies the deficit model in tutoring multilingual writers. 

We thank Rachel Hawley for inviting us and helping us attract an audience. We look forward to applying what we’ve learned to our next workshops. 

We are pleased to share that Crow researchers will be hosting a series of workshops targeted at teacher-scholars who, like us, value inclusive approaches to studying and teaching writing. These free hands-on workshops will be held on Zoom, making them accessible to people across the globe.

Our first workshop, “Corpus Data Scraping and Sentiment Analysis,” will be hosted by Adriana Picoral, PhD, assistant professor of data science at the University of Arizona. 

Flyer for Crow workshop Nov 10, 2020
Workshop flyer. PDF also available.

Corpus Data Scraping and Sentiment Analysis
Saturday, November 7, 10am to 12pm (Arizona Time/MST)

In this workshop, we will scrape Amazon for reviews using the rvest R package to build a corpus of product reviews. We will then do some sentiment analysis from a critical perspective. We welcome to this workshop corpus linguists that are not yet familiar with R but interested in expanding their coding skills.

Register through Zoom. For more information, please contact Dr. Picoral.

Future workshops will include other subject matter including grant writing, developing distributed teams, applying for dissertation fellowships, building learner corpora, and more. Got a workshop suggestion? Let us know!

This is the first in a series of posts where Dr. Swatek will share the work she’s doing with the Scholarly Communication Research Group in Poznań, Poland.

Grant writing is a process that is notoriously difficult: even if you have a team of the best writers working on your grant, the chances of winning are slim. As a member of the Crow team, I have witnessed or participated in a few grant writing processes. This proved to be a very useful experience as I neared the completion of my academic studies at Purdue.

Dr. Aleksandra Swatek, on interview day at the National Science Center

In January 2019, I was getting ready to graduate from the Second Language Studies program that was my home for five years. It was time to face the inevitable transition to Poland, a plan that I had from the very first year of my PhD. As a Fulbright grantee, I knew that my mission was to come back to my country one day, to share what I learned.

Throughout my graduate education, I always closely monitored the Polish higher education news, academic job market, and development of grant schemes that might provide me with employment. In May 2019, with the end of my program in sight, as most of my cohort was going through the gruelling process of academic job market searches in the USA, I was trying to plan for the unknown politics and processes of Polish academia. The most viable path was securing my own funding for academic work. This blog post, which will be the first of a series, describes my process of selecting and applying for the Poland National Science Center Sonatina grant, which is funding my current project.

Finding Opportunities 

There are two main funding agencies in Poland for supporting academic research: the National Science Center (NSC) and The Polish National Exchange Agency (PNEA). They offer grant opportunities for early career scholars, with some programs targeting scholars educated abroad. NSC is a well-established institution, founded in 2011, with grant schemes with multiple editions and rules that remain steady from year to year. The PNEA is a new agency, whose grant schemes are constantly being improved and altered to better serve the mission of internationalizing and promoting Polish science abroad. This poses a challenge for anyone who wants to apply, and proved to be an obstacle for me: after I chose one of the grant programs, the rules changed in the edition I was planning to apply for, completely altering my plans.


Quite early in the process of analyzing possible opportunities from both agencies, I realized I needed a host institution willing to partner with me. My MA studies in Poland were focused on philology, meaning the small academic network I developed in the past included scholars working in completely different areas. To find the right fit for my current research, in terms of program, people, and environment, I had to reach out to scholars I had not worked with previously.

None of my very limited connections closely followed research on Polish scholars and academic writing in English. This led me to the blog Warsztat Badacza (Researcher’s Craft) written by Dr. Emanuel Kulczycki. The blog often featured summaries of research articles related to academic research evaluation and productivity of scholars from Poland and other European countries. Although his work was not exactly focused on writing research, it helped me understand the structural issues related to academic publishing in the region. I reached out to Dr. Kulczycki in the summer before my graduation, while I was in Poland visiting family in June 2019, and met up with him for coffee. As we talked about our own careers and interests related to academic writing and publishing, it became clear that I would fit well into the Scholarly Communication Research Group at Adam Mickiewicz University in Poznań, Poland. 

Initially, I wanted to apply for a grant for returning Polish scholars, but as the program opened in February 2019, the rules changed and I was no longer eligible to apply. Dr. Kulczycki suggested instead the Sonatina grant from the National Science Center. It was February, and the deadline was March 15th. At the same time, I was working on finishing my dissertation. It was a very tight timeline for conceptualizing, drafting, revising, and submitting the proposal. 

Grant Writing & Feedback

The grant proposal delineated a project that aligned with my dream research agenda: to examine second-language (L2) writing practices in Poland. Within the larger research agenda, I decided on the most viable and interesting project: the writing practices of early-career scholars in four academic disciplines in social sciences and humanities. While there is a sustained research inquiry into practices of early-career scholars in the United States or China, there has been no research done in the Polish context. Using the knowledge and skills gained in the graduate programs at UMaine and Purdue, I designed a mixed-methods study that will allow me to examine the motivations and skills of early career scholars in terms of their academic writing in English.  

In the process of drafting the proposal, I relied on the feedback from Dr. Kulczycki, Dr. Aleksandra Kasztalska (my longtime friend and academic research collaborator, able to read both English and Polish text), Dr. April Ginther (the co-chair of my dissertation and my advisor), and my partner Dr. Robert Ariel (who has a keen eye for academic writing). The final version was also read by Dr. Michelle McMullin, who provided comments from a more rhetoric and composition perspective. With a tight deadline, I knew I was also putting some strain on the circle of people who were giving me feedback.

Sonatina proposals go through two levels of review before they reach the final stage. The first round of reviews was completed by anonymous internal reviewers, who reappeared in the interview stage to ask questions. The second round of reviews came from four anonymous international scholars. The range of depth and scope of the review in that round was disparate, with one of the reviewers providing very short, negative feedback, and another one providing thoughtful, enthusiastic feedback. In the whole process, I was aware how unlikely it was that my project would be reviewed by anyone who studies writing from an applied linguistics perspective, especially one situated in the North American tradition. Reaching a non-expert audience was always on my mind, but seeing how researchers from other fields read and commented on the project was enlightening. 

Grant Interview 

The National Science Center requires pitch-style interviews as the final stage of the process. The most difficult part of this was not knowing who would be part of the interview panel, specifically what disciplines would be represented. This goes back to the most important information for any communication event: knowing your audience. Polish academia is still somewhat foreign to me. Despite growing up in Poland and spending time getting my first MA degree in the country, my familiarity with academics from social sciences and the humanities is scant, especially in a non-teaching context. 

Ahead of time, I decided to present my work in English, my academic first language, and to take questions in Polish. In that process, I felt in a very tangible way what it means that academic language is not native to anyone; rather, it is learned and experienced. My Polish presentations are not as confident and fluent as the ones in English, where I have a linguistic repertoire to talk about research in my field.

What proved very useful was Dr. Kulczycki sharing with me his own experiences of interviewing for the Sonata grant and his approach towards the interview. As I prepared for the presentation, which summarized my project and also the key critical feedback from reviewers alongside my responses to the critiques, I felt grateful to have access to this information. There is little to no information about what the whole experience looks like or what approaches to take for the pitch. Having access to materials from grant winners in other editions was invaluable. However, this can only happen within a trusted network, as these materials are closely guarded, constituting what Swales called “occluded genres” (2004).

When I learned that my grant project was funded, I was overcome with joy. I also felt a deep sense of gratitude that I will have the opportunity to share my knowledge with others, to “give back” to my country in the Fulbright spirit.

Pandemic beginnings

At the time my grant project started on April 1st (no joke!), the Coronavirus pandemic had taken a full hold on Polish life. Nobody was prepared for that turn of events. So my plan to move to Poznań from Kraków was delayed until August 1st. As I started the grant project, which will last three years, I decided to spend the time diving into the literature around early-career scholars, the geopolitics of academic writing (especially in the lesser-explored contexts), and issues of linguistic variation in corpora of academic publications from different disciplines. As I finish this blog post, I am sitting in my office at Międzychodzka Street in Poznań, and my first month in the institution is nearing its end. I am looking forward to what this chapter in my academic work will bring.

Seminar day in the Scholarly Communication Research Group: The readings on institutional isomorphism were also discussed using genre theory. From left: A. Swatek, Z. Taskin, E. Rozkosz, M. Holowiecki, K. Szadkowski, J. Krzeski, F. Krawczyk. Photo: E. Kulczycki.

In summer 2020, a group of Crow researchers attended Teaching and Language Corpora 2020 (TALC 2020). We hosted an online workshop and delivered several individual presentations. (More on the latter in a follow-up.) Here, we’ll offer a summary of the workshop, and share some thoughts about our preparation, too. 

The goal of our workshop, “Designing pedagogical materials using interactive data-driven learning (DDL) with multilingual learner corpora,” was to introduce the Crow and MACAWS platforms and demonstrate how both can facilitate what we call “interactive data-driven learning (iDDL).” We wanted to offer attendees a chance to work hands-on with our tools, ask questions, and get help as needed. So we assembled a big team: Ashley Velázquez, Shelley Staples, and Ola Swatek (for Crow), and Aleksey Novikov, Adriana Picoral, and Bruna Sommer-Farias (for MACAWS).

Slide from TALC presentation showing the Crow team members present and the logos of our sponsoring agencies.

Crow and MACAWS both include learner corpora built from student texts at our partner universities. Crow focuses on composition in English, and MACAWS on written and oral assignments from Portuguese and Russian foreign language programs at the University of Arizona. Crow includes not only student texts, but a repository of the pedagogical artifacts that shaped them. We’ll be adding a repository to MACAWS in the future. 

What we shared at TALC

After introducing the Crow and MACAWS platforms, we introduced the concept of iDDL, which is our method for integrating concordance lines in a scrollbar format into online pedagogical materials. That allows snippets of Crow or MACAWS data to be embedded in online platforms like Google Sites or even learning management systems like Brightspace. 

We then asked attendees to pick one of three breakout rooms: English to work with Crow, or Portuguese or Russian for MACAWS. In the breakouts, we shared iDDL examples, then gave participants the opportunity to try activities they might use with their students. Participants then had time to build their own activities with help from Crow and MACAWS researchers. 

We’re quite pleased with the results. Thirty-eight people attended, representing at least 12 countries. Our platforms worked well, and in the breakout rooms, attendees were able to use Crow or MACAWS successfully. We got a lot of great feedback about design decisions we’ve made, and ways to tweak our iDDL implementation to make it more flexible and robust.

Preparing this workshop

Hosting online workshops is challenging! Our preparation was extensive: we met multiple times to develop the materials and plan activities. Even though we had hosted workshops before, and were able to draw on that work, we had to test our ideas carefully to ensure participants would have a smooth experience using Crow or MACAWS to work with student texts. 

With the help of other Crow and MACAWS researchers, we rehearsed the workshop and made revisions to both content and presentation—twice! Both times, we had to rethink some of our expectations for keeping things organized and on track. For example, actions which might seem easy — like asking for help — could actually be a challenge. We had to think about the ways participants would be switching between our examples, the videoconference channel, and Crow or MACAWS platforms.

Screencap from Zoom videoconference for TALC testing
Crow researchers help the TALC team test their approach to populating breakout rooms and assisting attendees during the workshop.

This led us to develop a detailed plan to ensure participants could navigate the activities we planned. We assigned team members to different roles:

  • Presenting the content we wanted to share
  • Assigning and moving participants to breakout rooms
  • Facilitating activities in breakout rooms during the create-your-own-activity phase
  • Fielding any requests for help from participants
  • Keeping track of time in the introductory, breakout, and wrap-up sections of the workshop
  • Facilitating conversation throughout

We also carefully scoped activities. What seemed to us, at first, like a very small amount of content actually offered a rich and in-depth experience for our attendees, and we’re using what we learned as we get ready to host other online workshops this fall. 

Again, we thank the organizers for the terrific work they did to host TALC 2020. We hope to attend next year as well — hopefully in person this time!