Corpus and Repository of Writing

The Crow team is thrilled to announce the release of Corpus in a Box: Automated Tools, Tutorials, & Advising (CIABATTA). We invite you to attend our launch event and open house!

At these events, we will introduce our CIABATTA toolkit for building corpora, which has been developed collaboratively by Crow researchers across our network of partner institutions. We will briefly explain how and why we built this toolkit, and then take audience questions about corpus building.

All are welcome, both individuals who can program using Python, and others who have little or no programming knowledge but are interested in building and working with corpora.

We invite you to join us at our release event and/or open house. Please note that registration is required. Use the links below to register. If you have any problems, please contact us.

Launch: December 6, 2021, 10am to 11am (Arizona Time/MST)

Find the launch time in your time zone: when-is-ciabatta-launch
Register to attend the launch

Open House: December 7, 2021, 8am to 9am (Arizona Time/MST)

Find the open house in your time zone: when-is-ciabatta-open-house
Register to attend the open house

The development of CIABATTA has been supported by an ACLS Digital Extension Grant from the American Council of Learned Societies (ACLS). We are grateful to ACLS, Humanities Without Walls, and our other funders for their support.

We hope you can attend. Questions? Contact collaborate@writecrow.org.

Downloadable flyer for event

On October 23rd, 2021, we were excited to host an online workshop at the Arizona Teachers of English to Speakers of Other Languages (AZTESOL) 2021 conference. The goal of our workshop, “Exploring tense-agreement issues in L2 writing using a learner corpus,” was to introduce the Crow platform and show how to use concordance lines to help students identify and understand tense-agreement patterns. Our team consisted of Ph.D. student Anh Dang, Ph.D. student Hui Wang, and Ph.D. candidate Ali Yaylali. 

Screenshot of slide containing the text: "Exploring tense-agreement issues in L2 writing using a learner corpus. Crow, the Corpus & Repository of Writing. AZTESOL State Conference 2021. October 22–23, 2021."
First slide from AZTESOL 2021 workshop 

What we shared at AZTESOL 2021

During the workshop, attendees were introduced to Crow learner corpora and Data-Driven Learning (DDL) by reviewing authentic sentence samples and grammatical forms from students’ texts. After the introduction, we guided attendees through an interactive corpus-based activity that contained three parts: 

1) Noticing verb tenses in learner writing.

In this section, participants read a list of sentences selected from the Crow corpus and identified the tense-agreement patterns by answering the guiding questions.

2) Searching the concordance lines. 

In this activity, participants looked at concordance lines from the Crow corpus and answered questions about the patterns and the different tenses they showed.

3) Independent practice.

We provided two options in the last part. Participants could either revise tense-agreement issues in an excerpt from the Crow corpus or revise the issues in their own paper, choosing based on their own teaching context.

During the activity, attendees were invited to use the embedded scrollable concordance lines to observe keywords and tense pattern variations. We then guided workshop participants to try the independent practice: finding and revising the tense agreement issues in the authentic excerpt. 

Screen shot of concordance lines showing a query for "what," with about 20 lines of text showing that key word in context.
Example of concordance lines used for the corpus-based activity
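
For readers curious how concordance lines like these are produced, here is a minimal Python sketch of a keyword-in-context search. It is not the Crow platform's implementation: the function name `concordance`, the `width` parameter, and the sample sentences are all invented for illustration.

```python
import re


def concordance(texts, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    # \b keeps "what" from matching inside longer words such as "whatever".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sentences for illustration only; not drawn from the Crow corpus.
samples = [
    "I did not know what the author means in this paragraph.",
    "What she said yesterday is different from what she says now.",
]

for line in concordance(samples, "what"):
    print(line)
```

Lining the keyword up in a single column is what lets students scan down the page and compare the verb forms on either side of it.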

After sharing the activity demo, we provided some questions for participants to discuss how they could adapt and implement this activity to fit their own instructional contexts and student needs. Some participants mentioned that they would need more scaffolding activities in K-12 contexts. We were excited to hear their feedback on the activity design and their valuable ideas for applying it.

Takeaways

After our workshop, participants were invited to:

  1. Download and print out our activity handout to implement this activity in their future teaching;
  2. Use the Crow platform to explore the linguistic features of students’ writing;
  3. Develop effective activities based on available data and information from our platform;
  4. Guide students to raise awareness and accuracy by using authentic language samples. 

Workshop materials  

We’ve included the materials we presented here. 

Thank you for your interest! We also thank all participants and organizers for their support. We look forward to attending AZTESOL next year.

This week Dr. Michelle McMullin and I were invited to speak in Dr. Beth Towle’s professional writing class at Salisbury University. One of the good things about more video-conferencing: easy to be a guest speaker in a class!

Our talk shared the Crow model for grant writing, described how we use it for professional development, and proposed three practices for lo-fi team building. Here’s the slide deck we shared.

This talk was based in part on the materials we shared with the Arizona Women’s Hackathon, especially the “consecutive agenda” Crow uses for agenda-setting and note-taking in meetings.

This post will be updated.

Thanks to funding from a Purdue Covid-19 Disruption Grant, we are able to hire four undergraduate researchers in the AY21-22 academic year! We are grateful for Purdue’s support of the Crow project. Funding will also allow us to continue working with our developer Mark Fullmer to improve the Crow platform and develop new trajectories for research.

After a training period, these researchers will complete work that helps our project make up for time lost due to Covid-19 restrictions and complications. Undergraduate research assistants will be mentored by experienced undergraduate and graduate researchers and given clear deliverable guidelines, deadlines, and expectations. They will perform the work specified by the grant:

  1. Process and de-identify recently gathered writing research data; 
  2. Interview users from the Crow community and compile information to guide Crow developers; 
  3. Review the Crow Fellows initiative in collaboration with Crow researchers guiding that project;
  4. Create and update a series of web pages documenting recent Crow work, supporting future grant writing, participant recruitment, and outreach efforts. 

Senior Crow researchers will collaborate with the undergraduate researchers to identify which of these tasks offer the best professional growth, and will seek to identify others that develop new skills and experience, such as grant writing, research design, or data collection. 

Research assistants can expect to average 10 hours of work per week with an hourly pay rate of $11. There is also a possibility for earning credit hours for internship or research related coursework. Six weeks of Fall 2021 work will focus on onboarding and training; fourteen weeks are budgeted in Spring 2022, for a total of $2,200. Pending funding, positions may be extended.

Crow is a distributed team, so quite a bit of the work will be remote. A typical week will include a few hours of regular meetings online or in our shared Heavilon 201 lab, then independent work and short online meetings as needed with other Crow researchers. Work schedules will be developed to accommodate academic and personal obligations.

Required qualifications

  • Experience with and/or interest in technical communication, applied linguistics, and diverse areas of writing studies.
  • Experience with and/or interest in conducting user experience research and collaborating with software developers.
  • Experience with and/or interest in using Google Drive, Basecamp, Slack, and similar software to facilitate distributed work.
  • Ability to work both collaboratively and independently.
  • Eligibility to work as an undergraduate student at Purdue University.

To be considered, applicants must send the following materials to Bradley Dilger (dilger@purdue.edu) for screening and an invitation for a face-to-face interview.

  • A cover letter explaining interest in this position and describing relevant experience; 
  • An up-to-date resume; 
  • Two references who can speak to the candidate’s experience and/or potential. 

Candidates are welcome to include a PDF or web-based portfolio including samples of writing from courses, internships, and other contexts, but it is not required. 

Screening begins immediately and will conclude when positions are filled. Prospective applicants are welcome to contact Dilger with questions.

Download a printable version of this position announcement.

Building on our work from SIGDOC 2020, we’re presenting on Crow’s approaches to mentoring at SIGDOC 2021.

“Using iterative persona development to support inclusive research and assessment”
Michelle McMullin, Hadi Banat, Shelton Weech, & Bradley Dilger

As we build writing research tools, Crow researchers have always sought more ethical, sustainable approaches to collaboration. How we work is as important as what we make. In this research paper, we highlight the importance of descriptive methods such that our reflexive processes for assessment are transparent, and stay open for negotiation as we learn more, gather feedback, and apply what we learn. We share an in-depth look at Crow methods for persona development and their role in our ongoing research and assessment of Crow practices.

Here’s our video summary:

We also have a PowerPoint version including audio and speaker notes.

Along with this presentation, we have some resources for teams interested in learning more about CDW methods.

We look forward to your feedback, both for this paper, and for the resources we are building. Share your questions or ideas via this form.

We are thrilled to introduce our first cohort of Crow Fellows:

  • Olayemi Awotayo, Graduate Instructor, Virginia Polytechnic Institute (Virginia Tech)
  • Dr. Madelyn Pawlowski, Assistant Professor of English, Northern Michigan University
  • Margaret Poncin Reeves, Senior Lecturer, DePaul University
  • Modupe Yusuf, Doctoral Candidate, Michigan Technological University (Michigan Tech)

Later this summer, we will share more about this outstanding group of teacher-scholars. We are grateful to everyone who applied, and especially to the American Council of Learned Societies (ACLS) for the support that makes our Fellows program possible. 

On behalf of the Crow team, I would like to take this opportunity to congratulate all of our graduates, who accomplished so much in this year that was so challenging. Seven Crowbirds earned degrees from the University of Arizona: 

Anh Dang graduated with an MA in Teaching English as a Second Language (TESL) in May 2021. She will continue at Arizona as a PhD student in Second Language Acquisition & Teaching (SLAT), with an assistantship in the UA Foundations Writing Program.

Hannah Gill graduated with a double major in Philosophy, Politics, Economics and Law (PPEL) and English in May 2021. She completed an honors thesis, “English Language Learners within the Classroom: Improving K-12 Policy and Enhancing Curriculum Through Corpus Based Instruction.” In Fall 2021, she will begin a Master of Social Work (MSW) in the Mandel School of Applied Social Sciences at Case Western Reserve University. 

Jhonatan Henao-Muñoz earned an M.A. in Hispanic Linguistics (Winter 2020), an M.A. in French Linguistics and Second Language Teaching & Learning (Spring 2021), and a Graduate Certificate in Technology in Second Language Teaching (Spring 2021). He has accepted a position as Instructor of French at the University of Arizona.

Alantis Houpt graduated with a degree in English, and a Teaching English as a Foreign Language (TEFL) certification from the Center for English as a Second Language (CESL), in December 2020.

Dr. Aleksey Novikov earned a PhD in Second Language Acquisition and Teaching in May 2021, defending his dissertation “Syntactic and Morphological Complexity Measures as Markers of L2 Development in Russian” on May 7. 

Dr. Emily Palese also earned a PhD in Second Language Acquisition and Teaching in May 2021, defending her dissertation “Prompting Students to Write: Designing and Using Second Language Writing Assignment Prompts” on May 19.  

Kevin Sanchez graduated with a double major in English and Creative Writing in May 2021. He’s finishing up his TEFL certificate and getting ready to teach English abroad. 

Our best wishes to Anh, Hannah, Jhonatan, Alantis, Aleksey, Emily, and Kevin! We look forward to seeing your next moves. Next week, we will have more to say about the individual accomplishments of everyone on the Crow team. 

Dr. Aleksey Novikov, Dr. Shelley Staples, and Dr. Emily Palese (left to right) at University of Arizona Commencement, May 2021

The web interface for Crow, the corpus and repository of writing, depends on a complex amalgam of interdependent bits of code built by thousands of people.

Five minute read.

A few years ago Thomas Thwaites decided to make a toaster. From scratch. Armed with the breadth of extant human knowledge (thanks, internet!), after a bit of petroleum refining here, a bit of iron smelting there, Thwaites would assemble some things into a bigger thing that would be capable of toasting bread. Surely a simple task. It wasn’t. Thwaites’ exploration into the institutional knowledge and global dependencies that go into a seemingly trivial kitchen appliance makes visible the complexities of something we take for granted. Thwaites shows the real cost of a $20 toaster.

Most web applications I build can’t make your breakfast, but like your average toaster, they have a deep system of prerequisites and dependencies. And even though I do software development full-time—even though my job is to understand what’s going on behind the scenes—I take for granted how much my software relies on the work of so many others.

Like practically all software today, Crow’s online corpus and repository of writing leverages many other software packages — bundles of code that provide discrete services such as sending and receiving data over the web, reading from and writing to a database, rendering tables and forms and charts. Caching. Authenticating. Validating. In the case of Crow’s software, there are also corpus-specific libraries for normalizing, tokenizing, lemmatizing, indexing, querying, filtering, excerpting, and highlighting.

Those parts of the Crow code were created (and are actively maintained) by other developers. Not me. The software I build just sits on top of it, connecting the dots to make purpose out of possibility. And part of my development time is simply keeping Crow code updated with changes in those dependencies, changes which can take the form of bug fixes, security patches, and new features.

To help me with those updates, I built a tool to visualize my code’s dependencies: the Composer Dependency Tree Generator.

Package Management

A bit of background: most contemporary software uses package management tools to define and retrieve all of the bits of those external software libraries that are needed. In the case of the PHP programming language, that package management tool is Composer. Package managers also help developers identify when those packages have available updates (though testing the updates before applying them is still the work of the developer). Package requirements are stored in a single file which contains all of the building blocks of the application — its DNA. The Composer Dependency Tree Generator takes that DNA file and renders it as a collapsible tree.
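
To make the idea concrete, here is a minimal sketch in Python of turning a list of package requirements into an indented tree. It illustrates the general technique only and is not the code of the Composer Dependency Tree Generator. Composer's own files are JSON, but this sketch uses a plain dictionary instead of parsing them, and every package name in it is hypothetical.

```python
def build_tree(package, requirements, seen=frozenset()):
    """Return a nested dict for `package` and everything it requires."""
    if package in seen:
        # Guard against circular requirements: record the name, stop descending.
        return {"name": package, "children": []}
    children = [
        build_tree(dep, requirements, seen | {package})
        for dep in sorted(requirements.get(package, []))
    ]
    return {"name": package, "children": children}


def render(node, depth=0):
    """Flatten the nested dict into indented lines, one per package."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical requirements: each key lists the packages it needs directly.
requirements = {
    "example/app": ["example/auth", "example/database"],
    "example/auth": ["example/tokens"],
    "example/tokens": ["example/encryption"],
}

for line in render(build_tree("example/app", requirements)):
    print(line)
```

Each level of indentation is one step further from the code the developer wrote directly, which is the relationship a collapsible tree makes visible.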

As visualized below, the Crow software backend requires 36 packages in order for me to write the “real” code of the application (click image to interact).

Maybe 36 dependencies for a single project seems reasonable, even manageable. But as you probably already guessed, those packages depend on other packages. Zoom in on one small but critical part of the application: user authentication. Crow depends on Simple OAuth to send user credentials through the internet securely and verify that the user is who they say they are. Simple OAuth depends on other packages (for instance, league/oauth2-server implements the OAuth 2.0 standard for handling access tokens and refresh tokens), which in turn depend on other libraries like defuse/php-encryption to encrypt data using OpenSSL, which is itself a dependency. The fractal nature of dependencies quickly comes into focus (select to enlarge):

Because of these interrelationships, Crow’s DNA file — not the actual code, mind you, just the list of which building blocks the code needs — is 8,700 lines long. Behold, the fully articulated Crow dependency tree (select to interact):

So: a lot of building blocks. A lot of moving parts. A lot of humans writing code. When you think about it this way, that simple blog you pay $20 a month for was built — and is actively maintained — by thousands of people.

So what?

For me, three main insights come to mind. First, software developers simply cannot build applications without relying on a preponderance of other people’s work. One open source project I built for the Drupal content management system, Layout Builder Restrictions, is in active use on 12,000 websites. Let’s say it’s taken me 100 hours to write and maintain that code. If the developers of each of those 12,000 websites had to build the same functionality as Layout Builder Restrictions, individually, just that tiny bit of functionality would have added weeks of work to the timeline of each of those sites: 1.2 million hours of total developer time.
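
The arithmetic behind that estimate, spelled out as a quick Python check. The 40-hour work week used to convert hours into weeks is my assumption for illustration; the other figures come from the paragraph above.

```python
hours_to_build = 100      # estimated hours to write and maintain the module
sites = 12_000            # websites actively using it
hours_per_week = 40       # assumed full-time work week (not from the post)

total_hours = hours_to_build * sites
weeks_per_site = hours_to_build / hours_per_week

print(f"{total_hours:,} total developer hours")   # 1,200,000
print(f"{weeks_per_site} weeks of work per site")  # 2.5
```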

Second, to put that first point the other way: the fact that so many developers’ work is provided open source means that we can quickly build many applications that do many different things. 12,000 websites can be built in a fraction of the time it would take developers working in isolation. (And if you’ll forgive the pretentious meta moment, I would point out that my Composer Dependency Tree Generator, itself, is simply an implementation of the open source visualization tool D3.js. It took me only hundreds of lines of code to write because someone else had already written thousands.)

And third (the inescapable and scary implication of the previous points): given the amount of code it takes to build a modern web application, and given that code is necessarily built by many different people, a single development team on a single project simply cannot review and vet every line of code. Scanning the Crow dependency tree above, I’ve read far less than 1 percent of its total code. There is no other way to say it: the way we build software — the only way we can build software — carries with it inherent vulnerability.

But there’s good news. I may not have looked at the code personally but many other open source contributors have. And when problems are discovered — bugs or security flaws — developers collaborate to fix the problems and make those fixes available to the rest of us. And that’s why I think it’s so important not just to use open source software but to contribute. Return to that project I built, Layout Builder Restrictions. If I made that software proprietary and charged those 12,000 websites $20 a month, I’d be rich enough to retire. But then: if I had to pay $20 a month for the myriad packages my applications depend on, I’d be filing for bankruptcy.

So: the open source methodology works because of deep interdependencies, and open-sourcing creates a virtuous cycle through participation and proliferation. When I can share my labor back to the community, all things considered, it’s a bargain.

Screen capture showing the Humanities Without Walls grand research challenge.
Screen capture of the HWW Grand Research Challenge

Summer is in full swing for the Crow team. Crow was recently awarded a seed grant from Purdue’s College of Liberal Arts. This grant provides funding for summer work that will help the team prepare to write an application to the third Humanities Without Walls Grand Research Challenge this fall.

As part of the work this summer, we will begin the process of updating the Crow website. We’re looking to enhance our site’s usability, design, content, and features so that everyone who accesses it can get the most out of the resources it offers.

Since HWW this year stresses reciprocity and redistribution, we will also review our past Crow workshops (and attendee responses to them) to give us more insight about how our work gives back to the academic communities we interact with.

Finally, we will be expanding Crow outreach, working to partner with a number of diverse institutions.

It’s going to be an exciting few months, so stay tuned.

Building on the success of her previous Crow workshops engaging the R environment, Dr. Adriana Picoral will host “Quantitative Language Data Analysis and Visualization in R” on April 17, 2021, from 9:00am to 11:00am (Arizona/US/MST). (See this in your time zone.)

In this workshop, we will work with the variable “that” data from Tagliamonte’s book “Variationist Sociolinguistics” (2012). We will visualize the frequency of “that” complementizer omission across speaker groups, and run both linear and logistic regression. Concepts such as correlation, interaction, and contrasts will be addressed.

Interested?

  1. Please register for the workshop.
  2. Download and install the latest R version from https://cran.r-project.org
  3. Download and install the latest RStudio from https://rstudio.com/products/rstudio/download/#download 

If you are unable to attend, watch for a video on the Crow YouTube channel the week following the workshop.

Questions? Please contact Dr. Picoral.