Corpus and Repository of Writing

We are thrilled to introduce our first cohort of Crow Fellows:

  • Olayemi Awotayo, Graduate Instructor, Virginia Polytechnic Institute (Virginia Tech)
  • Dr. Madelyn Pawlowski, Assistant Professor of English, Northern Michigan University
  • Margaret Poncin Reeves, Senior Lecturer, DePaul University
  • Modupe Yusuf, Doctoral Candidate, Michigan Technological University (Michigan Tech)

Later this summer, we will share more about this outstanding group of teacher-scholars. We are grateful to everyone who applied, and especially to the American Council of Learned Societies (ACLS) for the support that makes our Fellows program possible. 

On behalf of the Crow team, I would like to take this opportunity to congratulate all of our graduates, who accomplished so much in such a challenging year. Seven Crowbirds earned degrees from the University of Arizona:

Anh Dang graduated with an MA in Teaching English as a Second Language (TESL) in May 2021. She will continue at Arizona as a PhD student in Second Language Acquisition & Teaching (SLAT), with an assistantship in the UA Foundations Writing Program.

Hannah Gill graduated with a double major in Philosophy, Politics, Economics and Law (PPEL) and English in May 2021. She completed an honors thesis, “English Language Learners within the Classroom: Improving K-12 Policy and Enhancing Curriculum Through Corpus Based Instruction.” In Fall 2021, she will begin a Master of Social Work (MSW) in the Mandel School of Applied Social Sciences at Case Western Reserve University. 

Jhonatan Henao-Muñoz earned an M.A. in Hispanic Linguistics (Winter 2020), an M.A. in French Linguistics and Second Language Teaching & Learning (Spring 2021), and a Graduate Certificate in Technology in Second Language Teaching (Spring 2021). He has accepted a position as Instructor of French at the University of Arizona.

Alantis Houpt graduated with a degree in English, and a Teaching English as a Foreign Language (TEFL) certification from the Center for English as a Second Language (CESL), in December 2020.

Dr. Aleksey Novikov earned a PhD in Second Language Acquisition and Teaching in May 2021, defending his dissertation “Syntactic and Morphological Complexity Measures as Markers of L2 Development in Russian” on May 7. 

Dr. Emily Palese also earned a PhD in Second Language Acquisition and Teaching in May 2021, defending her dissertation “Prompting Students to Write: Designing and Using Second Language Writing Assignment Prompts” on May 19.  

Kevin Sanchez graduated with a double major in English and Creative Writing in May 2021. He’s finishing up his TEFL certificate and getting ready to teach English abroad. 

Our best wishes to Anh, Hannah, Jhonatan, Alantis, Aleksey, Emily, and Kevin! We look forward to seeing your next moves. Next week, we will have more to say about the individual accomplishments of everyone on the Crow team.

Dr. Aleksey Novikov, Dr. Shelley Staples, and Dr. Emily Palese (left to right) at University of Arizona Commencement, May 2021

The web interface for Crow, the corpus and repository of writing, depends on a complex amalgam of interdependent bits of code built by thousands of people.

Five minute read.

A few years ago Thomas Thwaites decided to make a toaster. From scratch. Armed with the breadth of extant human knowledge (thanks, internet!), after a bit of petroleum refining here, a bit of iron smelting there, Thwaites would assemble some things into a bigger thing that would be capable of toasting bread. Surely a simple task. It wasn’t. Thwaites’ exploration into the institutional knowledge and global dependencies that go into a seemingly trivial kitchen appliance makes visible the complexities of something we take for granted. Thwaites shows the real cost of a $20 toaster.

Most web applications I build can’t make your breakfast, but like your average toaster, they have a deep system of prerequisites and dependencies. And even though I do software development full-time—even though my job is to understand what’s going on behind the scenes—I take for granted how much my software relies on the work of so many others.

Like practically all software today, Crow’s online corpus and repository of writing leverages many other software packages — bundles of code that provide discrete services such as sending and receiving data over the web, reading from and writing to a database, rendering tables and forms and charts. Caching. Authenticating. Validating. In the case of Crow’s software, there are also corpus-specific libraries for normalizing, tokenizing, lemmatizing, indexing, querying, filtering, excerpting, and highlighting.

Those parts of the Crow code were created (and are actively maintained) by other developers. Not me. The software I build just sits on top of it, connecting the dots to make purpose out of possibility. And part of my development time is simply keeping Crow code updated with changes in those dependencies, changes which can take the form of bug fixes, security patches, and new features.

To help me with those updates, I built a tool to visualize my code’s dependencies: the Composer Dependency Tree Generator.

Package Management

A bit of background: most contemporary software uses package management tools to define and retrieve all of the bits of those external software libraries that are needed. In the case of the PHP programming language, that package management tool is Composer. Package managers also help developers identify when those packages have available updates (though testing the updates before applying them is still the work of the developer). Package requirements are stored in a single file which contains all of the building blocks of the application — its DNA. The Composer Dependency Tree Generator takes that DNA file and renders it as a collapsible tree.
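To make the idea concrete, here is a minimal sketch in Python of turning a lock-style requirements file into a nested tree. It is not the Composer Dependency Tree Generator's actual code. It assumes the general layout of a composer.lock file (a "packages" list whose entries have a "name" and a "require" map), and the acme/* package names, the build_tree and render helpers, and the "my-app" root label are all invented for illustration.

```python
import json

# Tiny stand-in for a composer.lock file. The acme/* packages are
# hypothetical; a real lock file lists many more fields per package.
SAMPLE_LOCK = json.loads("""
{
  "packages": [
    {"name": "acme/auth", "version": "1.0.0",
     "require": {"php": ">=7.3", "acme/tokens": "^2.0"}},
    {"name": "acme/tokens", "version": "2.1.0",
     "require": {"acme/crypto": "^1.0"}},
    {"name": "acme/crypto", "version": "1.4.2",
     "require": {"ext-openssl": "*"}}
  ]
}
""")


def build_tree(lock, root_requires, root_name="my-app"):
    """Return a nested {"name", "children"} dict, a shape that tree
    visualizations commonly take as input."""
    by_name = {pkg["name"]: pkg for pkg in lock.get("packages", [])}

    def node(name, ancestors):
        pkg = by_name.get(name)
        if pkg is None or name in ancestors:
            # Either a platform requirement such as php or ext-*, which
            # has no entry in the packages list, or a cycle: stop here.
            return {"name": name, "children": []}
        deps = sorted(pkg.get("require", {}))
        return {"name": name,
                "children": [node(dep, ancestors | {name}) for dep in deps]}

    return {"name": root_name,
            "children": [node(name, frozenset()) for name in root_requires]}


def render(tree, depth=0):
    """Flatten the tree into indented lines, one package per line."""
    lines = ["  " * depth + tree["name"]]
    for child in tree["children"]:
        lines.extend(render(child, depth + 1))
    return lines


tree = build_tree(SAMPLE_LOCK, ["acme/auth"])
print("\n".join(render(tree)))
```

Running the sketch prints an indented outline with acme/auth at the first level and acme/tokens, acme/crypto, and ext-openssl nested beneath it. A browser-based tool would hand the same nested structure to a visualization library instead of printing it.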

As visualized below, the Crow software backend requires 36 packages in order for me to write the “real” code of the application (click image to interact).

Maybe 36 dependencies for a single project seems reasonable, even manageable. But as you probably already guessed, those packages depend on other packages. Zoom in on one small but critical part of the application: user authentication. Crow depends on Simple OAuth to send user credentials through the internet securely and verify that the user is who they say they are. Simple OAuth depends on other packages (for instance, league/oauth2-server implements the OAuth 2.0 standard for handling access tokens and refresh tokens), which in turn depend on other libraries like defuse/php-encryption to encrypt data using the OpenSSL library, which is, itself, a dependency. The fractal nature of dependencies quickly comes into focus (select to enlarge):

Because of these interrelationships, Crow’s DNA file — not the actual code, mind you, just the list of which building blocks the code needs — is 8,700 lines long. Behold, the fully articulated Crow dependency tree (select to interact):

So: a lot of building blocks. A lot of moving parts. A lot of humans writing code. When you think about it this way, that simple blog you pay $20 a month for was built — and is actively maintained — by thousands of people.
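The gap between what a project asks for directly and what it ends up relying on can be sketched in a few lines of Python. The graph below is a made-up miniature, not Crow's real dependency list, and transitive_dependencies is a hypothetical helper that does a plain breadth-first walk.

```python
from collections import deque

# Hypothetical graph: each package maps to the packages it requires directly.
REQUIRES = {
    "app": ["auth", "database"],
    "auth": ["oauth-server"],
    "oauth-server": ["encryption", "http-messages"],
    "encryption": [],
    "http-messages": [],
    "database": ["http-messages"],
}


def transitive_dependencies(graph, root):
    """Collect every package reachable from root, directly or indirectly."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already counted; shared dependencies appear only once
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


direct = set(REQUIRES["app"])
everything = transitive_dependencies(REQUIRES, "app")
print(f"{len(direct)} direct, {len(everything)} total")  # 2 direct, 5 total
```

Even in this toy graph the total is more than double the direct count. In a real application the same walk is what turns a few dozen direct requirements into a lock file thousands of lines long.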

So what?

For me, three main insights come to mind. First, software developers simply cannot build applications without relying on a preponderance of other people’s work. One open source project I built for the Drupal content management system, Layout Builder Restrictions, is in active use on 12,000 websites. Let’s say it’s taken me 100 hours to write and maintain that code. If the developers of each of those 12,000 websites had to build the same functionality as Layout Builder Restrictions, individually, just that tiny bit of functionality would have added weeks of work to the timeline of each of those sites: 1.2 million hours of total developer time.

Second, to put that first point the other way around: because so many developers provide their work as open source, we can quickly build many applications that do many different things. Those 12,000 websites can be built in a fraction of the time it would take developers working in isolation. (And if you’ll forgive the pretentious meta moment, I would point out that my Composer Dependency Tree Generator, itself, is simply an implementation of the open source visualization tool D3.js. It took me only hundreds of lines of code to write because someone else had already written thousands.)

And third (the inescapable and scary implication of the previous points): given the amount of code it takes to build a modern web application, and given that code is necessarily built by many different people, a single development team on a single project simply cannot review and vet every line of code. Scanning the Crow dependency tree above, I’ve read far less than 1 percent of its total code. There is no other way to say it: the way we build software — the only way we can build software — carries with it inherent vulnerability.

But there’s good news. I may not have looked at the code personally, but many other open source contributors have. And when problems are discovered (bugs or security flaws), developers collaborate to fix the problems and make those fixes available to the rest of us. And that’s why I think it’s so important not just to use open source software but to contribute. Return to that project I built, Layout Builder Restrictions. If I made that software proprietary and charged those 12,000 websites $20 a month, I’d be rich enough to retire. But then: if I had to pay $20 a month for the myriad packages my applications depend on, I’d be filing for bankruptcy.

So: the open source methodology works because of deep interdependencies, and open-sourcing creates a virtuous cycle through participation and proliferation. When I can share my labor back to the community, all things considered, it’s a bargain.

Screen capture showing the Humanities Without Walls grand research challenge.

Summer is in full swing for the Crow team. Crow was recently awarded a seed grant from Purdue’s College of Liberal Arts. This grant provides funding for summer work that will help the team prepare to write an application to the third Humanities Without Walls Grand Research Challenge this fall.

As part of the work this summer, we will begin the process of updating the Crow website. We’re looking to enhance our site’s usability, design, content, and features so that everyone who accesses it can get the most out of the resources it offers.

Since HWW this year stresses reciprocity and redistribution, we will also review our past Crow workshops (and attendee responses to them) to give us more insight about how our work gives back to the academic communities we interact with.

Finally, we will be expanding Crow outreach, working to partner with a number of diverse institutions.

It’s going to be an exciting few months, so stay tuned.

Building on the success of her previous Crow workshops engaging the R environment, Dr. Adriana Picoral will host “Quantitative Language Data Analysis and Visualization in R” on April 17, 2021, from 9:00am to 11:00am (Arizona/US/MST). (See this in your time zone.)

In this workshop, we will work with the variable “that” data from Tagliamonte’s book “Variationist Sociolinguistics” (2012). We will visualize the frequency of omission of the complementizer “that” across speaker groups, and run both linear and logistic regression. Concepts such as correlation, interaction, and contrasts will be addressed.

Interested?

  1. Please register for the workshop.
  2. Download and install the latest R version from https://cran.r-project.org
  3. Download and install the latest RStudio from https://rstudio.com/products/rstudio/download/#download 

If you are unable to attend, watch for a video on the Crow YouTube channel the week following the workshop.

Questions? Please contact Dr. Picoral.

We’re happy to share some recent good news from across the Crow team.

Nina Conrad

Congratulations to Crow researcher and University of Arizona doctoral candidate Nina Conrad, who was awarded a Bilinski Fellowship for her dissertation project, “Literacy brokering among students in higher education.” The fellowship will fund three semesters of writing and includes professional development opportunities as well. 

Crow researcher Hannah Gill was admitted to the Mandel School of Applied Social Sciences at Case Western Reserve University, with a scholarship and funding to support her field work. In May, Hannah will graduate from the University of Arizona with a double major in English and Philosophy, Politics, Economics and Law (PPEL).

Thank you to everyone who attended our third Crow Workshop Series event, focusing on grant writing. If you were not able to attend, please see the video on our YouTube channel. Our slides and handout are also available. 

We were so pleased by the turnout. Our workshop team (Dr. Adriana Picoral, Dr. Aleksandra Swatek, Dr. Ashley Velázquez, and Dr. Hadi Banat) is reviewing the feedback we got and planning our next event. Stay tuned! 

Dr. Adriana Picoral

Dr. Picoral was awarded a mini-grant for a series of professional development workshops designed to increase the gender inclusivity of the data science programs at the University of Arizona. The workshops will be hosted by Dr. Picoral in cooperation with two invited speakers. 

Ali Yaylali, Aleksey Novikov, and Dr. Banat wrote about Crow’s approach to data-driven learning (DDL) in “Using corpus-based materials to teach English in K-12 settings,” published in TESOL’s SLW News for March 2021. This is our second piece for SLW News, following “Applying learner corpus data in second language writing courses,” written by Dr. Velázquez, Nina Conrad, Dr. Shelley Staples, and Kevin Sanchez in October 2020.

Finally, Dr. Picoral, Dr. Staples, and Dr. Randi Reppen published “Automated annotation of learner English: An evaluation of software tools” in the March 2021 International Journal of Learner Corpus Research. Here’s the abstract:

This paper explores the use of natural language processing (NLP) tools and their utility for learner language analyses through a comparison of automatic linguistic annotation against a gold standard produced by humans. While there are a number of automated annotation tools for English currently available, little research is available on the accuracy of these tools when annotating learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. Our results report both precision and recall of annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.

Picoral, A., Staples, S., & Reppen, R. (2021). Automated annotation of learner English: An evaluation of software tools. International Journal of Learner Corpus Research, 7(1), 17–52. https://doi.org/10.1075/ijlcr.20003.pic

We thank all of the Crow researchers and Crow friends who supported this good work, and the editorial teams, reviewers, and funders who made it possible. 

The Crow leadership team would like to express its condemnation of anti-Asian and gender-based violence and to communicate its support for Asian Americans, Asians, and Pacific Islanders. We are saddened and angered by the murders of Delaina Ashley Yaun, Paul Andre Michels, Xiaojie Tan, Daoyou Feng, Hyun Jung Grant, Soon Chung Park, Suncha Kim, and Yong Ae Yue. We condemn the increased violence this past year against Asian and AAPI students, faculty, and individuals at our own institutions, both in the U.S. and abroad.

Our team closely collaborates with our Asian and AAPI team members, and greatly values the contributions of Asian and AAPI students and teachers as participants in our research. We want to express our solidarity with those individuals and others in our own professional networks, and invite others to do the same for their students and colleagues.

Please visit Stop AAPI Hate for more information and resources.

Ashley Velázquez
Shelley Staples
Michelle McMullin
Adriana Picoral
Bradley Dilger
Randi Reppen

The Crow workshop series continues! 

Workshop flyer. PDF version linked, text in main post.
Before you start “writing” flyer. PDF version available.

Fellowship and Grant Writing for PhD Students & Early Career Scholars, Part I: Before You Start “Writing”
In this workshop, we will discuss why you should apply to grants and fellowships (and the difference between these). We will also address how to find grants and fellowships, as well as how to prepare for applying. Designed for early career scholars from around the world who conduct writing research, broadly construed. 

Saturday, March 13, 2021, 9:00 to 10:30 am Pacific/USA 
(UTC: Sat Mar 13, 17:00 to 18:30)

Presenters are Aleksandra Swatek, PhD; Hadi Riad Banat, PhD; Adriana Picoral, PhD; and Ashley J Velázquez, PhD.

Please register for the workshop and share any questions you have beforehand. We hope to see you there! 

The Crow team is excited to be a part of the University of Arizona’s Women’s Hackathon for 2021. We’ll be offering a workshop, “Collaborating online: Lessons from a Successful Team,” on Saturday, March 6, at 1:00pm Mountain time. Michelle McMullin, Shelton Weech, and Bradley Dilger will be facilitating.

Collaborating online: Lessons from a Successful Team
Based on the experiences of an interdisciplinary software design and research team working at multiple sites, we share three principles for collaborative teams who prioritize inclusivity and mutual respect. Examples and practical techniques will help your team work together more effectively both asynchronously and when working together in person.

Our materials:
A video of our presentation, for those unable to attend synchronously. (We just uploaded this, so please pardon the captioning; we will fix that ASAP.)

The slide deck for our presentation is also available.

Finally, a template for the consecutive agenda Crow teams use to combine meeting agendas, notes, and links to our team communication platform.

Our second Crow workshop will be held on December 19, 2020 from 9:00 to 11:00am (Arizona time/MST).

“Corpus Searches in R: Regular Expressions and Concordance Lines” will be hosted by Adriana Picoral, PhD, assistant professor of data science at the University of Arizona. 

Workshop flyer. PDF also available.

Corpus Searches in R: Regular Expressions and Concordance Lines
Saturday, December 19, 9am to 11am (Arizona time/MST).

In this workshop, we will work with a tagged corpus. We will go over the steps of reading in a corpus (organized as multiple text files) in R, searching the corpus using regular expressions, and producing concordance lines. We welcome corpus linguists who are not yet familiar with R but are interested in expanding their coding skills.

Register through Zoom. For more information, please contact Dr. Picoral.

Did you miss our first workshop? Watch a video on our YouTube channel.

Workshops in Spring 2021 will be announced soon. Got a workshop suggestion? Let us know!