Corpus and Repository of Writing

The Crow team has been busy this past year! Between graduations, winning grants, publications, and personal projects, Crowbirds have generated a significant list of accomplishments and have lots of mentors to thank. We’ve already shared news of our four graduating Crowbirds—Ryan Day, Ji-young Shin, Ali Yaylali, and Larissa Goulart.

Continuing on to other Crowbirds around the world, please join us in congratulating and recognizing the following individuals for their work during the 2021-2022 academic year. We do lots of collaborative work, so we’ve listed our 23 publications and 18 presentations at the end of this post.

Hadi Banat, a Crowbird at UMass Boston, was invited to facilitate a paid workshop for Farnham Writers’ Center tutors in training at Colby College this past year. He utilized the conflict resolution framework in his most recent publication in Praxis Journal to mentor tutors on how to identify and resolve conflicts which could come up in tutoring sessions. Hadi also participated in a year-long Junior Faculty Research Seminar at UMass Boston which mentors faculty from across campus on tenure requirements and builds a sense of community among junior faculty on the tenure track. He would like to thank Dr. Michelle McMullin for leading the Constructive Distributed Work team through another manuscript project which is bringing more visibility to Crow within a new discourse community, technical communication. 

Mariana Centanin Bertho from the University of Arizona received the 2022 Research Award from the National Federation of Modern Languages Teachers Associations and the National Council of Less Commonly Taught Languages, the SLAT Linda Waugh Research Award, and the Graduate and Professional Student Council (GPSC) Research and Project Grant in April 2022. She will use the funding for participants’ compensation and transcription services to collect and process spoken data for MACAWS. Mariana also defended her dissertation proposal, “Oral development in L3 Portuguese by English-Spanish bilinguals,” on April 25, 2022. She would like to recognize Dr. Bruna Sommer-Farias and Dr. Shelley Staples for being wonderful mentors by providing valuable feedback and encouraging Mariana with grant writing and co-authorship of potential papers and presentations.

Jianfen Chen took on the role of graduate instructor of professional writing at Purdue University this academic year. She was awarded a seed grant from CILMAR, the Center for Intercultural Learning, Mentorship, Assessment and Research of Purdue University, and both the Crouse Emergent Scholars Award from the Professional Writing Program and the Professional Writing Award for Exceptional Instruction from the English Department of Purdue University. Jianfen received a Graduate Fellowship from Purdue’s Graduate School Mentoring Graduate Student Fellow Program and the CCCC Scholars for the Dream award. 

Jianfen also defended the prospectus for her dissertation, “Who Can We Listen to Amid the Risks and Uncertainties of COVID-19?” in December 2021. She was invited to share this research with the STC student chapter at Texas Tech University. Jianfen writes, “Dr. Bradley Dilger has always been a valuable resource to me and has helped me by offering insightful feedback on my dissertation prospectus and serving on my dissertation committee.”

Bradley Dilger at Purdue University is excited that the Purdue ‘birds won a Covid-19 disruption grant, which allowed several undergraduate researchers to join the Crow team, and an HWW seed grant in the 2021-2022 academic year. Bradley is also celebrating his promotion to Professor at Purdue University. Throughout the year, Bradley chaired three doctoral committees, served on seven more, and directed three undergraduate research projects. He would like to mention that the appointment of Aleksandra Swatek and Hadi Banat to the leadership team has been a great addition to Crow.

Anuj Gupta at the University of Arizona was selected as the DS2F Fellow (Digital Scholarship and Data Science Fellow) by the University of Arizona Library in 2022. This carries an honorarium of $1,500, mentorship to develop data science skills, and an opportunity to curate a workshop to help bolster data science efforts in the humanities and social sciences. Anuj also won the Tilly Warnock Award from the University of Arizona’s Writing Program which provides him with summer funding for $3,500 to analyze data from his project on graduate students’ academic writing anxiety. 

He would like to thank both Shelley Staples and Bradley Dilger for being excellent and kind advisors in his first semester working with Crow. Anuj also would like to note he has benefited from observing Aleksandra Swatek and Mark Fullmer and reading research proposals written by Nina Conrad, Hui Wang, and Emily Palese. Anuj has had an eventful year: he got Covid, broke his femur, and survived a shooting at a hotel that killed one person and injured three people (all during Spring 2022), and he still initiated his first ever large-scale data collection and analysis! Talk about resilience.

Ge Lan at the City University of Hong Kong received the CityU Start-up Grant in December 2021. Ge also turned in an application for the General Research Fund which is a prestigious grant in Hong Kong. 

Michelle McMullin at North Carolina State University (NCSU) was reappointed after a successful third-year pre-tenure review. She presented at the NCSU Equity Symposium alongside Hadi Banat and Aleksandra Swatek. Michelle would like to thank the CDW team for their work on the upcoming infrastructure article that will be published in CDQ in October 2022 and recognize Ola and Hadi for their work on the equity symposium presentation.

Adriana Picoral at the University of Arizona won a Public Interest Technology University Network (PIT-UN) grant for $90,000. Her project focuses on spreading awareness of data science, both its academic possibilities and its career paths, by educating individuals about the field and supporting a diverse array of students.

Anna Shura was a published author of multiple poems in Purdue’s literary magazine The Bell Tower where she also served as editor. She was president of the Purdue Professional Writing Association and is looking forward to taking on the additional role of Vice President of the Student English Association in the fall. Anna will be an intern for Goodheart-Willcox Publishers in summer 2022, and she would like to thank Dr. Dilger for his support and encouragement during her first year working for Crow.

Shelley Staples at the University of Arizona would like to highlight Dr. Randi Reppen, who has been her mentor since 2009, for helping her prepare a book proposal this year. Dr. Staples published numerous articles and gave several invited talks and conference presentations.

Aleksandra Swatek at the Adam Mickiewicz University in Poznań, Poland would like to recognize the welcoming atmosphere in Crow’s Constructive Distributed Work team, which she joined in Spring 2022, and to express her thanks for the opportunity to join the Crow leadership team. She would also like to thank Dr. Dilger and Dr. Staples for their support and advice in navigating her early-career trajectory.

Shelton Weech from Purdue University conducted workshops and wrote and produced videos on data visualization and rhetoric for Krannert School of Management, Purdue University. He also won the Crouse Emergent Scholar Award. Shelton would like to thank Bradley Dilger, Michelle McMullin, and Hadi Banat for their encouragement, advice, and mentorship as he enters the final stretch of his PhD.

Publications & invited talks

  1. Banat, H. (2022). Crossing through borderlines of identification and non-identification: Transforming writing center faculty response to faculty outreach workshops. Praxis Journal, 19(1). https://www.praxisuwc.com/191-banat
  2. Banat, H., Sims, R., Tran, P., Panahi, P., & Dilger, B. (2022). Developing intercultural competence through a linked course model curriculum: Mainstream and L2-specific first-year writing. TESOL Journal, 13(1), 1–16. https://doi.org/10.1002/tesj.613
  3. Biber, D., Gray, B., Staples, S., & Egbert, J. (2022). The register-functional approach to grammatical complexity: Theoretical foundation, descriptive research findings, and applications. Routledge.
  4. Dang, A., Conrad, N., & Staples, S. (2022, February). Adapting corpus-based materials for online teaching in L2 writing courses. SLW news: The newsletter of the second language writing interest section (TESOL). http://newsmanager.commpartners.com/tesolslwis/print/2022-02-09/3.html
  5. Dilger, B., Dryer, D., Bazerman, C., Anson, C., & Lerner, N. (2021). Conclusion. In K. Blewett, T. Donahue, & C. Monroe (Eds.), The expanding universe of writing studies: Higher education writing research (pp. 417–420). Writing and Rhetoric Series. Peter Lang.
  6. Ginosian, K., & Gupta, A. (2022). Digital Reading in the FYC Classroom, WPA-CompPile Research Bibliographies, No. 29. WPA-CompPile Research Bibliographies. https://wac.colostate.edu/docs/comppile/wpa/Ginosian-Gupta.pdf 
  7. Gupta, A. (2021). Emotions in academic writing/care-work in academia: Notes towards a repositioning of academic labor in India (& beyond). Academic Labour: Research and Artistry, 5, 107–136.
  8. Gupta, A., & Dasgupta, A. (2021). Something of our own to say: Writing pedagogy in India. Composition Studies, 49(3), 139–144.
  9. Huensch, A., & Staples, S. (2022). Spoken corpora. In T. Derwing, M. Munro, & R. Thompson (Eds.), Handbook of SLA and L2 Speaking (pp. 112–129). Routledge. 
  10. Lan, G., Zhang, Q., Lucas, K., Sun, Y., & Gao, J. (2022). A corpus-based investigation on noun phrase complexity in L1 and L2 English writing. English for Specific Purposes, 67, 4–17. https://doi.org/10.1016/j.esp.2022.02.002 
  11. McMullin, M., Banat, H., Weech, S., & Dilger, B. (2021). Using iterative persona development to support assessment and research. Proceedings of the 39th ACM International Conference on Design of Communication (ACM SIGDOC 2021). Article 29, 1–8.
  12. McMullin, M., & Dilger, B. (2021). Constructive distributed work: An approach to sustainable collaboration and research for distributed teams. Journal of Business & Technical Communication, 35(4). 469–495.
  13. Pandey, S., & Chen, J. (2021). Is Facebook easier to use than WeChat? A critical comparative analysis of interface features of WeChat and Facebook. In Proceedings of The 39th ACM International Conference on Design of Communication (ACM SIGDOC 2021). Association for Computing Machinery, New York, NY, USA, 213–223. https://doi.org/10.1145/3472714.3473644
  14. Rodríguez-Fuentes, R. A., & Swatek, A. (2021). Exploring the effect of corpus-informed and conventional homework materials on fostering EFL students’ grammatical construction learning. System, 104, 102676. https://doi.org/10.1016/j.system.2021.102676
  15. Shin, J. (2022). Investigating and optimizing score dependability of a local ITA speaking test across language groups: A generalizability theory approach. Language Testing, 39(2), 313–337. https://doi.org/10.1177/02655322211052680 
  16. Shin, J., Rodríguez-Fuentes, R. A., Swatek, A., & Ginther, A. (2022). Aptis test review. Language Testing, 39(1), 172–187. https://doi.org/10.1177/02655322211032873 
  17. Staples, S., Dilger, B., Picoral, A., Novikov, A., & Goulart, L. (2021, December). CIABATTA: Corpus in a Box: Automated Tools, Tutorials, and Advising. (online). https://writecrow.org/CIABATTA/ 
  18. Staples, S., Picoral, A., Novikov, A., & Sommer-Farias, B. (2022). Directions for future use of existing corpora in the study of L2 writing. In C. Polio & R. Manchon (Eds.), Handbook of SLA and Writing (pp. 356–369). Routledge.
  19. Staples, S. (2021, December). Enhancing pedagogy with data-driven learning: Student writing as the site and source for instructional change. Invited talk for the Center for University Education Scholarship (CUES) at University of Arizona (hybrid). https://www.youtube.com/watch?v=1fDrbrk03Ac 
  20. Staples, S. (2021, November). Introduction to corpus linguistics for English language teaching [Invited talk, online]. Universitas Muhammadiyah Malang, East Java, Indonesia.
  21. Staples, S. (2022, April). Using learner corpora and data driven learning for second language writing: A report from the Corpus and Repository of Writing (Crow) project [Invited talk]. English Applied Linguistics Speaker Series, University of Arizona, Tucson, Arizona, USA.
  22. Sun, Y., & Lan, G. (2021). Research trends in ‘trans-’ studies on writing: A bibliometric analysis. System, 103, 102640. https://doi.org/10.1016/j.system.2021.102640
  23. Swatek, A., Taskin, Z., & Jackson, N. C. (2022). Revisiting “Family Matters”: How Citation Patterns in the Journal of Second Language Writing Reveal the Changing Nature of the Second Language Writing Field and the Decreasing Role of Composition & Rhetoric in It. Journal of Writing Analytics, 6(1), 145–165. https://doi.org/10.37514/JWA-J.2022.6.1.06

Conference presentations & workshops

  1. Banat, H. (2022, April). Identity, writing centers, and conflict resolution [Workshop, online]. Farnham’s Writing Center, Colby College, Waterville, ME, USA.
  2. Castek, J., Dupuy, B., Hellmich, E., MacKay, K., & Staples, S. (2021, September). CERCLL Resources [Panel presentation, online]. Arizona Language Association (AZLA).
  3. Chen, J. (2021, October). A case study on public affects circulating in Hong Kong 2019 chaos via YouTube videos [Poster presentation for Student Research Competition]. The 39th ACM International Conference on Design of Communication (ACM SIGDOC 2021), Tempe, AZ, USA. 
  4. Chen, J. (2022, April). Teaching business writing students to conduct market research using data analytics tools [Paper presentation, online]. SpeedCon 2022, North Carolina State University, Raleigh, NC, USA.
  5. Chen, J., & Smith, J. (2021, October). “Public Relations Media Kit (PRMK)”: A collaborative project in professional writing [Paper presentation, online]. Association for Business Communication 2021 Annual International Conference. 
  6. Chen, J., Tang, Y., Zhang, J., & Xie, C. (2022, March). A case study on teaching experiences and practices of Chinese graduate instructors of writing at four American institutions [Panel presentation, online]. 2022 Conference on College Composition and Communication. 
  7. Gupta, A. (2022, January). In pursuit of Academic Writing Anxiety (AWA): Reflections on research design(s) [Paper presentation]. Conference on Writing and Well Being, University of Arizona, AZ, USA.
  8. Gupta, A., Dang, A., & Rodrigo, R.L. (2022, January). Technology, reading, and strategies to reduce anxiety [Panel presentation]. Conference on Writing and Well Being, University of Arizona, AZ, USA.
  9. McMullin, M., Swatek, A., & Banat, H. (2022, February). Constructive Distributed Work: Building ethics, equity, and access in research teams [Panel presentation]. NCSU Equity Research Symposium. Raleigh, NC, USA. https://vimeo.com/675577595
  10. Sommer-Farias, B., Centanin-Bertho, M., Vinokurova, V., & Staples, S. (2021, November). How to teach writing with learner corpus data [Paper presentation, online]. American Council of Teachers of Foreign Languages (ACTFL). 
  11. Staples, S. (2021, September). Introduction to Crow for Microcampus faculty [Outreach workshop, online]. University of Arizona, Tucson, AZ, USA.
  12. Staples, S., & Ghanam, R. (2022, February). Spoken corpora: Considerations and tools for corpus analysis of speech data [Invited talk, online]. Northern Arizona University, Flagstaff, AZ, USA.
  13. Weech, S. (2021, October). Inherently organizational: Speech acts, organizations, and mission statements [Paper presentation, online]. Association for Business Communication Conference.
  14. Weech, S. (2022, March). Snowballs and interviews: A mixed methods approach for online research [Paper presentation, online]. Conference on College Composition and Communication.
  15. Yaylali, A. (2022, April). Adolescent ELs’ developing awareness of secondary scientific writing. [Roundtable presentation]. AERA Meeting, San Diego, CA, USA.
  16. Yaylali, A. (2022, March). How ELs negotiate the linguistic demands in scientific writing. [Paper presentation]. TESOL Convention, Pittsburgh, PA, USA.
  17. Yan, X., Staples, S., Centanin-Bertho, M., & Chuang, P. L. (2022, March). Exploring linguistic correlates of speaking ability on the IELTS speaking test [Paper presentation]. American Association of Applied Linguistics (AAAL), Pittsburgh, PA, USA.
  18. Yan, X., Staples, S., Centanin-Bertho, M., Chuang, P. L., Cai, H., Jang, S., & Wang, S. (2022, March). Exploring linguistic correlates of speaking ability on the IELTS speaking test [Paper presentation]. Language Testing Research Colloquium (LTRC).

The Crow team is pleased to congratulate the following four Crowbirds on their graduation. All of them are exemplary individuals both here at Crow and at their home institutions, and we are excited to celebrate their accomplishments and look forward to seeing what they do next! Please join us in congratulating Ryan Day, Ji-young Shin, Ali Yaylali, and Larissa Goulart:

Ryan Day has earned his BS in Civil Engineering and a BA in Political Science from Purdue University. His future plans include attending law school at Indiana University Maurer School of Law in Bloomington, Indiana.

Ji-young Shin graduated in December 2021 with a Ph.D. from the Second Language Studies/ESL program in the English Department at Purdue University after defending her dissertation: “Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches.” She now works as an Assistant Professor at the University of Toronto Mississauga.

Ali Yaylali has completed his doctorate in Language, Reading, and Culture from the University of Arizona after defending his dissertation on April 4th, 2022. He is looking forward to being an Assistant Professor in Education at Eastern Kentucky University. 

Larissa Goulart has received her PhD in Applied Linguistics from Northern Arizona University after completing her dissertation: “Situational and Linguistic Variation in University Writing for Content Classes.” She is excited to start as an Assistant Professor in Linguistics at Montclair State University next fall.

The Crow corpus has thousands of undergraduate writing assignments written by real students. When these students agreed to have their writing become a part of our platform, they did so under the assumption that all their identifying information would be wiped away before their writing became public.

In order to ensure all traces of the writer’s identity (as well as their teachers, classmates, etc.) have been removed, we go through each text replacing names, places, course names, positions, and more with placeholder tokens. To make this process less mind-numbing, our wonderful developers have devised a series of tools that automate parts of the process. 
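To give a feel for what "replacing with placeholder tokens" means, here is a minimal Python sketch. It is not the Crow tool itself: the `REPLACEMENTS` mapping, the token names such as `<name>` and `<course>`, and the `apply_placeholders` function are all hypothetical, and in practice a human reviewer decides what goes into the mapping for each text.

```python
import re

# Hypothetical mapping from identifying strings to placeholder tokens.
# The token names are illustrative, not Crow's actual placeholder set.
REPLACEMENTS = {
    "Jane Doe": "<name>",
    "English 106": "<course>",
}


def apply_placeholders(text, replacements):
    """Replace each identifying string with its placeholder token."""
    # Handle longer strings first so "Jane Doe" is replaced before
    # any shorter entry (such as "Jane") could split it.
    for original in sorted(replacements, key=len, reverse=True):
        pattern = r"\b" + re.escape(original) + r"\b"
        text = re.sub(pattern, replacements[original], text)
    return text


print(apply_placeholders("Jane Doe took English 106.", REPLACEMENTS))
```

A fixed mapping like this only catches strings someone has already spotted, which is why the review step described in this post still matters.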

Some examples of placeholders in a corpus text

Crow developers wrote a script that automatically deletes the header of each document, since this will almost always have the student’s name, their teacher’s name and other identifying information. 
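As a rough illustration of the idea (not the actual Crow script), header removal could be sketched in Python as below. The sketch assumes the header is the block of lines before the first blank line; that rule and the `strip_header` name are our assumptions for this example, and real student documents may need a more careful rule.

```python
def strip_header(text):
    """Drop everything up to and including the first blank line.

    Assumption for this sketch: the header (student name, teacher
    name, course, date) is the block of lines before the first
    blank line in the document.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            # Keep only what follows the blank line.
            return "\n".join(lines[i + 1:]).lstrip("\n")
    # No blank line found: leave the text alone rather than guess.
    return text


sample = "Jane Doe\nProf. Smith\nENGL 106\n\nMy essay begins here.\nSecond line."
print(strip_header(sample))
```

Returning the text unchanged when no blank line is found is a deliberately cautious choice: it is safer to leave a header for the reviewer than to delete part of an essay.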

Next, a second tool automatically highlights capitalized words within the document because they are likely identifying proper nouns. This allows the reviewer to spot them quickly and determine whether or not they need to be replaced. While it’s essential to get rid of identifying information, it’s also important to keep in as much detail as possible and avoid detracting from the writer’s original message and intention. This can sometimes be a balancing act for the reviewer.
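The highlighting step can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the tool's real code: the `[[` and `]]` markers stand in for the on-screen highlight, and the pattern simply flags every word that starts with a capital letter, including ordinary sentence-initial words.

```python
import re

# Any word that starts with an uppercase ASCII letter. This also
# matches sentence-initial words, which the reviewer can skip.
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]*\b")


def highlight(text, left="[[", right="]]"):
    """Wrap each capitalized word in marker strings for review."""
    return CAPITALIZED.sub(lambda m: f"{left}{m.group()}{right}", text)


print(highlight("My advisor at Purdue is Dr. Smith."))
```

Over-flagging is intentional here: the tool only points, and the reviewer makes the call about what to replace and what to keep.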

This is a screenshot of the tool we use to de-identify texts. Note how the words in purple have been replaced by placeholder tokens but how “Associate Professor” and “Purdue Polytechnic Institute” were intentionally left in because those names alone won’t identify anyone and they give context to a future reader.

Cut to 2022, and nearly four years of undergraduate writing (over 4,500 documents!) have piled up un-de-identified. At Purdue, we 2022 undergraduate researchers now have more than our fair share of undergrad writing to read and de-identify. We’ve been steadily working through files with guidance from others on the Crow team. Even with the tools we’ve made, de-identification remains labor intensive, so we’re grateful for the grant support that’s made our current work possible.

In our next post, we’ll detail how our team has been working on improvements to the de-identification tool to make it more efficient and more accurately retain writers’ intentions.


We are pleased to announce Crow has added two longtime Crow researchers to our leadership team: Dr. Aleksandra Swatek, Adam Mickiewicz University in Poznań, Poland, and Dr. Hadi Banat, University of Massachusetts, Boston.


Our history with Dr. Swatek and Dr. Banat goes back to the first days of Crow at Purdue University—and in Dr. Swatek’s case, even farther, as she was involved with some of the corpus-building projects at Purdue that preceded Crow. Both have contributed to many key projects at Crow, such as the development of our first Humanities Without Walls grant, the 2018 “Writing Research Without Walls” symposium, and ACLS-supported outreach activities. They have hosted workshops, contributed to grant writing, and helped undergraduate and graduate researchers develop their own projects.

One of the strengths of Crow has been our interdisciplinary approach, and Dr. Swatek and Dr. Banat have helped us achieve intellectual diversity with their expertise in second language studies, applied linguistics, technical communication, and other fields. Recent co-authored publications demonstrate that diversity, with Dr. Swatek’s “Revisiting ‘Family Matters’” in the Journal of Writing Analytics and Dr. Banat’s “Developing intercultural competence through a linked course model curriculum” in TESOL Journal, both written with other research teams that, like us, appreciate their ability to write collaboratively.

Currently both are actively involved in our Constructive Distributed Work project, which has several active publications in progress. We look forward to seeing how they shape the future of Crow. 

The Crow Lab at Purdue University has seen significant growth over the last few months. As of February 2022, five interns are working on our project, thanks to support from an EVPRP grant. Studio Hours, or time to work in the lab in Heavilon Hall or over Zoom, occur several times over the course of the week. Fridays are a reflective studio hour time, hosted by Bradley Dilger, for all interns to share their progress with each other and ask questions. The Crow lab is open for interns to use at any time, which is especially useful for one-on-one conferencing.

Though almost all of the work for Crow is collaborative, everyone contributes to a specific set of projects.

I’m Hannah Brostrom, an undergraduate at Purdue. I have been a part of several projects; recently my focus has been on de-identifying documents and finishing the environmental scans we’ll use to improve the Crow website and platform. I have also collaborated with a fellow intern, Vivek, to improve the de-identification tool (a program we use to help us anonymize the documents we use to build our corpus). I am working on this blog post for the Crow website, and I’ll be doing more writing after Spring Break.

Bradley Dilger, an Associate Professor at Purdue, guides all Crow projects and assists everyone by answering questions and sharing his experience with writing research. He has been leading the grant writing projects, helping Vivek and me with our coding for the De-ID tool, and working with the Distributed Work Team. Outside of Crow, Bradley teaches professional writing and mentors student researchers.

Abby Elkin, an undergraduate researcher at Purdue, has been helping with the Innovation Grant. Additionally, she has been de-identifying documents and helping me conduct environmental scans to improve the Crow website. Abby is also writing a blog post about the improvements Vivek and I are planning for the De-ID tool, to be published once that work is completed.

Shelton Weech, a graduate teaching assistant at Purdue, has contributed heavily to the Distributed Work team through various writing projects, workshop projects, and the development of future grants. Outside of Crow, Shelton has been preparing for his dissertation by conducting interviews with scientific communicators who use social media, all of which he transcribes and analyzes.

Vivek Natarajan, an undergraduate researcher at Purdue, has focused on all aspects of de-identification. He has been doing the de-identification work itself, as well as improving the tool we use for de-identification. Recently, he has fixed an issue where the edited version of a file was saving incorrectly, and he is building some code adjustments to improve user experience while using the tool. 

Anna Shura, an undergraduate researcher at Purdue, has been focused on a variety of projects. She has written several Spotlights and blog posts and leads Crow’s social media presence. Anna is creating and documenting a Twitter and Web content strategy that includes a blog posting schedule, writing articles, and developing a stronger Twitter profile. She has also worked on organizing Crow’s Google Drive, de-identifying documents, and reviewing the processes behind developing the Innovation Grant.

Having lab space in Heavilon allows us to interact with one another in person and get help from the people around us when needed. This has been hugely beneficial, especially considering how collaborative Crow’s work is. The EVPRP grant has given us the tools we need to work with each other efficiently, and it has allowed the Crow interns not only to move the Crow project forward but also to learn and grow in their positions.

Dr. Aleksey Novikov has always had a passion for learning languages and reflects that, as a young child, “[I liked] that I was able to understand things that others didn’t.” Despite growing up and attending college in Russia, Aleksey knew “English was going to be a part of my career” and switched from math, business, and economics to pursue a new degree in translation and interpretation. As a translator, Aleksey was able to apply his love of foreign language, but he “wanted something more creative.” In working with a psycholinguist professor as an undergraduate, Aleksey was inspired to study speech perception and investigate “mechanisms at work when people hear and understand spoken language.”

Aleksey Novikov

After earning his undergraduate degree, Aleksey sought several different career experiences, moving between work at an IT company and an evening teaching job because, as a true academic, Dr. Novikov knew, “I need intellectual fulfillment.” He began teaching English to foreign language students who had day jobs and could not take regular classes, and it was through his late-night teaching that he learned about the Fulbright Scholarship Program.

The Fulbright Program awards grants to support English Teaching Assistant Programs. Fulbrighters are English teachers who may teach students in their native languages, and Aleksey taught Russian in a self-directed language program at California State University in 2009 on a Fulbright Scholarship. Throughout his Fulbright Program, he took two classes as a non-degree student: Foundations of TESOL and American Studies, which he was able to transfer to graduate credit four years later at the University of Arizona.

Dr. Novikov earned both his MA in Russian and Slavic Studies and his PhD in Second Language Acquisition and Teaching at the University of Arizona, while teaching Russian and linguistics and working as a research assistant for Crow. He was ahead of the game, designing a mostly asynchronous Russian language course with synchronous conversation components before the pandemic. In May 2021, Dr. Novikov earned his PhD, defending his dissertation: “Syntactic and Morphological Complexity Measures As Markers of L2 Development in Russian.”

Here at Crow, Aleksey has brought his diverse experience to a number of projects, including Python and corpus building workshops, recruitment and organization of student work for the repository, and his role as a Primary Investigator on the multilingual academic corpus of assignments (MACAWS). Most recently, Dr. Novikov piloted Crow’s program CIABATTA in Russian and built a significant amount of the CIABATTA infrastructure.

Aleksey is incredibly passionate about educating students in linguistic studies and mentions, “I appreciate teaching and introducing people to linguistics. That is just the best moment of my life.” As of 2022, Dr. Novikov is a Visiting Assistant Professor at the Oxford College of Emory University, where he continues to teach several linguistics courses as well as Quantitative Theory and Methods, an introduction to data and statistics.

As a Crowbird, Dr. Novikov is looking forward to planning a CIABATTA workshop in Fall 2022 and continuing to work as a PI on MACAWS, including updating the IRB. Needless to say, Aleksey likes to stay busy and intellectually curious, and Crow is excited to celebrate his work and wish him all the best in his upcoming projects!

The Crow team is excited to continue to grow with the addition of new interns across three different institutions. Vivek Natarajan is an Undergraduate Research Assistant at Purdue University, Anuj Gupta is a Graduate Intern at the University of Arizona, and Faisa Aden is an Undergraduate Research Assistant at North Carolina State University. The three interns are pursuing a diverse array of degrees, from English to Computer Science, and Crow looks forward to their involvement across all of our projects. Below, each new researcher shares their scholarly background and goals for their work as a new Crowbird.

Vivek Natarajan is a junior at Purdue University studying Computer Science with minors in Math and Linguistics. His current interests include Natural Language Processing and Machine Learning. In addition to English, he also speaks Spanish, Hindi, Tamil and some Sanskrit and Telugu, due to living in India for six years. Here at Crow, Vivek will be working on de-identification, the development of the corpus and repository, and improving the tools we use for that important work.

Anuj Gupta is a graduate student at the University of Arizona in the Rhetoric, Composition & the Teaching of English program. He works at the intersection of composition studies, applied linguistics and digital humanities. Using computational & corpus approaches, he wishes to explore how media texts use emotions to rhetorically persuade the public. Through Crow, he is excited to learn text mining techniques and build resources that will enable students to experiment with these methods for their research. He is also looking forward to getting involved with publications and presentations about Constructive Distributed Work.

Faisa Aden

Faisa Aden is a Sophomore at NC State, studying Psychology and English with a concentration in Linguistics. In the future, she hopes to study abroad and either teach English as a Second Language, or work in the publishing industry as a technical writer or journalist. On campus, she is involved with Social Innovations Fellows, McNair Scholars, and University Scholars.

Outside of the Crow lab, all three of our accomplished interns explore unique hobbies. Vivek is an avid boulderer and likes trying to find the hardest possible way to climb up steep rock faces outside. Anuj plays guitar, reads sci-fi novels, and passes his time thinking about stuff that happens in outer space. Faisa enjoys reading, watching Turkish TV shows, and spending time with her family. Congratulations to all three Crowbirds. Welcome to our team!

The Crow team is glad to be presenting at the NC State University Equity Research Symposium on Tuesday, February 8. Crow researcher Dr. Michelle McMullin, assistant professor of English at NC State, will be joined by Crow co-PIs Dr. Hadi Banat (U of Massachusetts, Boston) and Dr. Aleksandra Swatek (Adam Mickiewicz University, Poznań, Poland).

We’ll be presenting “Constructive Distributed Work: How Crow builds ethics, equity and access in research teams,” which introduces our team and our corpus and repository platform, describes the “Constructive Distributed Work” heuristic we use to guide mentoring and professional development on our team, and shows how user experience design helps us design more sustainable, ethically robust tools for teaching and research.

If you’re in Raleigh, our talk will be in Talley 4280 from 3:00 to 4:00 pm Eastern time on Tuesday, February 8. NCSU will be live-streaming as well. (Find this time in your time zone.)

Emily Palese has been a leader of the Crow repository since 2019 and earned her PhD in Second Language Acquisition and Teaching from the University of Arizona in May of 2021. Emily previously studied Spanish and Anthropology at the University of Wisconsin-Madison as an undergraduate and earned her MA in Teaching English as a Second Language from the University of Arizona. As a member of the Peace Corps for two years, Dr. Palese taught English in the Philippines at a rural high school, and she also facilitated training workshops for elementary and high school teachers from other parts of the country. Here at Crow, Emily has contributed to a diverse array of projects. Collaborative work has been important to Emily for a long time, and she gains team experience from her work on the Crow repository, AZTESOL, the Second Language Writing collaboratives, and WriPACA.

Emily’s dissertation, “Prompting students to write: Designing and using second language writing assignment prompts,” investigates how assignment prompts function in first-year writing courses at the University of Arizona. Dr. Palese’s motivation for this project came from her own experience: “When I was new to the university, I struggled to understand as a young professional, what are the expectations?” Emily immediately realized that “having a framework for analyzing prompts and [being able to] compare what you’re doing is really helpful when you’re designing new materials,” and she began her research on prompt interaction by collecting and reviewing prompts for Crow’s repository.

When Dr. Palese brought her research to her 18 student participants, she studied “how students are interacting with the materials, what they’re skipping when they’re reading, [and] what they think is important.” Emily conducted “think aloud” interviews and described her process: “As the students interacted with the prompts for the first time, I screen recorded with audio to see how they navigated [the prompts and] what their thoughts and reactions were as they looked at them. Immediately after, I had a semi-structured interview with each of the students to follow up on what they valued and how they used the prompts.” Additionally, Dr. Palese studied the rhetorical moves that occur in assignment prompts to understand how instructors give directions. Her analysis of writing is complemented by interviews with six instructors and observations of their courses.

After earning her PhD in 2021, Emily became the Assistant Director of Global Foundations Writing at the University of Arizona, where she “provides instructional support for global micro-campuses, including onboarding and supporting instructors, developing materials, and assessing and adapting curricula.” Here at Crow, Dr. Palese finalized her work on the repository team and began preparing to transition to her new leadership position. Currently, Emily is enjoying exploring her new role and reflects, “I’m happy that the repository has new leadership and members so our original ideas and protocol can get refined with new perspectives.” 

We wish Dr. Palese well with all of her 2022 endeavors!

Crow receives significant interest from students and faculty in building their own corpora. Many people interested in corpus building are unsure where to start. How can data be organized effectively? How can participants be contacted and treated ethically? The Crow team hopes to answer these questions by providing a Corpus in a Box: Automated Tools, Tutorials and Advising, or CIABATTA. The December 6th, 2021 CIABATTA launch introduced the “Corpus in a Box” to an international audience with participants from Lebanon, Colombia, Hong Kong, Italy, Greece, Saudi Arabia, the United Kingdom, Canada, Ghana, Brazil, the United States, and Poland.

The launch event began with Dr. Shelley Staples describing the content included in CIABATTA and the motivation behind the development of the corpus building process. While we created CIABATTA to help scholars begin their own corpora, Staples cautioned that anyone who needs to build a corpus should recognize that “It’s a lot of work!” If you decide building a corpus would be helpful to your research, CIABATTA offers a start-up process for anyone looking to build their own.

Building CIABATTA has allowed the Crow team to pool our experiences and contribute programming, automated tools, and user experience guidelines. However, neither coding experience nor research experience is necessary to use CIABATTA. As Staples described it, CIABATTA is designed for students and faculty around the world, or, as the CIABATTA web page puts it, “from novice users looking to begin conducting data analysis through their corpus to experienced programmers ready to streamline their own processes for corpus building.”

CIABATTA includes nine sections of content:

  1. best practices for corpus building
  2. CIABATTA overview
  3. ethical issues in corpus building
  4. checking consents and collecting data
  5. organizing your data
  6. converting, encoding, and standardizing your data
  7. organizing, preparing & processing metadata
  8. adding headers and changing filenames
  9. deidentifying your data

Attendees of the launch presented a variety of motivations for using CIABATTA, with several participants asking about using CIABATTA in academic courses and piloting CIABATTA in different languages. We encouraged these uses and supported these goals in the Q&A section of the launch:

Screenshot of CIABATTA launch, showing "CIABATTA content" with list of the nine sections of content: (1) best practices for corpus building; (2) CIABATTA overview; (3) ethical issues in corpus building; (4) checking consents and collecting data; (5) organizing your data; (6) converting, encoding, and standardizing your data; (7) organizing, preparing & processing metadata; (8) adding headers and changing filenames; and (9) deidentifying your data.
The nine sections of CIABATTA content (also in the list above)

In building CIABATTA, we chose GitHub as the presentation platform because of its ability to integrate code and text from the GitHub wiki. Through GitHub, users are directly linked to the most recent data code and automated tools. In response to one participant’s question, “Could you convert CIABATTA into a textbook?” Staples and Dr. Adriana Picoral encouraged using CIABATTA or other Crow information to share with a class. 

One attendee asked if CIABATTA could help build corpora in languages other than English. The answer is yes! The Crow team has successfully piloted the Corpus in a Box in Portuguese and Russian through the Multilingual Academic Corpus of Assignments: Writing & Speech (MACAWS), and we encouraged attendees interested in piloting other languages to work with Crow and offer feedback.

Another important question in the Q&A section asked how CIABATTA compares with other programs, such as Lancsbox. Crow team member Dr. Aleksandra Swatek addressed the comparison by noting, “Lancsbox is more to analyze the corpus … CIABATTA helps to compile the corpus and all the other steps you need to prepare your files.”

In the CIABATTA Open House on December 7, 2021, ACLS program officer Dr. John Paul Christy asked about ethical concerns in corpus building, pointing out that the public turn in ACLS work has highlighted issues about the co-creation of knowledge. We shared some experiences across Crow. Dr. Bradley Dilger described his decision to defer recruiting corpus participants while he was an administrator at Purdue. Dr. Staples described our original plans for building the repository, which included posting identified materials as a way to recognize and potentially reward instructors for their participation. However, we realized doing so could result in identification of students through triangulation. This is one reason we sponsored the Crow Writing Contest at Arizona — to recognize our students’ good work without identifying their contributions to the corpus. 

Our next steps for CIABATTA include user experience testing with targeted groups such as the Crow Fellows, users of Crow, and developers and researchers using the Crow code on GitHub. If you use CIABATTA, we’d love to hear from you! Join our mailing list to stay up to date and offer your feedback, if you wish.

If you are interested in CIABATTA and were unable to attend the Launch or Open House, additional CIABATTA information can be found on CIABATTA’s GitHub and the Crow YouTube channel. We welcome your questions about CIABATTA. Just send us a note to collaborate@writecrow.org.