Crow receives significant interest from students and faculty in building their own corpora. Many people interested in corpus building are unsure where to start. How can data be organized effectively? How can participants be contacted and treated ethically? The Crow team hopes to answer these questions by providing a Corpus in a Box: Automated Tools, Tutorials and Advising, or CIABATTA. The December 6th, 2021 CIABATTA launch introduced the “Corpus in a Box” to an international audience with participants from Lebanon, Colombia, Hong Kong, Italy, Greece, Saudi Arabia, the United Kingdom, Canada, Ghana, Brazil, the United States, and Poland.
The launch event began with Dr. Shelley Staples describing the content included in CIABATTA and the motivation behind the development of the corpus building process. While we created CIABATTA to help scholars begin their own corpora, Staples pointed out that anyone who needs to build a corpus should recognize that “It’s a lot of work!” If you decide that building a corpus would be helpful to your research, CIABATTA provides a startup process for anyone looking to build their own.
Building CIABATTA has allowed the Crow team to pool our experiences and contribute programming, automated tools, and user experience guidelines. However, neither coding experience nor research experience is necessary to use CIABATTA. As Staples described it, and as the CIABATTA web page notes, CIABATTA is designed for students and faculty around the world, “from novice users looking to begin conducting data analysis through their corpus to experienced programmers ready to streamline their own processes for corpus building.”
CIABATTA covers several main topics:
- best practices for corpus building
- ethical issues in corpus building
- checking consents and collecting data
- organizing your data
- converting, encoding, and standardizing your data
- organizing, preparing & processing metadata
- adding headers and changing filenames
- deidentifying your data
Attendees of the launch presented a variety of motivations for using CIABATTA, with several participants asking about using CIABATTA in academic courses and piloting CIABATTA in different languages. We encourage these uses and supported these goals in the Q&A section of the launch.
In building CIABATTA, we chose GitHub as the presentation platform because of its ability to integrate code and text from the GitHub wiki. Through GitHub, users are directly linked to the most recent data code and automated tools. In response to one participant’s question, “Could you convert CIABATTA into a textbook?” Staples and Dr. Adriana Picoral encouraged attendees to share CIABATTA or other Crow information with a class.
One attendee asked if CIABATTA could help build corpora in languages other than English. The answer is yes! The Crow team has successfully piloted the Corpus in a Box in Portuguese and Russian through the Multilingual Academic Corpus of Assignments: Writing & Speech (MACAWS), and encourages attendees interested in piloting other languages to work with Crow and offer feedback.
Another important question in the Q&A section asked how CIABATTA compares to other programs, such as Lancsbox. Crow team member Dr. Aleksandra Swatek addressed the comparison by noting, “Lancsbox is more to analyze the corpus … CIABATTA helps to compile the corpus and all the other steps you need to prepare your files.”
In the CIABATTA Open House on December 7, 2021, ACLS program officer Dr. John Paul Christy asked about ethical concerns in corpus building, pointing out that the public turn in ACLS work has highlighted issues about the co-creation of knowledge. We shared some experiences across Crow. Dr. Bradley Dilger described his decision to defer recruiting corpus participants while he was an administrator at Purdue. Dr. Staples described our original plans for building the repository, which included posting identified materials as a way to recognize and potentially reward instructors for their participation. However, we realized doing so could result in identification of students through triangulation. This is one reason we sponsored the Crow Writing Contest at Arizona — to recognize our students’ good work without identifying their contributions to the corpus.
Our next steps for CIABATTA include user experience testing with targeted groups such as the Crow Fellows, users of Crow, and developers and researchers using the Crow code on GitHub. If you use CIABATTA, we’d love to hear from you! Join our mailing list to stay up to date and offer your feedback, if you wish.
If you are interested in CIABATTA and were unable to attend the Launch or Open House, additional CIABATTA information can be found on CIABATTA’s GitHub and the Crow YouTube channel. We welcome your questions about CIABATTA. Just send us a note at email@example.com.