Corpus and Repository of Writing

Kickoff: Methodology Workshop at Purdue

The Crow team recently concluded a four-day workshop series, Methodology Workshop for Natural Language Programming. The workshops, led by Crow researcher and software developer Mark Fullmer, were designed to equip our team members with the fundamental coding and programming skills needed to construct our own programs and troubleshoot problems we encounter in existing scripts. By obtaining a functional knowledge of programming, we can meet our goals to make Crow sustainable and increase team member contribution to corpus- and interface-building tasks. At the end of the week, we expected all Crow members to (1) build a working vocabulary of coding terms; (2) progress past the introductory threshold of programming; and (3) better understand and articulate programming challenges we encounter as we integrate our corpus and repository.

Crow programmer Mark Fullmer presenting to researchers

Mark Fullmer opening the technical workshop

To maximize our learning and productivity during the rest of the week, Mark led the Crow team in an assessment of our current programming skills, identifying what threshold of competency each person wished to achieve by the conclusion of the workshops, and establishing a framework for researchers to form their own personal learning objectives. Mark gave us a checklist of coding tasks to measure against our current programming knowledge and help us compare our progress against a list of definable expectations. Talking over the tasks we were already performing in Crow revealed the varying levels of coding experience among team members, and Mark encouraged us to pick and choose workshops that we would find most useful. During our brainstorming, we created a running document listing the different aspects of programming we found most difficult and specific problems we had encountered. Crow researchers continually updated this document and others throughout the workshop, and we’ll be sharing them soon.

After evaluating our programming competence and articulating our short- and long-term goals for the workshops, Mark gave us a preview of the week’s work. The three mantras for the rest of the week were: (1) text processing is recursive and will almost always require future modification; (2) code is an inherently disposable entity that we use to accomplish a specific task; and (3) if it isn’t documented, then it doesn’t exist in code.

Participation by our collaborators at Arizona was facilitated by Google Hangouts on Air, a fabulous tool which also records videos of the workshops we can review, edit, and post online.

Over the next month or so, we’ll offer a series of posts which recap the workshop and help us think about ways to develop it into a resource which the Crow community can use as we work together to build the Crow web interface.

Promoting citation research at AAAL 2018

Researchers from the Crow team presented “Citation practices of L2 writers in first-year writing courses: form, function, and connection with pedagogical materials” at AAAL 2018. The presenters were Wendy Jie Gao, Lindsey Macdonald, Zhaozhe Wang, Adriana Picoral and Dr. Shelley Staples.

Crow Citation Team at AAAL 2018

Dr. Shelley Staples, Adriana Picoral, Lindsey Macdonald, Wendy Jie Gao, and Zhaozhe Wang


Citation practices and styles are integral to academic writing contexts. Previous research on citation use has focused on variability across citation form (e.g., integral/non-integral) and function (e.g., synthesis/summary) (Charles, 2006; Petrić, 2007; Swales, 2014). However, most studies have focused on advanced L1 English student and professional writing. In addition, no studies to date have investigated the influence of instructor materials on students’ citation practices. Using a corpus of L2 writing, we examined (1) how the L2 writers’ citations vary in form and function across different assignments and instructors; and (2) how students’ citation practices might be influenced by the pedagogical materials provided for each assignment.

Our corpus includes 74 papers (72,395 words) across two assignments, a literature review (LR) and a research paper (RP), from a first-year writing course for L2 writers. We calculated the number of citations and references in each assignment (per 1,000 words), and coded citations as integral, non-integral, or hybrid (integral and non-integral) in form. We then coded citation functions based on Petrić (2007) and qualitatively examined the relationship of the writing to pedagogical materials, such as the number of sources required and the form and function of citations in sample papers.
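For readers new to normalized counts, here is a minimal sketch of the per-1,000-words calculation mentioned above. The function name and the sample numbers are hypothetical and chosen only for illustration; they are not values from our corpus.

```python
def per_thousand_words(count: int, total_words: int) -> float:
    """Normalize a raw count (e.g., citations) to a rate per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * 1000


# Hypothetical paper: 9 citations in a 1,200-word literature review.
rate = per_thousand_words(9, 1200)
print(round(rate, 2))  # 7.5
```

Normalizing this way lets papers of different lengths be compared on the same scale.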

Our preliminary results show that the writers most frequently use integral citations with little synthesizing function. While there is a large variation in the number of citations both within assignments and across instructors (LR: 3.54-9.73, RP: 5.25-7.05), the number of references is more consistent in the literature review (LR: 2.66-3.27, RP: 3.72-4.88). Students prefer non-quote citations to quoted ones. Integral citation is more frequently used in the literature review, while non-integral citation appears more in the research paper. The hybrid citation form appears consistently across almost all sections. These results might be attributed to instructors’ use of model literature review papers that almost exclusively feature integral citations, as well as explicit requirements (3 sources) in the assignment sheets. Attribution only is the largest category of rhetorical function across all the citations. In addition, students’ awareness of establishing links between sources and making statements of use seems to have been influenced by sample papers. Our findings show the potential need for more instruction on the use of sources for synthesizing information, and the important influence of pedagogical materials.

Citation project conference handout (PDF).

Selected References

Charles, M. (2006). Phraseological patterns in reporting clauses used in citation: A corpus-based study of theses in two disciplines. English for Specific Purposes, 25(3), 310–331. doi:10.1016/j.esp.2005.053

Lee, J. J., Hitchcock, C., & Casal, J. E. (2018). Citation practices of L2 university students in first-year writing: Form, function and stance. English for Specific Purposes, 33, 1–11.

Petrić, B. (2007). Rhetorical functions of citations in high- and low-rated master’s theses. Journal of English for Academic Purposes, 6(3), 238–253.

Swales, J. (2014). Variation in citational practice in a corpus of student biology papers from parenthetical plonking to intertextual storytelling. Written Communication, 31(1), 118–141. doi:10.1177/0741088313515166

Friday Tech Talk on Word And Phrase

On February 23, 2018, members of the University of Arizona Corpus Lab, Dr. Shelley Staples and Adriana Picoral, held a Friday Tech Talk demonstrating the Word And Phrase application. The focus of these weekly talks, which are organized by the iSpace at University of Arizona, is on eliciting conversations around different types of digital tools. The targeted tool for this workshop (Word and Phrase) pulls data from the BYU Corpora (in English, Spanish, and Portuguese), allowing users to search new and pre-existing texts, color coding each word based on its frequency. The application sorts words into three frequency ranges based on word usage within the corpora: 1-500 (blue), 501-3000 (green), and >3000 (yellow).

(Academic text sample color-coded by word frequency.)
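The three frequency bands described above can be sketched in a few lines of Python. This is only an illustration of the banding logic, not the application's actual code; the function name is hypothetical, and it assumes each word already has a frequency rank where 1 is the most frequent word.

```python
def frequency_band(rank: int) -> str:
    """Map a word's frequency rank (1 = most frequent) to a color band."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 500:
        return "blue"    # ranks 1-500
    if rank <= 3000:
        return "green"   # ranks 501-3000
    return "yellow"      # ranks above 3000


for rank in (12, 501, 4200):
    print(rank, frequency_band(rank))
```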

Word frequency can also be separated by genre: Spoken, Fiction, Magazine, Newspaper, and Academic. This feature allows instructors to illustrate to their students which types of speech appear in which genres; for example, the pronoun ‘I’ is found more frequently in the Spoken and Fiction genres than in Academic writing, where it is least likely to be used. The application identifies the part of speech, ranking, frequency, collocates, and synonyms for each word within the top 3000 words frequency range. Word and Phrase thus allows students to explore when and how to use specific words or phrases based on information from the BYU corpora as well as other resources (such as Wordnet).

(Frequency of the pronoun ‘I’ across genres.)

(Ranking and frequency of the word ‘say’ as each possible PoS (Part of Speech).)

(Concordance lines of the word ‘tell’ as collocations, providing definition, PoS, and synonyms.)

Participants gave positive feedback on synonyms provided by the word search tool, where more and less frequent synonyms to the search word are displayed with some information on meaning variation provided. They also noted that with these tools, students are able to access the program on their own for autonomous learning.

Here’s our handout on using Word And Phrase.

For information on other Tech Talks organized by the iSpace at University of Arizona, please visit

Crow Workshop: Integrating AntConc into Teacher Curriculum

By Kelly Marshall and the AZ Crow Team

On February 17, 2018, the University of Arizona Corpus Lab hosted an introductory workshop on how to use AntConc at the 17th Annual SLAT Interdisciplinary Roundtable. The workshop was led by Adriana Picoral, Nicole Schmidt, Curtis Green, and Shelley Staples, with help from Kelly Marshall, Ali Yaylali, Nik Kirstein, and Yingliang Liu. For this workshop, we changed the layout of our last workshop to better fit the needs and purposes of the attendees at this conference. The first notable change was the use of two different corpora: Arizona Second Language Writing Corpus (ASLW) (part of Crow) and Spanish Learner Language Oral Corpora (SPLLOC). The components we used from the ASLW corpus included Narrative and Rhetorical Analysis student-written papers, while the components we used from the SPLLOC corpus were Modern Times Narrative and Photo Interview files. The goal of the workshop was to help instructors understand how to use AntConc, and how to integrate the application and results into their pedagogy. This was different from our last workshop presentation (given Nov. 21, 2017) where we focused exclusively on the ASLW (Crow) since our audience for that workshop was instructors in the UA Writing Program.

Other differences included the workshop space and the activities. The workshop was hosted in one of the computer labs in the Modern Languages building. This room allowed all workshop participants to interact with, learn, and explore the AntConc program on their own rather than sharing with another participant as they did last time. However, since the time slot was only an hour and fifteen minutes (rather than the hour and forty-five minutes allotted last semester), we condensed the workshop by covering terms during the activities rather than presenting them at the beginning. We also reduced the number of activities participants completed from five to three. This was done to allow participants, like last time, to independently explore the program, interact with one another, and ask us questions they had after completing the activities.

Before the workshop, we ensured all computers had the AntConc application and the appropriate corpora files in Spanish and English were downloaded. This allowed us to save time and start the workshop promptly, without having to spend the first part of the session instructing participants how to download and access the files and program. This pre-workshop preparation process was necessary because we did not know who the participants were in advance (so we were unable to contact them with instructions on how to access the data). In the future, our corpus data will be more easily accessible through a website, which will facilitate this process.

During the workshop, participants were taught how to hide tags so that personal, instructor, and other course-related information included in the student papers was not displayed in the results. It should be noted that a potential problem with hiding tags is that the output will be limited in the concordance function. Although we did not introduce this issue at the beginning of the workshop, we showed participants how to solve this problem when we presented activities using the concordance function (i.e., unhide tags if more text is desired). The activities focused on instructing participants to search for specific words or N-Grams (contiguous sequences of words, e.g., 1-gram, 2-gram, 3-gram), and how to see these in a list, in the Word List function, or as key words in context (KWIC) in the Concordance function.

(KWIC concordance results with tags included.)

(KWIC concordance results with tags hidden.)

When searching in the concordance window, those in the workshop were taught how to select window size, and to search by frequency, range, or word. The KWIC search shows the words 1, 2, or 3 places to the left or right of the key word. In addition, participants were taught how to search by prefixes and suffixes, or locate citations by searching “(*)”.
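To make the idea of a key word in context concrete, here is a rough Python sketch of a KWIC search with an adjustable window, plus a regular expression that approximates the “(*)” search for parenthetical citations. This is not how AntConc itself is implemented; the function names and the sample text are hypothetical, and tokenizing on whitespace is a simplifying assumption.

```python
import re


def kwic(text: str, keyword: str, window: int = 3):
    """Return (left context, key word, right context) for each hit."""
    tokens = text.split()  # simplifying assumption: whitespace tokens
    hits = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation and case when matching.
        if token.strip(".,;:!?\"'()").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def find_parentheticals(text: str):
    """Approximate the "(*)" search: anything between parentheses."""
    return re.findall(r"\([^()]*\)", text)


sample = ("Students often tell stories in essays (Smith, 2010). "
          "They tell readers what sources say (Lee, 2015).")

for left, key, right in kwic(sample, "tell", window=2):
    print(f"{left:>20} | {key} | {right}")
print(find_parentheticals(sample))
```

A larger window shows more context on each side, which mirrors the window size setting participants adjusted in the workshop.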

(N-Grams sorted by range to show the most common n-grams across all uploaded files.)
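For anyone curious what frequency and range mean for n-grams, the following sketch counts both over a small set of texts. It is an illustration under simple assumptions (lowercased, whitespace-separated tokens) rather than AntConc's own method, and the function names and sample sentences are made up for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequency_and_range(texts, n):
    """Count total occurrences (frequency) and number of texts (range)."""
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        grams = ngrams(text.lower().split(), n)
        frequency.update(grams)        # every occurrence counts
        text_range.update(set(grams))  # each text counts at most once
    return frequency, text_range


texts = [
    "the study shows that the results vary",
    "the study suggests that the data vary",
]
frequency, text_range = ngram_frequency_and_range(texts, 2)

# Sort by range first, then frequency, as in the workshop activity.
for gram in sorted(frequency, key=lambda g: (-text_range[g], -frequency[g], g))[:3]:
    print(" ".join(gram), frequency[gram], text_range[gram])
```

Sorting by range surfaces n-grams shared across many files rather than ones repeated heavily in a single paper.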

While there were notable differences between the two workshops, both had the underlying goal of providing instructors a new approach to create materials and illustrate the pragmatic use of lexical items and grammar in order to show their students the contexts and patterns of words within a specific genre. Moreover, throughout both workshops, we asked participants questions and talked with them about how AntConc could be used to provide authentic writing examples and address common error patterns.

The workshop concluded with a discussion, first in small groups and then with the entire group, about how these methods translate into lessons. The teachers were given time to reflect on how they might use what they had learned in their own pedagogy.

Here’s our AntConc handout from the workshop.

Citation Study Update

Members of the interdisciplinary Crow team have been working on what we’ve been calling internally our “Citation Project” since the Summer of 2017. This name is our homage to The Citation Project conducted by Rebecca Moore Howard and Sandra Jamieson.

Wendy Jie Gao, Lindsey Macdonald, and Terrence Zhaozhe Wang videoconference with Shelley Staples and Adriana Picoral.

When the project research was first presented at Corpus Linguistics in 2017, it was titled “Variability in Citation Practices of Developing L2 writers in First-Year Writing Courses.” The purpose of the study can be stated as follows: “By examining L2 students’ citation practices in their assignments (Literature Review and Research Paper) for an introductory writing course, we explored their preference for particular citation styles and possible variance across assignments and instructors.”

At the current time, our research focuses on what we’re calling citations and non-citations, as well as the various forms and functions of the citations students are using in two genres: literature reviews and argumentative essays. All of the documents used for the project are from the Purdue Crow Second Language Writing corpus, and a total of 132 papers and 147,000 words have been analyzed. We are examining many different styles of citations, including quote and non-quote, as well as integral and non-integral. An integral citation includes the author’s or article’s name in the sentence being cited. For a non-integral citation, the author’s or article’s stated name is in parentheses at the end of the sentence. A non-citation doesn’t explicitly state the name of the author or article.
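The three definitions above can be turned into a rough first-pass heuristic, sketched below in Python. This is only an illustration and not the coding procedure our team used; the function name is hypothetical, and it assumes you already have a list of author or article names to look for.

```python
import re


def classify_citation(sentence: str, names: list[str]) -> str:
    """Label a sentence as integral, non-integral, or non-citation."""
    # Text found inside parentheses, and the sentence with that text removed.
    inside = " ".join(re.findall(r"\(([^()]*)\)", sentence))
    outside = re.sub(r"\([^()]*\)", " ", sentence)

    if any(name in outside for name in names):
        return "integral"       # name is part of the sentence itself
    if any(name in inside for name in names):
        return "non-integral"   # name appears only in parentheses
    return "non-citation"       # no name stated


names = ["Smith"]
print(classify_citation("Smith (2010) argues that feedback matters.", names))
print(classify_citation("Feedback matters (Smith, 2010).", names))
print(classify_citation("Studies show that feedback matters.", names))
```

A heuristic like this can only flag candidates; judging form and function still requires a human reader.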

Our findings revealed that students use more citations in a research paper than in a literature review and that they prefer integral citations, especially in a literature review. Most importantly, we discovered that students’ work is highly framed around the sample papers that instructors provide.

Our team plans on presenting its research on March 27 at the AAAL 2018 conference (9:10 to 9:40am, Arkansas Room). We hope to grow the number of documents which are a part of the project in order to expand the knowledge it can provide.

Spotlighting Crow Undergrad Interns

The Crow team is composed of scholars at many different levels of academia and from many different fields. Crow includes professors of writing, ESL, EAL, SLAT, and many other areas of English and language. Crow also includes three undergraduate interns, and the internship broadens their experience by introducing them to workplace practices such as collaborative work, research opportunities, and more! Each of the undergraduate interns became a part of Crow for different reasons and hopes to further pursue their academic career through the experience gained here. Below, each intern explains how they first became involved with Crow and what experiences they hope to gain from this internship opportunity.


Nik Kirstein: Nik Kirstein is a junior in Information Science. He first got interested in Crow after working with a corpus to analyze the Russian language. Crow helps Nik gain experience in text and data processing and has introduced him to some corpus informatics applications such as AntConc. All of this ties into information science very well. Nik hopes to gain more experience in data visualization and back-end database development with corpus data. He wants to work in the cybersecurity industry one day.


A picture of Blair Newton

Blair Newton: Blair Newton is a senior in Professional Writing. She first heard about Crow from her Intro to Professional Writing professor, Dr. Michael Salvo. The internship opportunity appealed to her because of how much varying experience she would be exposed to that classes couldn’t offer. Blair does research, blog posts, grant writing, and graphic design for Crow. She hopes one day to combine writing and marketing as a career and eventually even write a novel.



A picture of Jessica Kukla

Jessica Kukla: Jessica Kukla is a senior professional writing major on the editing and publishing track at Michigan State. While writing and editing are her forte, Jessica has a growing interest in technical writing and information and experience architecture, which led her to working with Crow. She hopes to gain more experience with grant writing and working with corpus data. After MSU, Jessica hopes to pursue higher education in something along the lines of information architecture.


By Nik Kirstein, Blair Newton, & Jessica Kukla 

Arizona AntConc Workshop 2017

On November 10th, 2017, the University of Arizona Corpus Lab held its AntConc Workshop.  AntConc is an application that allows users to view useful information about a text such as the word frequency, placement of search term in the text, and more. The main goal of the workshop was to help instructors 1) to develop an understanding of how to use Crow and AntConc to address language awareness within their writing classroom, and 2) to understand the value of using students’ writing. Corpus data offers a new way to look at learning English as a second language. Using a corpus, instructors can see what common mistakes their students make or what patterns of language are more common in certain genres, and then create activities based around them. For example, when the instructors searched for parentheses, the locations of the citations within the papers could be seen in the concordance plot. This helped instructors see how their students were using citations and whether or not they were being used correctly.  Another idea during the workshop was comparing the use of the word “like” in written papers versus spoken English. The differences in writing and speech help us understand how these students are learning and understanding English. The workshop was a great success and the materials as well as a video cast will be available soon.

Photo from AntConc Workshop


Symposium CFP published!

Today we hit a pretty significant milestone for the Crow project: we’ve published the CFP for our symposium, October 4–6, 2018. We’re thrilled to be able to host an event focusing on the type of work we want to support with Crow: data-driven writing research which recognizes the value of interdisciplinarity and the importance of collaboration. Our symposium team has done a great job planning, so we wanted to share some of the assumptions shaping the symposium which may not be apparent from the CFP.

First and foremost, we imagine an inclusive event where everyone feels welcomed, valued, and invited into the conversations which make conferences so rewarding. To that end, we’ll be featuring undergraduate research in several ways. We’ll offer travel support targeting new and under-represented voices. And we’ll have a Code of Conduct which makes clear our expectations for mutual respect and inclusive behavior.

All of us agree that the best parts of academic gatherings are the conversations they facilitate. So expect coffee breaks throughout the symposium, and plenty of time to get from one session to another. Poster sessions will feature refreshments, too. We’re picking spaces which we hope will offer plenty of chances to sit down and talk with friends and colleagues.

We also want to keep costs down. Support from Humanities Without Walls will help quite a bit. Staff from Purdue’s College of Liberal Arts are helping us find spaces which meet our needs affordably. We won’t add unnecessarily to registration costs if we can avoid it. For example, dinners will be “on your own,” so it’ll be easier for attendees to find places to eat which fit their budgets. We’ll be asking sponsors to directly support subsidies which lower our operating costs—rather than paying for tote bags.

Finally, we’re really excited about our keynote speakers, Shondel Nero and Susan M. Conrad. They represent two traditions of writing research distinct from our research focuses with Crow. We look forward to learning from them and we know you will too.  

Over the past year, Beril, Shelley, Bill, and I have gladly helped our symposium team as they have brainstormed about the type of event they wanted to host, then drafted and published our CFP. From the start, we’ve imagined Crow as driven by and for the scholarly and professional interests of our students, and that’s certainly true here as well. Our thanks to everyone on the Crow team, especially Lauren, Lindsey, Michelle, Hadi, Ashley, Blair, and Terrence. The work you see in this CFP is theirs. Expect to hear more from them soon.

We hope to see you in West Lafayette in October 2018. Before then, we welcome your questions:

Arizona Summit 2017

Members of the Crow team standing in front of a classroom building

Crow Team Leaders from Purdue and University of Arizona, Collaborators from Northern Arizona University & University of South Carolina, Online Interface Developer, and Graduate Student Researchers

The Crow team recently held a research summit at the University of Arizona in Tucson. Faculty and students from Purdue and South Carolina gathered for two days, with Crow researchers joining from West Lafayette and Michigan as well. In this post, graduate researchers Adriana Picoral, Ashley Velázquez and Hadi Banat describe their experiences.

Participating in the Arizona Summit 2017 was an eye-opening experience and a valuable professional development opportunity of a kind that is rare because conferences dominate as professional development venues in academia. We found this unique experience fruitful because we were involved in different phases of professionalization: planning and decision making, grant writing, institutional culture, research and pedagogical discussions, mentoring, and collaboration.

Graduate school offers a spectrum of experiences, but seminars, research, and conferences are not all we need to become successful faculty members and engage in our discourse community of scholars. Graduate seminars do not prepare us to experience a real work culture and do not offer ample chances for building professional skills that help us survive the demanding and rigorous nature of academia as a profession.

What we found most helpful in this summit was recognizing the significance of rhetorical listening as a prerequisite skill for successful collaboration. We closely observed how the team leaders were giving chances to each other, to us as graduate students, to undergraduate students, to potential collaborators, and to institutional staff to talk and express various points of view. They were generously and attentively listening to figure out what takeaways would most help Crow grow and prosper in terms of data collection, site expansion, research methodologies, best infrastructure practices, interface prototyping and development, and winning grants.

Crow team members collaborating around a table

Round-table Studio Work, Planning, & Decision Making

The round-table discussions and workshop nature of the summit placed us as equals; that is, all perspectives matter because a successful team is one that relies on different levels of expertise and a variety of skills. We observed the purposeful choice of collaborators and how some partnerships are more effective than others when considering the long-term plans of Crow. We learned that setting priorities and meeting short-term goals scaffold larger objectives and long-term sustainability.

What was eye-opening was the level of preparation the team undertakes prior to grant writing. The division of labor, calendar planning, team formation, and communication with institutional centralized administration prepare a team to win a grant. It is not the actual writing of the grant which is most challenging. It is the balance that we create in terms of team member expertise, the alignment between the nature of the project and grant, and figuring out all the pieces of the puzzle. This intricate process of grant writing is most successful when it is done collaboratively and mindfully.

We hope that Crow sets a model for different institutions to pursue collaborative, interinstitutional, and interdisciplinary research projects that engage undergraduate and graduate students in experiences that can help them grow professionally and prepare them better for the real challenges they will encounter on the job. Attending plenaries and conference presentations promotes visibility and diversity of perspectives, but participating in summits involves graduate students in the elaborate bits and pieces of the life of a faculty member on the job and its everyday practices.

By Hadi Banat, Adriana Picoral, & Ashley Velázquez

Crow at Corpus Linguistics 2017

From Crow team member Wendy Jie Gao:

Photo shows two women, Dr. Shelley Staples and Wendy Jie Gao standing on either side of their research poster at the 2017 Corpus Linguistics conference.

Wendy Jie Gao and Dr. Shelley Staples at Corpus Linguistics 2017

In July, Crow researchers gave a poster presentation at the Corpus Linguistics conference (CL 2017) in Birmingham, England.

The poster introduced our citation project, initiated in the summer of 2016: “Variability in Citation Practices of Developing L2 writers in First-Year Writing Courses.” By examining L2 students’ citation practices in their assignments (Literature Review and Research Paper) for an introductory writing course, we explored their preference for particular citation styles and possible variance across assignments and instructors.

Preliminary research results show a more frequent use of integral citation as well as a hybrid citation pattern. Pedagogical materials used by instructors may also play a role in influencing students’ citation practices. We received important feedback to help us move the project forward. 

Question: Every discipline and profession has its own writing guidelines or conventions. Why do we need to look at students’ writing in introductory composition classes?

Response: The first-year writing course is aimed at helping students adapt to academic writing genres, which might not be familiar to international undergraduate students. Because first-year writing is a required course at colleges and universities in the United States, it could help students navigate through the long tunnel of writing processes, including writing in different genres, for multiple purposes and audiences, as well as professional writing in their future.

  • This question reminds the research team that first-year writing can be a new concept to those who are not familiar with the U.S. higher education setting. We need to clarify this background information in future conversations.

Question: What will be the next step after categorizing citation practices based on formal features (integral citation, non-integral citation and hybrid citation)?

Response: We are planning to focus on functional coding as the next step. Our literature review covers related research such as Omizo and Hart-Davidson (2016) and Petrić (2007). Closer analysis will reveal rhetorical functions intended by student writers, such as attribution, exemplification, or extraction.

Question: What do you mean by “students’ citation practices might be influenced by pedagogical materials”?

Response: We collected pedagogical materials used by the three instructors and have noticed some connection. For example, most of the students used more citations in the Literature Review assignment. Two instructors have made it an explicit requirement that students’ writing include at least three citations, which is not a “must-do” for the other assignment. Summarizing and evaluating are the focus of the literature review assignment, while the research paper places more emphasis on argumentation. This helps explain the higher number of integral citations when students are writing a literature review.

The citation project research team has revised our proposal and submitted it for AAAL 2018 (Applied Linguistics Conference). All these thought-provoking questions and feedback are indispensable to the progress of our research in the future.