A question arises as to which search terms to use, i.e. which frequent terms could we use in class? To get a list of frequent terms one can use the Spokes interface to the Spoken BNC (British National Corpus) [http://pelcra.clarin-pl.eu/SpokesBNC/#explore/kp/0/100].
It has a feature that lists all the common formulae such as the following top 3:
thank you very much
i don’t know
One now has a nice base from which to explore the tools mentioned initially. Of course one must bear in mind that the BNC is dated, so current conversational English will not be well represented.
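For anyone curious what sits behind a list of common formulae, here is a minimal sketch of counting frequent word sequences in a transcript. It is not how Spokes works internally; the function name frequent_ngrams and the toy transcript are my own inventions for illustration, and sequences are counted across sentence boundaries to keep things short.

```python
import re
from collections import Counter


def frequent_ngrams(text, n=3, top=5):
    """Return the `top` most frequent n-word sequences in `text`."""
    # Lowercase and keep apostrophes so that "don't" stays one token.
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    # Slide a window of n tokens over the text (ignores sentence boundaries).
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return counts.most_common(top)


# Invented toy transcript, only there to show the call.
sample = "I don't know. Well, I don't know really. Thank you very much. I don't know."
print(frequent_ngrams(sample, n=3, top=1))
```

On a real transcript one would read the text from a file and probably raise n to 4 to catch longer formulae such as thank you very much.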
Thanks to Cara Leopold @eltinfrance for prompting this note.
Although overall frequencies are low, it is clear that UK English has significant uses of “in the street” while US English has significant uses of “on the street”.
Added to the (speculative) point that the video maker [see previous comment about context] is probably more used to UK English, the video's flagging of “on the street” as an error becomes somewhat more understandable.
She points out that part of the appeal of magazine articles on science is that abstracts in journals “are incredibly densely packed and require a certain degree of skill to decode.”
The PLOS (Public Library of Science) website asks authors to write an author summary: “Distinct from the scientific abstract, the Author Summary is included in the article to make findings accessible to an audience of both scientists and non-scientists.”
This presents a possible halfway house for EAP students. Note, though, that PLOS papers are mainly restricted to the biology and medical domains, and not all papers have author summaries.
One could simply copy and paste abstracts and author summaries from the web pages. Or one could semi-automate this.
There is a nice scraper called quickscrape [https://github.com/ContentMine/quickscrape] which allows you to download articles from various journals. Follow the instructions on the GitHub site to set it up and to understand the quickscrape commands. The configuration for PLOS journals can be modified so that only the abstracts are downloaded.
Modify the journal-scrapers/scrapers/plos.json file like so:
Programs like Excel limit the number of rows in a sheet. This is troublesome if your data exceeds that limit. Instead of splitting your data over several sheets, one option is to use Jupyter Python notebooks.
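As a rough idea of what this looks like in a notebook cell, here is a minimal sketch that streams a CSV file row by row, so the row count is not tied to a worksheet limit. The function name count_column_values, the file name and the column name are hypothetical and would need swapping for your own; the 1,048,576 figure is the row limit of a single .xlsx worksheet.

```python
import csv
from collections import Counter

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single .xlsx worksheet


def count_column_values(path, column):
    """Stream a CSV file row by row and count the values in one column."""
    counts = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first line of the file as column names.
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
            rows += 1
    return rows, counts


# Hypothetical usage (file and column names are placeholders):
# rows, counts = count_column_values("my_corpus_data.csv", "pos")
# print(rows > EXCEL_MAX_ROWS, counts.most_common(10))
```

Because only one row is held at a time, this works on files far larger than a spreadsheet would open; for heavier analysis a library such as pandas is the usual next step.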
There have been some interesting links shared by #corpusmoocers which members may find of interest. Note this is just a sample of what I find interesting; at the close of the MOOC I will put up the full list I am collecting.