BootCaT Top Tip

If you use BootCaT, here is a command that splits the collected corpus into individual files, using a regex match on each CURRENT URL line as the separator. Each page ends up in its own numbered file (1.txt, 2.txt and so on):

awk '/CURRENT URL/{g++} {print $0 > (g ".txt")}' corpus.txt

When you copy and paste this command into your command line, check that the quotation marks are straight (' and ") and not curly (‘ ’ “ ”). The shell and awk only understand the straight ones.
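If you want to try the idea out before running it on a real corpus, here is a small sketch. It is a variation on the command above, not a replacement for it: the sample lines, the example.com URLs and the page_0001.txt naming are all made up for illustration. It also closes each output file before opening the next one, since some awk implementations limit how many files can be open at once.

```shell
# Work in a throwaway directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny stand-in for a BootCaT corpus (URLs and text are invented).
printf '%s\n' \
  'CURRENT URL http://example.com/a' \
  'First page, line one.' \
  'First page, line two.' \
  'CURRENT URL http://example.com/b' \
  'Second page, line one.' > corpus.txt

# At each separator line, close the previous output file (if any) and
# pick a new zero-padded file name. Every line seen after the first
# separator is written to the current output file.
awk '/CURRENT URL/ { if (out) close(out); out = sprintf("page_%04d.txt", ++n) }
     out { print > out }' corpus.txt

# List the files that were created.
ls page_*.txt
```

The zero-padded names are just a convenience so that the files sort in page order in a directory listing. Any lines that come before the first CURRENT URL line are skipped in this version.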

Related:

BootCaT custom URL: [https://eflnotes.wordpress.com/2014/10/08/building-your-own-corpus-bootcat/]

BootCaT seeding: [http://blog.englishup.me/2014/02/16/bootcat-seeding/]
