BootCat Top Tip

If you use BootCat, here is a command that splits the collected corpus into individual files, one per page, using each CURRENT URL line as the separator (matched with a regex). The output files are named 1.txt, 2.txt, 3.txt and so on:

awk '/CURRENT URL/{if (f) close(f); g++} {f = g ".txt"; print > f}' corpus.txt

When copying and pasting this command into your terminal, make sure the quotes are straight (' and ") and not curly (‘ ’ and “ ”).
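To see what the split produces, here is a minimal sketch that runs it on a tiny made-up corpus. The sample contents, the example.com URLs and the split-demo directory are assumptions for illustration only; a real BootCat corpus will have many more pages. The awk line uses straight quotes and closes each finished file with close(), which keeps awk implementations with a low open-file limit from failing on corpora with many pages.

```shell
#!/bin/sh
# Demo only: the corpus below is a hypothetical two-page sample,
# not real BootCat output. Everything is written inside ./split-demo.
set -e
mkdir -p split-demo

cat > split-demo/corpus.txt <<'EOF'
CURRENT URL http://example.com/page-one
Text of the first page.
CURRENT URL http://example.com/page-two
Text of the second page.
More text of the second page.
EOF

# Run the split inside split-demo so the numbered files land there.
# Each CURRENT URL line bumps the counter g and starts a new file g.txt;
# the previous file is closed first to free its file handle.
(cd split-demo && awk '/CURRENT URL/{if (f) close(f); g++} {f = g ".txt"; print > f}' corpus.txt)

# Show how many lines each output file received.
wc -l split-demo/1.txt split-demo/2.txt
```

Here 1.txt gets the first CURRENT URL line plus its one line of text, and 2.txt gets the second CURRENT URL line plus its two lines. Any lines that appear before the first CURRENT URL line go to a file named .txt, which most directory listings hide.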

