Intro to Digital Research Methods in Linux

The digitization of historical documents offers a wealth of sources to complement traditional archival research. But with so much material available, how can historians process it efficiently and determine what is relevant to their own projects? This series of workshops progresses through the skills required to download large quantities of digitized historical documents from online repositories, then introduces basic text-mining techniques for simple computational analysis of the content of those documents. These methods can be readily applied to typed, printed, or transcribed texts.

Workshop 1a – Introduction to VirtualBox

Workshop 1b – Basic Programming and Working with Text Files

Workshop 1c – Batch Downloading and Automation

Workshop 2a – Pattern Matching

Workshop 2b – Simple Scraping

Workshop 2c – Semi-Structured Data

Workshop 3a – Enhancing Images and OCR

Workshop 3b – Working with Books-turned-PDFs

Workshop 3c – Working with Images

Workshop 4 – Binary Solo