Day One: Text Wrangling

We are starting with a wholly unusable PDF file (https://docs.house.gov/billsthisweek/20240318/WDI39597.PDF). First, we need to parse it into usable data.

Goal 1: Read and parse the document
Goal 2: Extract sections

Read and parse the document

What are some readily available open-source projects that I can use to parse PDFs into text? Let’s try some …
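
As a rough illustration of the two goals, here is a minimal sketch, not necessarily the approach taken in these notes. It assumes the open-source pypdf library for text extraction (one option among several), a local copy of the PDF saved as WDI39597.PDF, and section headings of the form "SEC. 101. SHORT TITLE." at the start of a line. The function names pdf_to_text and split_sections are hypothetical, and the heading pattern is a guess that should be checked against the extracted text.

```python
import re
from pathlib import Path


def pdf_to_text(pdf_path):
    """Goal 1: return the text of every page, joined by newlines."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # lazily so split_sections() below can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing useful for image-only pages,
    # so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Assumption: headings look like "SEC. 101. SHORT TITLE." on their own line.
SECTION_HEADING = re.compile(
    r"^[ \t]*SEC\.[ \t]+(\d+[A-Z]?)\.[ \t]*(.*)$", re.MULTILINE
)


def split_sections(text):
    """Goal 2: split text into (number, heading, body) tuples."""
    matches = list(SECTION_HEADING.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section's body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        sections.append((match.group(1), match.group(2).strip(), body))
    return sections


if __name__ == "__main__":
    pdf = Path("WDI39597.PDF")  # hypothetical local copy of the linked file
    if pdf.exists():
        for number, heading, body in split_sections(pdf_to_text(pdf)):
            print(number, heading, len(body))
```

Any text before the first heading is dropped by this sketch, and extraction quality depends heavily on how the PDF was produced, so the output is worth spot-checking against the original pages.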