extraction - Automatic deletion of some pdf contents -


i've got extract specific data high number of .pdf files. problem first thing have convert .pdf .txt can find data i'm interested in. after conversion there's high amount of artefacts in .txt files ( page numbers, hyperlinks contents page, footers, headers etc. ). these .pdf files quite huge ones ( every single file transcription of 7-12 hours of people talking ) cannot afford deleting things manually ( i've got ~60 .pdf files ). question - know tool allows automatic deletion of such contents?

i'd glad hear every proposition improve work :) thanks!


Comments

Popular posts from this blog

wireshark - USB mapping with python -

c++ - nodejs socket.io closes connection before upgrading to websocket -

Deploying Qt Application on Android is really slow? -