Pdfsandwich

pdfsandwich: A tool to make "sandwich" OCR pdf files

Pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain only images (no text) are processed by optical character recognition (OCR) and the recognized text is added to each page invisibly "behind" the images. Pdfsandwich is a command line tool intended for OCR of scanned books or journals. It is able to recognize the page layout even for multicolumn text.

Essentially, pdfsandwich is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract. It is known to run on Unix systems and has been tested on Linux and MacOS X. It supports parallel processing on multiprocessor systems. While pdfsandwich works with any version of tesseract from version 3.0 on, tesseract 3.03 or later is recommended for best performance.

By default, pdfsandwich runs unpaper to enhance the readability of scanned pages and to improve OCR. For instance, slightly rotated pages are automatically straightened and dark edges removed. For optimally scanned pdf files, this can be switched off by option -nopreproc to speed up processing.

Note: If you use Tesseract 4 or later, it is highly recommended to use pdfsandwich 0.1.7 or later, as Tesseract may freeze when called in multiple threads.

How to upgrade Poppler & Evince to fix problems opening password-protected PDF files

To be able to open password-protected pdfs with Evince on my own system I found (after much testing) that I had to compile the latest release of Poppler from source and also compile the latest release of Evince, building it against the newer Poppler.

First install all these prerequisites for compiling:

sudo apt install g++ autoconf libfontconfig1-dev pkg-config libjpeg-dev libopenjpeg-dev gnome-common libglib2.0-dev gtk-doc-tools libyelp-dev yelp-tools gobject-introspection libsecret-1-dev libnautilus-extension-dev

(more dependencies may be needed on other systems, but I'm working from a 2-week old installation, so hopefully this will be enough for most)

Poppler

Open a terminal so you are in your home directory. If you are really keen on tidiness, you can make a new directory for the two source directories you are going to end up with, for example mkdir poppler and enter it: cd poppler.

First download the encoding files (no need to compile these) to the current working directory with wget.
Extract (it does untar cleanly): tar -xf poppler-data-0.4.7.tar.gz
Enter the directory: cd poppler-data-0.4.7
And magically send the files to the right locations in /usr/share with: sudo make install

Download & extract the main package with wget.

When you configure the build, you will get errors if I missed anything from my list of dependencies above. The errors might be illuminating, e.g. 'thing-you-need not found', in which case you can try sudo apt install thing-you-need and try again. If that doesn't work, try searching online for the error message.

If it exits without errors you can run: make

When it's done, you can use sudo make install but even better, you can use checkinstall to make this installation known to dpkg (yay!), so: sudo apt install checkinstall

If you ever want to uninstall this, you can conveniently do so with sudo dpkg -r poppler, as checkinstall will politely inform you. If you use sudo make install you can still uninstall at any time by entering the source directory (so keep it!) and typing sudo make uninstall

Evince

We've already got the dependencies for Evince, so assuming you are still in the poppler directory, go back up to home with cd, or to wherever you want to download Evince.
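The "install whatever the error says is missing, then try again" loop described above can be shortened a little by checking for the basic build commands before you start. The following is only a minimal sketch and not part of the original instructions: missing_tools is a hypothetical helper name, and the list of commands passed to it is an assumption based on the prerequisites and steps above. It only covers command-line tools; the -dev library packages are not commands, so a missing one would still only show up as an error when configuring.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print each named
# command that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the command is not available
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed list of command-line tools used by the steps above;
# adjust it to your own system.
missing=$(missing_tools g++ autoconf pkg-config make tar wget)

if [ -n "$missing" ]; then
    echo "Missing build tools:"
    echo "$missing"
else
    echo "All listed build tools were found."
fi
```

Anything it reports can usually be installed with sudo apt install followed by the name, although on some systems the package name may differ from the command name.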