Scan document -> OCR -> Searchable PDF

You can ask general questions, or share opinions or advice about doPDF.
anon
Posts: 38
Joined: Tue Jun 03, 2008 8:42 am

Postby anon » Fri Aug 22, 2008 2:22 am

I have scanned in a letter and want to create a PDF that I can post on a web site, but I also want the PDF to be 'searchable'.


I have used Microsoft Office Document Imaging (MODI); it created a TIFF file and performed the OCR. But when I print that to doPDF, it creates the PDF as an image and not as a searchable PDF. So if I search for the word "Dear", it does not find anything, because the page is treated as an image.


Any solutions?



Softland
Posts: 1447
Joined: Thu May 23, 2013 7:19 am

Postby Softland » Fri Aug 22, 2008 9:52 am

I don't know if MODI allows you to save in .rtf (or .doc) format after performing OCR. If it doesn't, you could copy and paste the OCR-ed text into Word and convert it with doPDF from there; this should make it searchable. When you convert a TIFF to PDF it is converted as an image, which is why you have to go through the RTF/Word route.
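
To illustrate why the TIFF route ends up as a picture only: a searchable scan is usually the page image with the recognized text drawn on the same page in an invisible rendering mode, so the viewer can find words while the reader still sees the original scan. Below is a minimal Python sketch of that idea. It is not what doPDF or MODI does internally. The function names are made up for this example, and it assumes the OCR text is already available, the scan is an uncompressed 8-bit grayscale bitmap, the page is US Letter (612 x 792 points), and the text fits in Latin-1. A real tool would also position each word over its place in the image.

```python
def stream_obj(dict_entries: bytes, data: bytes) -> bytes:
    """Wrap raw bytes as a PDF stream object body (hypothetical helper)."""
    return (b"<< " + dict_entries + b" /Length %d >>\nstream\n" % len(data)
            + data + b"\nendstream")


def build_searchable_pdf(ocr_text: str, gray_pixels: bytes,
                         width: int, height: int) -> bytes:
    """Build a one-page PDF: the scan as an image plus an invisible text layer."""
    if len(gray_pixels) != width * height:
        raise ValueError("expected one byte per pixel (8-bit grayscale)")

    # Escape the characters that are special inside a PDF literal string.
    escaped = (ocr_text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
    content = (
        "q 612 0 0 792 0 0 cm /Im0 Do Q\n"   # stretch the scan over the page
        "BT 3 Tr /F1 12 Tf 72 720 Td "       # "3 Tr" = invisible text mode
        f"({escaped}) Tj ET\n"
    ).encode("latin-1")

    image_dict = (b"/Type /XObject /Subtype /Image /Width %d /Height %d "
                  b"/ColorSpace /DeviceGray /BitsPerComponent 8"
                  % (width, height))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im0 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        stream_obj(image_dict, gray_pixels),   # object 5: the scanned image
        stream_obj(b"", content),              # object 6: page content
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))               # byte offset for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"            # each xref entry is 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)
```

Calling build_searchable_pdf("Dear Sir", pixels, width, height) and writing the returned bytes to a .pdf file should give a page that shows only the scan but where a search for "Dear" can match. Leaving out the text part of the content stream gives the image-only result described above.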




Postby anon » Tue Aug 26, 2008 3:48 pm

Microsoft Office Document Imaging (2003 edition) only has the option to save as a TIFF or .mdi file, so there is no option to save as an RTF file. However, it does have the option to save as a Word file. The problem with that approach is that I lose the rest of the document, e.g. the letterhead; all it does is copy the text. As an example, if you got a letter from the President, you would want to retain the 'original' document, e.g. preserve the letterhead, etc.




Postby Softland » Wed Aug 27, 2008 10:50 am

"As an example, if you got a letter from the President, you would want to retain the 'original' document, e.g. preserve the letterhead, etc."


I would say it depends which president you got that from :).


Anyway, I can't think of another solution for getting the scanned document converted. It would be tedious, but if you have the text saved in a Word document, you could crop the letterhead from your scanned image and insert it into the Word document as a picture.



kryzstoff
Posts: 5
Joined: Fri Oct 31, 2008 5:31 am

Postby kryzstoff » Sun Nov 02, 2008 4:37 pm

anon, you should look at OpenOffice.org for a free and simple solution. Whilst Sun's flexible suite isn't much to look at, it's every bit as powerful as Microsoft Office and, with its PDF editing features, even more so.


P.S. Of course, doPDF is what you'll need for all your non-office documents, e.g. printing from the internet, CAD programs, etc. :-)



ochin
Posts: 1
Joined: Fri Feb 06, 2009 12:12 pm

Postby ochin » Fri Feb 06, 2009 7:19 pm

I think you have your answer in the following tutorial:

http://www.wac.ohio-state.edu/pdf/scan/pdffromscan.html



