Re: PDF files



Ludwik asked:

> > 5) Is it possible to turn a pdf file back into an editable text?
> > If so then please explain how. Print+Scan+OCR (optical
> > character recognition) is too demanding for this.

Chuck Britton wrote:

> 'DeCoding' a pdf file can be done with the full version of Adobe
> Acrobat.

Hmmmm. It depends. PDF is a sort of catch-all format,
with various kinds of content mixed together. You can have
text represented as text, in which case it is relatively
easy to "decode" into plain text ... or you could have
a bitmap, which might be a line drawing or an _image
of text_, in which case nothing short of OCR will do.
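For the curious, here is one rough way to tell which case you are in, sketched in Python. It is a crude heuristic, not a real PDF parser: it only looks inside stream objects, only undoes Flate (zlib) compression, and treats the text-showing operators Tj/TJ as a sign of real text and the XObject operator Do as a sign of an embedded image (Do also draws form XObjects, so it can mislead). The name classify_pdf is hypothetical. A proper extractor such as pdftotext from Poppler/Xpdf will do the actual job better.

```python
import re
import zlib

# Text-showing operators: a string "(...)" or array "[...]" followed by Tj or TJ.
TEXT_OPS = re.compile(rb"[)\]]\s*(?:Tj|TJ)\b")
# XObject painting, e.g. "/Im1 Do"; often an embedded image (but also forms).
IMAGE_OPS = re.compile(rb"/\w+\s+Do\b")
# Raw bytes between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def classify_pdf(data: bytes) -> str:
    """Crude guess at what a PDF holds: 'text', 'image', 'mixed' or 'unknown'."""
    has_text = has_image = False
    for match in STREAM.finditer(data):
        raw = match.group(1)
        try:
            # Undo FlateDecode; trailing end-of-line bytes land in unused_data.
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Uncompressed, or a filter this sketch doesn't handle (e.g. JPEG).
            body = raw
        has_text = has_text or bool(TEXT_OPS.search(body))
        has_image = has_image or bool(IMAGE_OPS.search(body))
    if has_text and has_image:
        return "mixed"
    if has_text:
        return "text"
    if has_image:
        return "image"
    return "unknown"
```

Usage would be something like classify_pdf(open("paper.pdf", "rb").read()), where the filename is just an example. "text" or "mixed" suggests plain extraction is worth trying; "image" suggests you are in OCR territory.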

If you're going to OCR it, printing and scanning is
silly. Use Ghostscript or the like to render the
PDF directly into .tiff or some other standard open
bitmap format that you can feed to the OCR engine.
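A minimal Python sketch of the Ghostscript route, assuming the gs executable is on your PATH (gswin64c on Windows). The function names are hypothetical; the flags are standard Ghostscript options: -sDEVICE=tiffg4 writes 1-bit fax-style TIFF, -r sets the resolution in dpi, and a %03d in -sOutputFile gives one numbered file per page. 300 dpi is a common choice for OCR, but treat it as a starting point rather than a rule.

```python
import shutil
import subprocess


def ghostscript_tiff_command(pdf_path: str, out_pattern: str = "page-%03d.tif",
                             dpi: int = 300, gs: str = "gs") -> list[str]:
    """Build a Ghostscript command line that rasterizes every page to TIFF."""
    return [
        gs,
        "-dSAFER",          # restrict file access while interpreting the PDF
        "-dBATCH",          # exit when done instead of waiting at a prompt
        "-dNOPAUSE",        # don't pause between pages
        "-sDEVICE=tiffg4",  # 1-bit CCITT G4 TIFF; tiffgray gives greyscale
        f"-r{dpi}",         # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",  # %03d becomes the zero-padded page number
        pdf_path,
    ]


def render_pdf_to_tiff(pdf_path: str, out_pattern: str = "page-%03d.tif",
                       dpi: int = 300) -> None:
    """Run Ghostscript; raises if it isn't installed or exits with an error."""
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        raise FileNotFoundError("Ghostscript (gs / gswin64c) not found on PATH")
    subprocess.run(ghostscript_tiff_command(pdf_path, out_pattern, dpi, gs),
                   check=True)
```

Each resulting page-001.tif, page-002.tif, ... can then go to whatever OCR engine you have; with Tesseract, for example, "tesseract page-001.tif page-001" writes page-001.txt.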

Also note that if you are talking about a PDF you
found on the web, it is likely that Google has
already made the text available, reformatted as HTML.