
Re: PDF Parsing and Indexing

From:	Mike Castle <dalgoda(at)ix(dot)netcom(dot)com>
To:	PostgreSQL General Listserver <pgsql-general(at)postgresql(dot)org>
Subject:	Re: PDF Parsing and Indexing
Date:	2001-06-16 00:02:03
Message-ID:	20010615170202.I26165@thune.mrc-home.com
Lists:	pgsql-general

On Fri, Jun 15, 2001 at 07:33:42PM -0400, Doug McNaught wrote:
> "Raymond" <support(at)bigriverinfotech(dot)com> writes:
> > Has anybody had experience in doing this?

I wonder whether Google's solution to this is available.

> provides for arbitrary placement of each glyph on the page. So the
> word "this" might be encoded in the file as something like:
>
> moveto(100, 200)
> draw("t")
> moveto(105, 200)
> draw("h")
> moveto(112, 200)
> draw("i")
> moveto(115, 200)
> draw("s")
>
> You can see that it would be hard to index something like this in any
> kind of useful way.

PDFs generated by MS utilities (Word, I think?) are notoriously bad for
this. Big surprise.
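
To make the quoted moveto/draw example concrete, here is a rough sketch of how an indexer might regroup individually placed glyphs into words before indexing. Everything in it is an assumption for illustration: the (x, y, char) tuple format, the group_glyphs name, and the max_gap threshold are all made up, and it is not how any particular PDF tool does it. Real files would also need glyph widths and some tolerance for baseline jitter.

```python
def group_glyphs(glyphs, max_gap=10):
    """Regroup (x, y, char) glyphs into words by horizontal proximity.

    Hypothetical helper: glyphs on the same baseline (identical y) whose
    x positions are at most max_gap apart are joined into one word.
    """
    # Order by baseline first, then left to right along each baseline.
    ordered = sorted(glyphs, key=lambda g: (g[1], g[0]))
    words = []
    current = []
    prev = None
    for x, y, ch in ordered:
        if prev is not None:
            prev_x, prev_y = prev
            # A new baseline or a wide horizontal gap ends the current word.
            if y != prev_y or x - prev_x > max_gap:
                words.append("".join(current))
                current = []
        current.append(ch)
        prev = (x, y)
    if current:
        words.append("".join(current))
    return words


# The four glyphs from the quoted example.
glyphs = [(100, 200, "t"), (105, 200, "h"), (112, 200, "i"), (115, 200, "s")]
print(group_glyphs(glyphs))  # ['this']
```

The gap threshold is the fragile part: set it too small and words split apart, too large and neighbouring words merge, which is roughly why output that places every glyph separately is so painful to index.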

mrc
--
Mike Castle dalgoda(at)ix(dot)netcom(dot)com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

In response to

Re: PDF Parsing and Indexing at 2001-06-15 23:33:42 from Doug McNaught

Responses

Re: PDF Parsing and Indexing at 2001-06-18 08:00:04 from J.H.M. Dassen Ray
