Re: tsvector limitations

From: Greg Williamson <gwilliamson39(at)yahoo(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tim <elatllat(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: tsvector limitations
Date: 2011-06-15 05:56:26
Message-ID: 453674.41390.qm@web46113.mail.sp1.yahoo.com
Lists: pgsql-admin

Kevin Grittner wrote:

> Tim <elatllat(at)gmail(dot)com> wrote:
>
<...>
> Your test (whatever data it is that you used) doesn't seem typical of
> English text. The entire PostgreSQL documentation in HTML form,
> with all the HTML files concatenated, is 11424165 bytes (11MB),
> and the tsvector of that is 364410 bytes (356KB). I don't suppose you
> know of some publicly available file on the web that I could use to
> reproduce your problem?

Try trolling through the texts at the Internet Archive (archive.org) -- lots of
stuff there has been rendered into ASCII: government documents and the like from
all periods, and novels that are no longer under copyright, so lots of long
classics.

<http://www.archive.org/stream/ataleoftwocities00098gut/old/2city12p_djvu.txt>
for example ... 765K
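
For a sense of scale, the figures quoted above can be sanity-checked with a bit of arithmetic. This is only a rough sketch: the function name `summarize` is made up for illustration, and the only inputs are the two byte counts from Kevin's message.

```python
# Sanity check of the sizes quoted above (byte counts taken from the
# quoted message; the helper name is hypothetical).

def summarize(doc_bytes: int, tsv_bytes: int) -> tuple[float, float, float]:
    """Return (document size in MiB, tsvector size in KiB, tsvector/document ratio)."""
    doc_mib = doc_bytes / (1024 * 1024)
    tsv_kib = tsv_bytes / 1024
    ratio = tsv_bytes / doc_bytes
    return doc_mib, tsv_kib, ratio


doc_mib, tsv_kib, ratio = summarize(11424165, 364410)
print(f"document: {doc_mib:.1f} MiB, tsvector: {tsv_kib:.0f} KiB, ratio: {ratio:.1%}")
```

So for that corpus the tsvector comes out at roughly 3% of the source text; whether a given novel behaves the same way is exactly what a test file would show.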

HTH,

Greg Williamson
