Re: tsvector limitations

From: Tim <elatllat(at)gmail(dot)com>
To: Mark Johnson <mark(at)remingtondatabasesolutions(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: tsvector limitations
Date: 2011-06-14 01:58:54
Message-ID: BANLkTiniXCCAdwD0qDXb3mqLSSQrzqKSgQ@mail.gmail.com
Lists: pgsql-admin

Mark,

That link is a mirror of this mailing list; it's not from 5 months ago.
If you are in the year 2012 please respond with lottery numbers and the
like.

On Mon, Jun 13, 2011 at 9:43 PM, Mark Johnson <
mark(at)remingtondatabasesolutions(dot)com> wrote:

>
>
> I found another post where you asked the same questions 5 months ago. Have
> you tested in that time?
> http://www.spinics.net/lists/pgsql-admin/msg19438.html
>
>
> A text search vector is an array of distinct lexemes (less any stopwords)
> and their positions. Taking your example we get ...
>
> select to_tsvector('the lord of the rings.txt') "answer";
> answer
> -------------------
> 'lord':2 'rings.txt':5
>
> You can put the length() function around it to just get the number of
> lexemes. This is the size in terms of number of distinct lexemes, not size
> in terms of space utilization.
>
> select length(to_tsvector('the lord of the rings.txt')) "answer";
> answer
> --------
> 2
>
> You might find the tsvector data consumes 2x the space required by the
> input text. It will depend on your configuration and your input data. Test
> it and let us know what you find.
>
> -Mark
>
> -----Original Message-----
> *From:* Tim [mailto:elatllat(at)gmail(dot)com]
> *Sent:* Monday, June 13, 2011 03:19 PM
> *To:* pgsql-admin(at)postgresql(dot)org
> *Subject:* [ADMIN] tsvector limitations
>
> Dear list,
>
> How big of a file would one need to fill the 1MB limit of a tsvector?
> Reading http://www.postgresql.org/docs/9.0/static/textsearch-limitations.html
> seems to hint that filling a tsvector is improbable.
>
> Is there an easy way to query the bytes of a tsvector?
> Something like length(tsvector), but bytes(tsvector).
>
> If there is no easy method to query the bytes of a tsvector, I realize the
> answer is highly dependent on the contents of the file, so here are 2
> random examples:
> How many bytes of a tsvector would a 32MB ASCII English unique word list
> make?
> How many bytes of a tsvector would something like "The Lord of the
> Rings.txt" make?
>
> If this limitation is ever hit, is there a common practice for using more
> than one tsvector?
> Using a separate "one to many" table seems like an obvious part of the
> solution, but I would not know how to detect or calculate how much text to
> give each tsvector.
> Assuming tsvectors can't be linked, maybe they would need some overlap.
>
>
> Thanks in advance.
>
>
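
On the bytes(tsvector) question quoted above: a minimal sketch of one way
to measure it, assuming the built-in pg_column_size() function is an
acceptable stand-in for "bytes". It reports the bytes used to store a
value, so the figure for a value read from a table may reflect
compression. The table "docs" and its columns "id" and "body" are
hypothetical names used only for illustration.

```sql
-- Lexeme count versus storage size for the example from this thread.
SELECT length(to_tsvector('the lord of the rings.txt'))         AS lexemes,
       pg_column_size(to_tsvector('the lord of the rings.txt')) AS tsvector_bytes;

-- Compare input size to tsvector size per document.
-- "docs", "id" and "body" are hypothetical; substitute your own table.
SELECT id,
       octet_length(body)                  AS text_bytes,
       pg_column_size(to_tsvector(body))   AS tsvector_bytes
FROM docs
ORDER BY tsvector_bytes DESC
LIMIT 10;
```

Running the second query over real documents should show how close the
largest ones come to the 1MB limit, and whether the 2x estimate holds for
a given text search configuration.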
