Re: tsvector limitations

From: Tim <elatllat(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: tsvector limitations
Date: 2011-06-14 15:56:34
Message-ID: BANLkTi=4EaMpXu7orXiAHOiA2AS0w9k_Yg@mail.gmail.com
Lists: pgsql-admin

Hi Kevin,

Thanks again for the reply.
I suspect that casting to text and using octet_length() would not be accurate.
Using "extract[ed] text" keywords or summaries would indeed be quick, but it
is not what I'm looking for.
I am inquiring about real-world numbers for full text search of large
documents; I'm not sure what more detail you could want.
I'm not demanding anything, just using examples to clarify my inquiry.
I am indeed open to alternatives.

Thank you Kevin, pg_column_size looks like it's exactly what I'm looking
for.

http://www.postgresql.org/docs/9.0/static/functions-admin.html
  pg_column_size(any) returns int:
  Number of bytes used to store a particular value (possibly compressed)
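
For comparison, here is a rough sketch of how the two suggestions could be
checked side by side. The sample sentence and the column aliases are made up
for illustration, and I have not included any output because the numbers
depend on the input and the text search configuration. Note that this
measures a value computed on the fly; for a value read from a table column,
pg_column_size() reports the stored size, which may be compressed.

```sql
-- Sketch only: the sample text and the aliases are illustrative assumptions.
SELECT octet_length(v::text) AS text_repr_bytes,  -- size of the printed text form
       pg_column_size(v)     AS value_bytes       -- bytes used to hold the tsvector value
FROM (
    SELECT to_tsvector('english',
                       'The quick brown fox jumps over the lazy dog') AS v
) AS sample;
```

To measure real documents, the same two expressions could be applied to an
existing tsvector column instead of the inline sample.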

On Tue, Jun 14, 2011 at 11:36 AM, Kevin Grittner <
Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> Tim <elatllat(at)gmail(dot)com> wrote:
>
> > I would be surprised if there is no general "how big is this
> > object" method in PostgreSQL.
>
> You could cast to text and use octet_length().
>
> > If it's "bad design" to store large text documents (pdf, docx, etc.)
> > as BLOBs or on a filesystem and make them searchable with
> > tsvectors, can you suggest a good design?
>
> Well, I suggested that storing a series of novels as a single entry
> seemed bad design to me. Perhaps one entry per novel or even finer
> granularity would make more sense in most applications, but there
> could be exceptions. Likewise, a list of distinct words is of
> dubious value in most applications' text searches. We extract text
> from court documents and store a tsvector for each document; we
> don't aggregate all court documents for a year and create a
> tsvector for that -- that would not be useful for us.
>
> > If making your own search implementation is "better" what is the
> > point of tsvectors?
>
> I remember you asking about doing that, but I don't think anyone
> else has advocated it.
>
> > Maybe I'm missing something here?
>
> If you were to ask for real-world numbers you'd probably get farther
> than demanding that people volunteer their time to perform tests
> that you define but don't seem willing to run. Or if you describe
> your use case in more detail, with questions about alternative
> approaches, you're likely to get useful advice.
>
> -Kevin
>

On Tue, Jun 14, 2011 at 11:44 AM, Kevin Grittner <
Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> > You could cast to text and use octet_length().
>
> Or perhaps you're looking for pg_column_size().
>
>
> http://www.postgresql.org/docs/9.0/interactive/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
>
> -Kevin
>
