From: | Tim <elatllat(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: tsvector limitations |
Date: | 2011-06-14 15:56:34 |
Message-ID: | BANLkTi=4EaMpXu7orXiAHOiA2AS0w9k_Yg@mail.gmail.com |
Lists: | pgsql-admin |
Hi Kevin,
Thanks again for the reply.
I suspect casting and using octet_length() is not accurate.
Using "extract[ed] text" keywords or summaries would indeed be quick, but
that is not what I'm looking for.
I am inquiring about real-world numbers for full text search of large
documents; I'm not sure what more detail you could want.
I'm not demanding anything, just using examples to clarify my inquiry.
I am indeed open to alternatives.
Thank you Kevin, pg_column_size looks like it's exactly what I'm looking
for.
http://www.postgresql.org/docs/9.0/static/functions-admin.html
pg_column_size(any) int Number of bytes used to store a particular value
(possibly compressed)
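
As a rough sketch of the difference between the two suggestions, the
query below measures the same tsvector both ways: octet_length() on the
text cast counts the bytes of the printed representation, while
pg_column_size() reports the bytes of the value itself. The sample
sentence and the column aliases are made up for illustration.

```sql
-- Compare the size of a tsvector's text representation with the size
-- of the tsvector datum. The input string is an arbitrary example.
SELECT octet_length(to_tsvector('english', 'a fat cat sat on a mat')::text)
           AS text_repr_bytes,
       pg_column_size(to_tsvector('english', 'a fat cat sat on a mat'))
           AS datum_bytes;
```

Run against a stored tsvector column instead of a computed value,
pg_column_size() should report the on-disk size, which may be smaller
if the value was compressed.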
On Tue, Jun 14, 2011 at 11:36 AM, Kevin Grittner <
Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Tim <elatllat(at)gmail(dot)com> wrote:
>
> > I would be surprised if there is no general "how big is this
> > object" method in PostgreSQL.
>
> You could cast to text and use octet_length().
>
> > If it's "bad design" to store large text documents (pdf,docx,etc)
> > as BLOBs or on a filesystem and make them searchable with
> > tsvectors can you suggest a good design?
>
> Well, I suggested that storing a series of novels as a single entry
> seemed bad design to me. Perhaps one entry per novel or even finer
> granularity would make more sense in most applications, but there
> could be exceptions. Likewise, a list of distinct words is of
> dubious value in most applications' text searches. We extract text
> from court documents and store a tsvector for each document; we
> don't aggregate all court documents for a year and create a
> tsvector for that -- that would not be useful for us.
>
> > If making your own search implementation is "better" what is the
> > point of tsvectors?
>
> I remember you asking about doing that, but I don't think anyone
> else has advocated it.
>
> > Maybe I'm missing something here?
>
> If you were to ask for real-world numbers you'd probably get farther
> than demanding that people volunteer their time to perform tests
> that you define but don't seem willing to run. Or if you describe
> your use case in more detail, with questions about alternative
> approaches, you're likely to get useful advice.
>
> -Kevin
>
On Tue, Jun 14, 2011 at 11:44 AM, Kevin Grittner <
Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> > You could cast to text and use octet_length().
>
> Or perhaps you're looking for pg_column_size().
>
>
> http://www.postgresql.org/docs/9.0/interactive/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
>
> -Kevin
>