Re: tsvector pg_stats seems quite a bit off.

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector pg_stats seems quite a bit off.
Date: 2010-05-30 18:02:05
Message-ID: 4C02A81D.30209@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30/05/10 09:08, Jesper Krogh wrote:
> On 2010-05-29 15:56, Jan Urbański wrote:
>> On 29/05/10 12:34, Jesper Krogh wrote:
>>> I can "fairly easy" try out patches or do other kind of testing.
>>>
>> I'll try to come up with a patch for you to try and fiddle with these
>> values before Monday.

Here's a patch against recent git, but should apply to 8.4 sources as
well. It would be interesting to measure the memory and time needed to
analyse the table after applying it, because we will be now using a lot
bigger bucket size and I haven't done any performance impact testing on
it. I updated the initial comment block in compute_tsvector_stats, but
the prose could probably be improved.

> testdb=# explain select id from testdb.reference where document_tsvector
> @@ plainto_tsquery('where') order by id limit 50;
> NOTICE: text-search query contains only stop words or doesn't contain
> lexemes, ignored

That's orthogonal to the issue with the statistics collection, you just
need to modify your stopwords list (for instance make it empty).

Cheers,
Jan

Attachment Content-Type Size
ts-typanalyze-fix.patch text/x-patch 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2010-05-30 19:43:43 Re: Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
Previous Message Tom Lane 2010-05-30 17:51:16 Re: pg_trgm