Re: Performance regression between 8.3 and 8.4 on heavy text indexing

From: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gaël Le Mignot <gael(at)pilotsystems(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Date: 2009-08-26 21:59:25
Message-ID: 1d4e0c10908261459s33ac5c02vd4494ed1dd20d390@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> gael(at)pilotsystems(dot)net (=?iso-8859-1?Q?Ga=EBl?= Le Mignot) writes:
>> So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
>
> Yup.  What's even more interesting is that it seems the real win would
> have been to use just the 'claude & duviau' condition (which apparently
> matched only 14 rows).  8.3 had no hope whatever of understanding that,
> it just got lucky.  8.4 should have figured it out, I'm thinking.
> Does it help if you increase the statistics target for fulltext_tsv?
> (Don't forget to re-ANALYZE after doing so.)

It could be interesting to run the query without the condition
(keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
(fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.

Btw, what Tom means by increasing the statistics is executing the
following queries:
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;
ANALYZE;
run your query with EXPLAIN ANALYZE;
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 1000;
ANALYZE;
run your query with EXPLAIN ANALYZE;
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 5000;
ANALYZE;
run your query with EXPLAIN ANALYZE;

to see if it improves the estimates.

--
Guillaume

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2009-08-26 23:03:07 Re: PostgreSQL does CAST implicitely between int and a domain derived from int
Previous Message Greg Stark 2009-08-26 20:42:35 Re: Performance issues with large amounts of time-series data