Re: Performance regression between 8.3 and 8.4 on heavy text indexing

From: gael(at)pilotsystems(dot)net ( Gaël Le Mignot)
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gaël Le Mignot <gael(at)pilotsystems(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Date: 2009-08-27 20:03:10
Message-ID: plop87ocq1rpmp.fsf@aoskar.kilobug.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Guillaume!

Wed, 26 Aug 2009 23:59:25 +0200, you wrote:

> On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> gael(at)pilotsystems(dot)net (=?iso-8859-1?Q?Ga=EBl?= Le Mignot) writes:
>>> So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
>>
>> Yup.  What's even more interesting is that it seems the real win would
>> have been to use just the 'claude & duviau' condition (which apparently
>> matched only 14 rows).  8.3 had no hope whatever of understanding that,
>> it just got lucky.  8.4 should have figured it out, I'm thinking.
>> Does it help if you increase the statistics target for fulltext_tsv?
>> (Don't forget to re-ANALYZE after doing so.)

> It could be interesting to run the query without the condition
> (keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
> (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.

Here it is ::

libebench=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau'))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=12264.98..12265.11 rows=50 width=24) (actual time=3.799..3.825 rows=10 loops=1)
-> Sort (cost=12264.98..12271.03 rows=2421 width=24) (actual time=3.794..3.802 rows=10 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=2363.10..12184.56 rows=2421 width=24) (actual time=3.579..3.693 rows=10 loops=1)
Recheck Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
-> Bitmap Index Scan on fulltext_index (cost=0.00..2362.49 rows=2877 width=0) (actual time=3.499..3.499 rows=14 loops=1)
Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Total runtime: 166.772 ms
(10 rows)

So it estimates 2877 rows for that, while in reality it's 14.

> Btw, what Tom means by increasing the statistics is executing the
> following queries:
> ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;

Ok, I did it for 500 also on the keywords_tsv column, which was the
other contestor. Here we have a clear improvement: the search in
keyword_tsv is now estimated at 10398 (real being 10222) and the one
on fulltext_tsv at 1 (real being 14).

I did it at 1000 too, it's almost the same result.

By re-running our sampling of 7334 queries on the database with the
statistics at 1000 on both fulltext_tsv and keywords_tsv, we do have
overall better results than with 8.3 ! So a greeat thanks to everyone.

The weird thing was that with the default of 100 for statistics
target, it was worse than when we moved back to 10. So I didn't try
with 1000, but I should have.

I'll do more tests and keep the list informed if it can be of any
help.

--
Gaël Le Mignot - gael(at)pilotsystems(dot)net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2009-08-27 22:20:34 Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Previous Message Tom Lane 2009-08-27 18:05:13 Re: PostgreSQL does CAST implicitely between int and a domain derived from int