Performance regression between 8.3 and 8.4 on heavy text indexing

From: gael(at)pilotsystems(dot)net ( Gaël Le Mignot)
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance regression between 8.3 and 8.4 on heavy text indexing
Date: 2009-08-21 13:37:35
Message-ID: plop871vn5b84g.fsf@aoskar.kilobug.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Hello,

We are using PostgreSQL to index a huge collection (570 000) of articles for a french daily newspaper (Libération). We use massively the full text search feature. I attach to this mail the schema of the database we use.

Overall, we have very interesting performances, except in a few cases, when combining a full text match with a lot of matches with a date order and a limit (which is a very common use case, asking for the 50 more recent articles speaking about a famous person, for example).

The reason of this mail is what we noticed a performance drop from PostgreSQL 8.3 to PostgreSQL 8.4.

In order to try to locate the performance cost, I changed a few settings in 8.4 to have the same values than in 8.3 (and rerun analyze after) ::

cursor_tuple_fraction = 1.0
default_statistics_target = 10

We the modified settings, the peformance drop is much lower, but still
present. Here are the statistics on replaying sequentially a bunch of
real-life queries to the two versions of the database :

With 8.3 ::

7334 queries, average time is 0.20 s
6 queries above 20.00 s (0.08 %)
20 queries above 10.00 s (0.27 %)
116 queries above 2.00 s (1.58 %)
top ten: 15.09 15.15 15.19 16.60 20.40 63.05 67.89 78.21 90.30 97.56

With 8.4 ::

7334 queries, average time is 0.23 s
12 queries above 20.00 s (0.16 %)
24 queries above 10.00 s (0.33 %)
112 queries above 2.00 s (1.53 %)
top ten: 31.76 31.94 32.63 47.21 48.80 63.50 79.57 83.36 96.44 113.61

Here is an example query that is significantly slower in 8.4 (31.76 seconds) than in 8.3 (10.52 seconds) ::

SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50

And the explain on it :

With 8.3 ::

Limit (cost=752.67..752.67 rows=1 width=24)
-> Sort (cost=752.67..752.67 rows=1 width=24)
Sort Key: publicationdate, pagenumber
-> Bitmap Heap Scan on libeindex (cost=748.64..752.66 rows=1 width=24)
Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery))
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
-> BitmapAnd (cost=748.64..748.64 rows=1 width=0)
-> Bitmap Index Scan on keywords_index (cost=0.00..48.97 rows=574 width=0)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
-> Bitmap Index Scan on fulltext_index (cost=0.00..699.42 rows=574 width=0)
Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
(11 rows)

With 8.4 ::

Limit (cost=758.51..758.51 rows=1 width=24)
-> Sort (cost=758.51..758.51 rows=1 width=24)
Sort Key: publicationdate, pagenumber
-> Bitmap Heap Scan on libeindex (cost=14.03..758.50 rows=1 width=24)
Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) AND ((classname)::text = 'article'::text))
-> Bitmap Index Scan on keywords_index (cost=0.00..14.03 rows=192 width=0)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
(8 rows)

More informations on the setup :

- postgresql 8.3.7 from Debian Lenny ;

- postgresql 8.4.0 from Debian Lenny backports ;

- rurnning in a Xen virtual machine, using 64-bits kernel ;

- 2 cores of a 2GHz Core2Quad and 2Gb of RAM dedicated to the VM.

If you need additional informations, we'll gladly provide them. If you have any tips or advises so we could make the 8.4 behave as least as good as the 8.3 it would be very nice.

Hoping this can help you to improve this great software.

Regards,

--
Gaël Le Mignot - gael(at)pilotsystems(dot)net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-08-21 20:04:23 Re: number of rows estimation for bit-AND operation
Previous Message Slava Moudry 2009-08-21 07:04:47 Re: number of rows estimation for bit-AND operation