
Re: [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Saffron <sam(dot)saffron(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
Date: 2014-11-14 00:34:39
Message-ID: 26882.1415925279@sss.pgh.pa.us
Lists: pgsql-general, pgsql-hackers
Sam Saffron <sam(dot)saffron(at)gmail(dot)com> writes:
> I have hit a rather odd issue with prepared queries on both pg 9.3 and 9.4 beta.
> I have this table (copy at http://samsaffron.com/testing.db.gz) with a
> very odd performance profile:

Interesting case.  The issue seems to be that your statistics look like
this:

 select * from pg_stats where tablename = 'topics' and attname = 'archetype';
 schemaname | tablename |  attname  | inherited | null_frac | avg_width | n_distinct |     most_common_vals      |  most_common_freqs  | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+-----------+-----------+-----------+-----------+-----------+------------+---------------------------+---------------------+------------------+-------------+-------------------+------------------------+---------------------
 public     | topics    | archetype | f         |         0 |        12 |          2 | {private_message,regular} | {0.604957,0.395043} |                  |    0.612985 |                   |                        | 
(1 row)

That is, archetype consists of 60% 'private_message', 40% 'regular', and
absolutely nothing else.  So the condition archetype = 'banner' is very
selective, and a plan built knowing that that is the parameter value will
use the index:

# explain select * from topics where archetype = 'banner' limit 1;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Limit  (cost=0.29..651.49 rows=1 width=520)
   ->  Index Scan using idx11 on topics  (cost=0.29..651.49 rows=1 width=520)
         Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, that's still a pretty darn expensive indexscan, mainly because
archetype is not the leading key ... if you care about the performance
of this query, why don't you have an index to match?

# create index on topics(archetype);
CREATE INDEX
# explain select * from topics where archetype = 'banner' limit 1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Limit  (cost=0.29..6.80 rows=1 width=520)
   ->  Index Scan using topics_archetype_idx on topics  (cost=0.29..6.80 rows=1 width=520)
         Index Cond: ((archetype)::text = 'banner'::text)
(3 rows)

However, just fixing the index availability actually makes the performance
ratio even worse, because the prepared query still doesn't use the index:

# explain execute foo('banner');
                             QUERY PLAN                              
---------------------------------------------------------------------
 Limit  (cost=0.00..0.11 rows=1 width=520)
   ->  Seq Scan on topics  (cost=0.00..1158.19 rows=10088 width=520)
         Filter: ((archetype)::text = $1)
(3 rows)

(Yes, you can get this result in psql, you just need to repeat the EXECUTE
half a dozen times until it shifts to a generic plan.)

The problem here is that without knowledge of the comparison value, the
planner assumes that it will probably be one of the two values that make
up the table content.  (After all, why would you query for something
else?)  On that basis, a seqscan will probably hit a matching row in no
time, and so (with the LIMIT 1) it looks like a better deal than the
indexscan.
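The numbers in the generic plan above show this proration directly. With an estimated 10088 matching rows assumed to be spread uniformly through the table, the Limit node charges only the fraction of the seqscan needed to produce one row (a back-of-the-envelope reading of the cost model, not the exact formula):

```
cost_to_first_row ≈ total_seqscan_cost × (limit_count / estimated_rows)
                  ≈ 1158.19 × (1 / 10088)
                  ≈ 0.11
```

which matches the Limit's estimated cost of 0.00..0.11, far below the 6.80 of the indexscan plan.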

We've talked about this type of problem before.  Part of the issue is
that costing of LIMIT doesn't apply any penalty for a bad worst-case
scenario, and part of it is that the heuristics for choosing between
custom and generic plans don't consider the possibility that the generic
plan's estimated cost is way wrong for lack of knowledge of the comparison
value.  It's not real obvious how to improve either heuristic without
probably making some cases worse.

One thing that occurs to me is that if the generic plan estimate comes
out much cheaper than the custom one, maybe we should assume that the
generic's cost estimate is bogus.  Right offhand I can't think of a reason
for a custom plan to look worse than a generic one, unless there's a
statistical quirk like this one.

In the meantime, I assume that your real data contains a small percentage
of values other than these two?  If so, maybe cranking up the statistics
target would help.  If the planner knows that there are more than two
values in the column, I think it would be less optimistic about assuming
that the comparison value is one of the big two.
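Concretely, that experiment would be along these lines (1000 is an arbitrary illustrative target; the default is 100):

```sql
-- Raise the per-column statistics target so ANALYZE samples more rows
-- and has a better chance of recording the rare archetype values:
ALTER TABLE topics ALTER COLUMN archetype SET STATISTICS 1000;
ANALYZE topics;

-- Then re-check what the planner now knows about the column:
SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'topics' AND attname = 'archetype';
```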

			regards, tom lane

