Re: [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

From: Sam Saffron <sam(dot)saffron(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
Date: 2014-11-14 03:31:39
Message-ID: CAAtdryN9=jp934tjVeC4uEf7-WSNNx81G_kzXMQkzpy9m3Yu0A@mail.gmail.com

Thank you so much!

So to recap, the general way to reproduce this issue is:

create table products(id int primary key, type varchar);
insert into products select generate_series(1,10000), 'aaa';
insert into products select generate_series(10001,20000), 'bbb';
create index idx on products(type);
prepare stmt as select * from products where type = $1 limit 1;

Which, after a handful of executions (enough for the plancache to switch to the generic plan), devolves into:

explain analyze execute stmt ('ccc');

                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=8) (actual time=1.821..1.821 rows=0 loops=1)
   ->  Seq Scan on products  (cost=0.00..339.00 rows=10000 width=8) (actual time=1.819..1.819 rows=0 loops=1)
         Filter: ((type)::text = $1)
         Rows Removed by Filter: 20000
 Total runtime: 1.843 ms
(5 rows)
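
To get that generic plan in a fresh session, the EXECUTE has to be repeated a handful of times first (the plancache builds custom plans for roughly the first five executions before it considers the generic one):

execute stmt('ccc');
execute stmt('ccc');
execute stmt('ccc');
execute stmt('ccc');
execute stmt('ccc');
explain analyze execute stmt('ccc');  -- now shows the generic seq scan above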

So if I am searching for 'ccc', the prepared plan eventually
"optimises" and switches to what the statistics suggest is the better
mechanism: just scanning the table until the first hit.

However, a fairly common pattern I use is to check for "lack of
presence" of a value.

For example: "if the product type 'ccc' is not in the table, do this".
Unfortunately the optimiser deoptimises this class of operation.
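
In SQL terms the pattern is roughly this (sketched against the test table above):

select * from products where type = 'ccc' limit 1;
-- zero rows back means "not present", take the other branch

and it is exactly the LIMIT 1 that makes the generic seq scan look so cheap to the planner.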

I tried the exact example above with an "int" instead of a "varchar"
in the "type" column and was not able to reproduce the issue; I wonder
if there is some different handling for strings vs numbers.
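
For reference, the int variant I tried was essentially this (object names changed so it can coexist with the tables above):

create table products2(id int primary key, type int);
insert into products2 select generate_series(1,10000), 1;
insert into products2 select generate_series(10001,20000), 2;
create index idx2 on products2(type);
prepare stmt2 as select * from products2 where type = $1 limit 1;
execute stmt2(3);  -- probing for an absent value, like 'ccc' above

and this one did not devolve for me.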

Unfortunately my actual table in play has a rather bad schema: the
"archetype" column really should be an int. That said, we only have
two general archetypes at the moment (private message and topic) and
the occasional single "banner" outlier, which may or may not be there,
so the data modelling is pretty hostile to performance. I have some
ideas on how to solve my particular problem, but I do have some
general concerns.

Ruby on Rails is just about to ship a new version that heavily changes
the mechanics of query execution. For example, Product.where(name:
"foo").first will now result in a prepared query, whereas in the past
it would just send the raw query. Overall this approach is better and
saner, but my general concern is that our API offers no "escape hatch"
for these outlier conditions: you can disable prepared statements
globally, but you cannot disable them for a single call. I will raise
this particular concern with the team.
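
For clarity, at the SQL level the new behaviour amounts to something like this (statement name made up for illustration):

-- old: one raw query per call
select "products".* from "products" where "products"."name" = 'foo' limit 1;

-- new: prepare once, then execute with a bound parameter
prepare a1 as select "products".* from "products" where "products"."name" = $1 limit 1;
execute a1('foo');

(Rails goes through the driver's extended-protocol support rather than literal PREPARE/EXECUTE, but the plancache treats both the same way.)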

My second question/concern is that I feel I am totally
misunderstanding the changes to 'plancache.c'. I thought the choice of
plan was based purely on the value sent in with the prepared query,
but it seems the planner completely ignores the value in some steps
(I was expecting "aaa" and "ccc" to result in completely different
plans).

Thank you so much for your time, patience, and general awesomeness.

On Fri, Nov 14, 2014 at 11:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Sam Saffron <sam(dot)saffron(at)gmail(dot)com> writes:
>> I have hit a rather odd issue with prepared queries on both pg 9.3 and 9.4 beta.
>> I have this table (copy at http://samsaffron.com/testing.db.gz) with a
>> very odd performance profile:
>
> Interesting case. The issue seems to be that your statistics look like
> this:
>
> select * from pg_stats where tablename = 'topics' and attname = 'archetype';
> -[ RECORD 1 ]----------+---------------------------
> schemaname             | public
> tablename              | topics
> attname                | archetype
> inherited              | f
> null_frac              | 0
> avg_width              | 12
> n_distinct             | 2
> most_common_vals       | {private_message,regular}
> most_common_freqs      | {0.604957,0.395043}
> histogram_bounds       |
> correlation            | 0.612985
> most_common_elems      |
> most_common_elem_freqs |
> elem_count_histogram   |
>
> That is, archetype consists of 60% 'private_message', 40% 'regular', and
> absolutely nothing else. So the condition archetype = 'banner' is very
> selective, and a plan built knowing that that is the parameter value will
> use the index:
>
> # explain select * from topics where archetype = 'banner' limit 1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.29..651.49 rows=1 width=520)
>    ->  Index Scan using idx11 on topics  (cost=0.29..651.49 rows=1 width=520)
>          Index Cond: ((archetype)::text = 'banner'::text)
> (3 rows)
>
> However, that's still a pretty darn expensive indexscan, mainly because
> archetype is not the leading key ... if you care about the performance
> of this query, why don't you have an index to match?
>
> # create index on topics(archetype);
> CREATE INDEX
> # explain select * from topics where archetype = 'banner' limit 1;
>                                         QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..6.80 rows=1 width=520)
>    ->  Index Scan using topics_archetype_idx on topics  (cost=0.29..6.80 rows=1 width=520)
>          Index Cond: ((archetype)::text = 'banner'::text)
> (3 rows)
>
> However, just fixing the index availability actually makes the performance
> ratio even worse, because the prepared query still doesn't use the index:
>
> # explain execute foo('banner');
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Limit  (cost=0.00..0.11 rows=1 width=520)
>    ->  Seq Scan on topics  (cost=0.00..1158.19 rows=10088 width=520)
>          Filter: ((archetype)::text = $1)
> (3 rows)
>
> (Yes, you can get this result in psql, you just need to repeat the EXECUTE
> half a dozen times until it shifts to a generic plan.)
>
> The problem here is that without knowledge of the comparison value, the
> planner assumes that it will probably be one of the two values that make
> up the table content. (After all, why would you query for something
> else?) On that basis, a seqscan will probably hit a matching row in no
> time, and so (with the LIMIT 1) it looks like a better deal than the
> indexscan.
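> You can see that in the numbers above: the LIMIT costing scales the
> seqscan's total cost by the fraction of rows it expects to fetch,
> roughly 1158.19 * (1/10088) ~= 0.11, which is the 0.00..0.11 on the
> Limit node, versus 6.80 for the indexscan plan.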
>
> We've talked about this type of problem before. Part of the issue is
> that costing of LIMIT doesn't apply any penalty for a bad worst-case
> scenario, and part of it is that the heuristics for choosing between
> custom and generic plans don't consider the possibility that the generic
> plan's estimated cost is way wrong for lack of knowledge of the comparison
> value. It's not real obvious how to improve either heuristic without
> probably making some cases worse.
>
> One thing that occurs to me is that if the generic plan estimate comes
> out much cheaper than the custom one, maybe we should assume that the
> generic's cost estimate is bogus. Right offhand I can't think of a reason
> for a custom plan to look worse than a generic one, unless there's a
> statistical quirk like this one.
>
> In the meantime, I assume that your real data contains a small percentage
> of values other than these two? If so, maybe cranking up the statistics
> target would help. If the planner knows that there are more than two
> values in the column, I think it would be less optimistic about assuming
> that the comparison value is one of the big two.
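> Something along these lines (stats target value picked arbitrarily):
>
> # alter table topics alter column archetype set statistics 1000;
> # analyze topics;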
>
> regards, tom lane
