Re: Scanning all partition when more than 100 items in "where id in ()" clause

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Soni M <diptatapa(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres Bug <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Scanning all partition when more than 100 items in "where id in ()" clause
Date: 2018-07-27 02:23:53
Message-ID: 20180727022353.GJ27724@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Greetings,

* Soni M (diptatapa(at)gmail(dot)com) wrote:
> Hello All, I think I found bug here. Postgres 9.6.9 on Red Hat Enterprise
> Linux Server release 7.5 (Maipo).

[... the 99 item query ...]

> Planning time: 12.969 ms
> Execution time: 7.062 ms

[... the 100 item query ...]

> Planning time: 60.537 ms
> Execution time: 51.401 ms

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I think it's a consequence of the limitation in predtest.c that it won't
> try to do predicate proofs using ScalarArrayOps with more than 100 array
> entries:

This is far from the first time we (or, at least, I) have seen
complaints about that particular constant.

> /*
> * Proof attempts involving large arrays in ScalarArrayOpExpr nodes are
> * likely to require O(N^2) time, and more often than not fail anyway.
> * So we set an arbitrary limit on the number of array elements that
> * we will allow to be treated as an AND or OR clause.
> * XXX is it worth exposing this as a GUC knob?
> */
> #define MAX_SAOP_ARRAY_SIZE 100

Which certainly makes me think that comment in there might be worth
something- perhaps we should make this a GUC and let users see just what
would end up happening with a different choice. There could certainly
be cases where it'd be better to work it out.

> Not a bug, but a tradeoff. You'd be unhappy if the planner spent longer
> devising the plan than it saved to eliminate the extra partitions.

While I agree in concept, I'm awful curious how the "simpler" approach
used when we hit the limit resulted in a 5x increase in planning time.
Looks a bit like the extra time required to perform that elimination for
at least a few more items would be saving us cycles somewhere else that
are apparently pretty expensive.

Soni, any chance that this query was the first time all of those
partitions were hit in this backend, meaning there was a lot of time
required to load the relation information for them? What happens if you
prime the backend by running the 100-item case once, and then do 5 runs
of the 99-item and then 5 of the 100-item case?

Thanks!

Stephen

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2018-07-27 02:31:23 Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack
Previous Message Christophe Pettus 2018-07-27 00:42:14 Re: BUG #15300: "do you want the application "postgres" to accept incoming network connections" dialog box madness