Re: [PATCH] Equivalence Class Filters

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Equivalence Class Filters
Date: 2015-12-08 01:26:55
Message-ID: CAKJS1f8RJLCM44+WZP_YdOrg3v2kusaMbb-Bn8AFYuk-9oePOg@mail.gmail.com
Lists: pgsql-hackers

On 8 December 2015 at 04:35, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 12/6/15 10:38 AM, Tom Lane wrote:
>
>> I said "in most cases". You can find example cases to support almost any
>> weird planner optimization no matter how expensive and single-purpose;
>> but that is the wrong way to think about it. What you have to think about
>> is average cases, and in particular, not putting a drag on planning time
>> in cases where no benefit ensues. We're not committing any patches that
>> give one uncommon case an 1100X speedup by penalizing every other query
>> 10%,
>> or even 1%; especially not when there may be other ways to fix it.
>>
>
> This is a problem that seriously hurts Postgres in data warehousing
> applications. We can't keep ignoring optimizations that provide even as
> little as 10% execution improvements for 10x worse planner performance,
> because in a warehouse it's next to impossible for planning time to matter.
>
> Obviously it'd be great if there was a fast, easy way to figure out
> whether a query would be expensive enough to go the whole 9 yards on
> planning it but at this point I suspect a simple GUC would be a big
> improvement.

I've certainly been here before [1], but the idea fell on deaf ears.
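For concreteness, the sort of knob Jim is describing might look something
like the following; the name and values are entirely made up, as no such
GUC exists today:

  -- Hypothetical GUC, invented for illustration only. Low values keep
  -- planning cheap; higher values enable more expensive optimisations
  -- such as the one in this patch.
  SET planner_effort = 2;

A warehouse could set that high in postgresql.conf, while an OLTP system
would leave it at the cheap default.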

The biggest frustration for me is that Tom always seems to argue his point
as if planning time were roughly the same as, or more expensive than,
execution time. Likely that's true in many cases, but I would imagine that
in more typical cases execution time is the longer of the two, although
I've never had the guts to stand up and argue this, as I don't have any
stats to back me up.
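Of course, for a single query it's easy enough to eyeball the split, since
EXPLAIN ANALYZE has reported both numbers since 9.4 (if I recall
correctly); the timings below are made up for illustration:

  EXPLAIN ANALYZE SELECT count(*) FROM pg_class;
  ...
   Planning time: 0.085 ms
   Execution time: 1.290 ms

What we lack is that same breakdown aggregated over a real workload.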

I was talking to Thomas Munro yesterday about this, and he mentioned that
it would be quite nice to have some stats on how much time is spent in the
planner versus the executor. He came up with the idea of adding a column
to pg_stat_statements to record the planning time.
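With such a column in place (let's call it plan_time; the name is
hypothetical, since nothing like it exists yet), the overall ratio would
fall out of a simple query against the existing total_time column:

  -- plan_time is hypothetical; total_time (total execution time)
  -- already exists in pg_stat_statements.
  SELECT sum(plan_time) / sum(total_time) AS plan_to_exec_ratio
  FROM pg_stat_statements;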

If we could get real statistics from real-world cases and discovered, for
example, that on average the total CPU time spent planning was only 1% of
execution time, then adding 2% overhead to the planner would be rather
less of a worry, as it would amount to just 2% of 1% (0.02% of the total).
Such information, fed back into the community, might help drive us in the
right direction when deciding what needs to be done to resolve this
recurring friction over accepting planner improvement patches.

I believe that with parallel query on the horizon for 9.6, we're now
aiming to support bigger OLAP-type databases than ever before. So if we
ignore patches like this one, it appears we have some conflicting goals in
the community: we're willing to add the brawn, but not the brain. If
that's the case then it's a shame, as I think we can have both. And I very
much agree that we must also find a way to maintain support for, and the
high performance of, small OLTP databases.

[1]
http://www.postgresql.org/message-id/CAApHDvrJrz-0xinyiqTiWs0mFX17GWD2Y8VZ+i92nuZsha8ocw@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
