Re: Plan to support predicate push-down into subqueries with aggregates?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Adam Brusselback <adambrusselback(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Plan to support predicate push-down into subqueries with aggregates?
Date: 2016-03-08 23:00:58
Message-ID: CAKFQuwbfSUdCGPDsJJcd4_M15MiU4sUzNWNoPnA27+ZjZ=+fYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

We bottom (or inline) post on these lists.

On Tue, Mar 8, 2016 at 3:37 PM, Adam Brusselback <adambrusselback(at)gmail(dot)com>
wrote:

> ​
> On Tue, Mar 8, 2016 at 5:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Adam Brusselback <adambrusselback(at)gmail(dot)com> writes:
>> > I was wondering if there were any plans to support predicate push-down
>> > optimization for subqueries (and views) with aggregates?
>>
>> Right offhand I would say that that's a complete mischaracterization
>> of the problem. I've not tried to run your test case, but you say
>>
>> > --Quick, is able to push down because it's a simple equality check
>> > SELECT *
>> > FROM header
>> > INNER JOIN header_total
>> > USING (header_id)
>> > WHERE header.header_id = 26;
>> >
>> > --Slow, no pushdown
>> > SELECT *
>> > FROM header
>> > INNER JOIN header_total
>> > USING (header_id)
>> > WHERE header.header_id < 200;
>> >
>> > --Slow, no pushdown
>> > SELECT *
>> > FROM header
>> > INNER JOIN header_total
>> > USING (header_id)
>> > WHERE header.description like '%5%';
>>
>> There's no preference for equalities over other kinds of predicates
>> as far as subquery pushdown is concerned. I think what your real
>> problem is is that in the first case, the system will derive the
>> additional condition "header_total.header_id = 26", while in the
>> second case it will not deduce "header_total.header_id < 200".
>> That's because the machinery for deducing such implied constraints
>> works only with equalities. That's not very likely to change anytime
>> soon, and even if it did, the inference would only extend to operators
>> that are members of the same btree family as the join equality operator.
>> Your example with a LIKE clause is always going to be out in the cold,
>> because there is no principled basis for the planner to decide that
>> "a = b" means that "a LIKE x" and "b LIKE x" will give the same result.
>> It hasn't got enough information about the behavior of LIKE to know
>> if that's safe or not. (It does, on the other hand, know very well that
>> SQL equality operators don't necessarily guarantee bitwise identity.)
>>
>> So I'd suggest just modifying your queries to write out both constraints
>> explicitly.
>>
>> ​
>> ​
>> Thanks Tom, appreciate the reply.
>>
>
> Sorry if I didn't call it the correct thing. I just know that with trying
> to encapsulate this aggregate logic in a view, I am unable to use that view
> in a query that I know is only going to touch a subset of the data without
> incurring a performance hit from the view doing seq scans on all of the
> rows in the detail_1 and detail_2 tables, and then throwing out 99% of the
> results when the filter is applied.
>
> I had initially started creating functions that would take an array of ids
> as a parameter, and manually push them down in the subqueries. That got
> really really messy though, and we moved away from doing that to having the
> aggregates eagerly materialized to a table with triggers.
>
>
> Are there any other options for making this type of query faster? It
> could be that I just am totally missing a better way to do this. I do
> really want to be able to contain that logic within a view of some sort
> though, as a bunch of other stuff is built on top of that. Having to push
> that aggregate query into all of those other queries would be hell.
>
>
​Maybe something like:

CREATE TYPE header_total_item (
amount numeric,
count integer
)​;

CREATE VIEW AS
SELECT header_id, header_amount
COALESCE(
(SELECT ROW(sum(rate * quantity), count(detail_1_id))::header_total_item
FROM detail_1 WHERE detail_1.header_id = header.header_id)
ROW(0.000,0)::header_total_item
) AS detail_1_total_item
FROM header;

​​Doesn't solve the "balance" column without going into a sub-query...which
I suspect is possible but don't have time to test.

I do understand the question as to why the view cannot be re-written as:

I don't have time to get into this deeper right now. The main question is
basically how can you force a parameterized nested join on the LEFT JOIN
sub-queries given that the system has equality joins between them and
header and you've supposedly provided a very selective predicate for the
rows being returned by header.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-03-08 23:18:59 Re: pg_upgrade error regarding hstore operator
Previous Message Adam Brusselback 2016-03-08 22:37:51 Re: Plan to support predicate push-down into subqueries with aggregates?