Re: why doesn't optimizer can pull up where a > ( ... )

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, xuncheng(at)google(dot)com, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why doesn't optimizer can pull up where a > ( ... )
Date: 2019-11-21 15:57:22
Message-ID: CAKU4AWrkhuiObaAX-2gjYJ6b0_bxFS8r6Wgf2V+1SZ7P1s-cPQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 21, 2019 at 6:12 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Thu, Nov 21, 2019 at 08:30:51AM +0800, Andy Fan wrote:
> >>
> >>
> >> Hm. That actually raises the stakes a great deal, because if that's
> >> what you're expecting, it would require planning out both the
> transformed
> >> and untransformed versions of the query before you could make a cost
> >> comparison.
> >
> >
> >I don't know an official name, let's call it as "bloom filter push down
> >(BFPD)" for reference. this algorithm may be helpful on this case with
> >some extra effort.
> >
> >First, take "select ... from t1, t2 where t1.a = t2.a and t1.b = 100"
> >for example, and assume t1 is scanned before t2, as in a hash
> >join/merge join, with t1's result as the inner table.
> >
> >1. it first scan t1 with filter t1.b = 100;
> >2. during the above scan, it build a bloom filter *based on the join key
> >(t1.a) for the "selected" rows.*
> >3. during scan t2.a, it filters t2.a with the bloom filter.
> >4. probe the the hash table with the filtered rows from the above step.
> >
>
> So essentially just a hash join with a bloom filter?

Yes, the idea is exactly the same, though we value it differently (both
views are valid, and yours is the more common one). In my opinion, in an
environment like Oracle Exadata it is much more powerful, since it
transfers much less data from the data nodes to the compute nodes.

Of course the benefit does not always materialize, but it is a good
starting point for making the planner smarter.
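To make the four steps above concrete, here is a minimal Python sketch of a
hash join with a bloom filter on the join key. It is purely illustrative
(not PostgreSQL executor code); the table layout, bloom parameters, and all
names are made up for the example.

```python
# Sketch of steps 1-4: scan t1 with its filter, build a bloom filter and
# hash table on the join key, pre-filter t2 with the bloom, then probe.
import hashlib

def bloom_positions(key, m, k=3):
    # derive k bit positions from the key (illustrative hashing scheme)
    for i in range(k):
        h = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(h[:8], "big") % m

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def add(self, key):
        for pos in bloom_positions(key, self.m, self.k):
            self.bits[pos] = 1

    def might_contain(self, key):
        # no false negatives; false positives are possible
        return all(self.bits[pos] for pos in bloom_positions(key, self.m, self.k))

def bloom_hash_join(t1, t2):
    bloom, hash_table = BloomFilter(), {}
    for row in t1:
        if row["b"] == 100:                       # step 1: filter t1.b = 100
            bloom.add(row["a"])                   # step 2: bloom on join key t1.a
            hash_table.setdefault(row["a"], []).append(row)
    out = []
    for row in t2:
        if not bloom.might_contain(row["a"]):     # step 3: cheap pre-filter on t2.a
            continue
        for match in hash_table.get(row["a"], []):  # step 4: probe the hash table
            out.append((match, row))
    return out
```

The point of step 3 is that the bloom check is much cheaper than a full
hash-table probe (or, in a distributed setting, than shipping the row at
all), so rows that cannot possibly join are dropped early.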

> That doesn't seem very relevant to this thread (at least I don't see any
> obvious link),
>

In the original problem, running "group by p_brand" over all the rows may
not always be a good idea; if we can apply some filtering before the
group by, the result would be better.
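A tiny sketch of that point: grouping only the p_brand values we actually
need gives the same answer as grouping everything and discarding most of
it afterwards. The data and the single "wanted" brand are invented for the
illustration.

```python
# Filtering before the group-by vs. after: same result, fewer rows aggregated.
from collections import defaultdict

rows = [("Brand#1", 10), ("Brand#2", 20), ("Brand#1", 30), ("Brand#3", 40)]
wanted = {"Brand#1"}  # the brand(s) the outer query actually asks about

# group by p_brand over all rows, then keep the interesting groups
all_groups = defaultdict(int)
for brand, qty in rows:
    all_groups[brand] += qty
naive = {b: s for b, s in all_groups.items() if b in wanted}

# push the filter below the group-by: skip uninteresting rows up front
pushed = defaultdict(int)
for brand, qty in rows:
    if brand in wanted:
        pushed[brand] += qty
```

Both paths compute the same aggregate for Brand#1; the second touches only
the rows that can affect the answer.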

> And in some
> cases building a bloom filter did result in nice speedups, but in other
> cases it was just an extra overhead. But it does not require change of
> plan shape, unlike the optimization discussed here.
>

I thought we could add one step named "build the filter" and another named
"apply the filter". If so, the plan shape does change; anyway, I don't
think this is a key point.

>
> Ultimately there were discussions about pushing the bloom filter much
> deeper on the non-hash side, but that was never implemented.

Do you still have any plans for this feature? I see you raised the idea,
and it was very well received.

> >Back to this problem, if we have a chance to get the p_brand we are
> >interested, we can use the same logic to only group by the p_brand.
> >
> >Another option may be we just keep the N versions, and search them
> >differently and compare their cost at last.
> >
>
> Maybe. I think the problem is going to be that with multiple such
> correlated queries you may significantly increase the number of plan
> variants to consider - each subquery may be transformed or not, so the
> space splits into 2. With 6 such subqueries you suddenly have 64x the
> number of plan variants you have to consider (I don't think you can just
> eliminate those early on).
>
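Just to spell out the arithmetic above: each correlated subquery is an
independent transform-or-not decision, so the plan space multiplies by
2^n. (The helper name below is hypothetical, only for illustration.)

```python
# Each of n subqueries doubles the number of plan shapes to consider.
def plan_variants(base_plans: int, n_subqueries: int) -> int:
    return base_plans * 2 ** n_subqueries
```

So with 6 such subqueries a single base plan already fans out into 64
variants before any other planning choices are made.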
> >>The Greenplum page mentions they also added "join-aggregates
> >>reordering", in addition to subquery unnesting.
> >Thanks, I will search more about this.
> >
> >>Having said that, the best form of criticism is a patch. If somebody
> >>actually wrote the code to do something like this, we could look at how
> >>much time it wasted in which unsuccessful cases and then have an
> >>informed discussion about whether it was worth adopting.
> >>
> >
> >I would try to see how far I can get.
>
> +1
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
