Re: Partition-wise join for join between (declaratively) partitioned tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-08-10 13:26:48
Message-ID: CA+Tgmob+cXfSJ_iUhQybRr3HjqF7YSWO4C5zVe621+enftjN0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 10, 2017 at 5:43 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Do you think we solving this problem is a prerequisite for
>> partition-wise join? Or should we propose that patch as a separate
>> enhancement?
>
> No, I'm not proposing anything yet. For now I just wanted to share
> this observation about where hot CPU time goes in simple tests, and
> since it turned out to be a loop in a loop that I could see an easy to
> way to fix for singleton sets and sets with a small range, I couldn't
> help trying it... But I'm still trying to understand the bigger
> picture. I'll be interested to compare profiles with the ordered
> append_rel_list version you have mentioned, to see how that moves the
> hot spots.

Perhaps this is stating the obvious, but it's often better to optimize
things like this at a higher level, rather than by tinkering with
stuff like Bitmapset. On the other hand, sometimes
micro-optimizations are the way to go, because optimizing
find_ec_member_for_tle(), for example, might involve a much broader
rethink of the planner code than we want to undertake right now.

> I guess one very practical question to ask is: can we plan queries
> with realistic numbers of partitioned tables and partitions in
> reasonable times? Well, it certainly looks very good for hundreds of
> partitions so far... My own experience of partitioning with other
> RDBMSs has been on that order, 'monthly partitions covering the past
> 10 years' and similar, but on the other hand it wouldn't be surprising
> to learn that people want to go to many thousands, especially for
> schemas which just keep adding partitions over time and don't want to
> drop them.

I've been thinking that it would be good if this feature - and other
new partitioning features - could scale to about 1000 partitions
without too much trouble. Eventually, it might be nice to scale
higher, but there's not much point in making partition-wise join scale
to 100,000 partitions if we've got some other part of the system that
runs into trouble beyond 250.

> Curious: would you consider joins between partitioned tables and
> non-partitioned tables where the join is pushed down to be a kind of
> "partition-wise join", or something else? If so, would that be a
> special case, or just the logical extreme case for
> 0014-WIP-Partition-wise-join-for-1-1-1-0-0-1-partition-ma.patch, where
> one single "partition" on the non-partitioned side maps to all the
> partitions on the partitioned size?

I think this is actually a really important case which we've just
excluded from the initial scope because the problem is hard enough
already. But it's quite possible that if you are joining partitioned
tables A and B with unpartitioned table X, the right join order could
be A-X-B; the A-X join might knock out a lot of rows. It's not great
to have to pick between doing the A-B join partitionwise and doing the
A-X join first; you want to get both things. But we can't do
everything at once.

Further down the road, there's more than one way of doing the A-X
join. You could join each partition of A to all of X, which is likely
optimal if for example you are doing a nested loop with an inner index
scan on X. But you could also partition X on the fly using A's
partitioning scheme and then join partitions of A against the
on-the-fly-partitioned version of X. That's likely to be a lot better
for a merge join with an underlying sort on X.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-10 14:17:14 Re: pl/perl extension fails on Windows
Previous Message Vladimir Rusinov 2017-08-10 13:17:59 Re: Funny WAL corruption issue