Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-03-16 10:48:54
Message-ID: CAFjFpRfS9fW1+9vcbqyPq0WcqxoGN1uPUHGGq_x2eTUorOHu7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 16, 2017 at 12:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Mar 15, 2017 at 8:49 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> Of course, that supposes that 0009 can manage to postpone creating
>>> non-sampled child joinrels until create_partition_join_plan(), which
>>> it currently doesn't.
>>
>> Right. We need the child-join's RelOptInfos to estimate sizes, so that
>> we could sample the largest ones. So postponing it looks difficult.
>
> You have a point.
>
>>> In fact, unless I'm missing something, 0009
>>> hasn't been even slightly adapted to take advantage of the
>>> infrastructure in 0001; it doesn't seem to reset the path_cxt or
>>> anything. That seems like a fairly major omission.
>>
>> The path_cxt reset introduced by 0001 recycles memory used by all the
>> paths, including paths created for the children. But that happens only
>> after all the planning has completed. I thought that's what we
>> discussed to be done. We could create a separate path context for
>> every top-level child-join.
>
> I don't think we need to create a new context for each top-level
> child-join, but I think we should create a context to be used across
> all top-level child-joins and then reset it after planning each one.

Sorry, that's what I meant by creating a new context for each
top-level child-join. So, we need to copy the required path tree
before resetting the context. I am fine doing that but read on.

> I thought the whole point here was that NOT doing that caused the
> memory usage for partitionwise join to get out of control. Am I
> confused?

We took a few steps to reduce the memory footprint of partition-wise
join in [1] and [2]. According to the numbers reported in [1] and then
in [2], if the total memory consumed by a planner is 44MB (memory
consumed by paths 150K) for a 5-way non-parition-wise join between
tables with 1000 partitions, partition-wise join consumed 192MB which
is 4.4 times the non-partitino-wise case. The earlier implementation
of blowing away a memory context after each top-level child-join, just
got rid of the paths created for that child-join. The total memory
consumed by paths created for all the child-joins was about 150MB.
Remember that we can not get rid of memory consumed by expressions,
RelOptInfos, RestrictInfos etc. since their pointers will be copied
into the plan nodes.

With changes in 0001, what happens is we accumulate 150MB till the end
of the planning and get rid of it after we have created a plan. So,
till the plan is created we are consuming approx. 192MB + 150MB =
342MB memory and are getting rid of 150MB memory after we have created
the plan. I am not sure whether consuming extra 150MB or for that
matter 342MB in a setup with a thousand partitions is "going out of
control". (342MB is approx. 7.8 time 44MB; not 1000 times, and not
even 10 times). But if you think that we should throw away unused
paths after planning every top-level child-join I am fine with it.

[1] https://www.postgresql.org/message-id/CAFjFpRcZ_M3-JxoiDkdoPS%2B-9Cok4ux9Si%2B4drcRL-62af%3DjWw@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFjFpRe66z%2Bw9%2BdnAkWGiaB1CU2CUQsLGsqzHzYBoA%3DKJFf%2BPQ%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-03-16 10:51:14 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Amit Khandekar 2017-03-16 10:27:44 Re: Parallel Append implementation