Re: Creating foreign key on partitioned table is too slow

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "kato-sho(at)fujitsu(dot)com" <kato-sho(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Creating foreign key on partitioned table is too slow
Date: 2019-11-01 08:37:58
Message-ID: CA+HiwqFucUh7hYkfZ6x1MVcs_R24eUfNVuRwdE_FwuwK8XpSZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

On Fri, Oct 25, 2019 at 7:18 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On Thu, Oct 24, 2019 at 03:48:57PM -0300, Alvaro Herrera wrote:
> >On 2019-Oct-23, kato-sho(at)fujitsu(dot)com wrote:
> >
> >> Hello
> >>
> >> To benchmark with tpcb model, I tried to create a foreign key in the partitioned history table, but backend process killed by OOM.
> >> the number of partitions is 8192. I tried in master(commit: ad4b7aeb84).
> >>
> >> I did the same thing in another server which has 200GB memory, but creating foreign key did not end in 24 hours.
> >
> >Thanks for reporting.

Thank you Kato-san.

> It sounds like there must be a memory leak here.
> >I am fairly pressed for time at present so I won't be able to
> >investigate this until, at least, mid November.
>
> I've briefly looked into this, and I think the main memory leak is in
> RelationBuildPartitionDesc. It gets called with PortalContext, it
> allocates a lot of memory building the descriptor, copies it into
> CacheContext but does not even try to free anything. So we end up with
> something like this:
...
> The attached patch trivially fixes that by adding a memory context
> tracking all the temporary data, and then just deletes it as a whole at
> the end of the function. This significantly reduces the memory usage for
> me, not sure it's 100% correct.

Thank you Tomas. I think we have considered this temporary context
fix a number of times before, but it got stalled for one reason or
another ([1] comes to mind as the last thread where this came up).

Another angle to look at this is that our design where PartitionDesc
is rebuilt on relcache reload of the parent relation is not a great
one after all. It seems that we're rightly (?) invalidating the
parent's relcache 8192 times in this case, because its cacheable
foreign key descriptor changes on processing each partition, but
PartitionDesc itself doesn't change. Having to pointlessly rebuild it
8192 times seems really wasteful.

I recall a discussion where it was proposed to build PartitionDesc
only when needed as opposed on every relcache reload of the parent
relation. Attached PoC-at-best patch that does that seems to go
through without OOM and passes make check-world. I think this should
have a very minor impact on select queries.

But...

> FWIW, even with this fix it still takes an awful lot to create the
> foreign key, because the CPU is stuck doing this
>
> 60.78% 60.78% postgres postgres [.] bms_equal
> 32.58% 32.58% postgres postgres [.] get_eclass_for_sort_expr
> 3.83% 3.83% postgres postgres [.] add_child_rel_equivalences
> 0.23% 0.00% postgres [unknown] [.] 0x0000000000000005
> 0.22% 0.00% postgres [unknown] [.] 0000000000000000
> 0.18% 0.18% postgres postgres [.] AllocSetCheck

...we have many problems to solve here. :-(

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw%40mail.gmail.com

Attachment Content-Type Size
build-PartitionDesc-when-needed-PoC.patch application/octet-stream 3.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-11-01 08:51:06 Re: [HACKERS] Block level parallel vacuum
Previous Message Peter Eisentraut 2019-11-01 07:07:16 Refactor parse analysis of EXECUTE command