Re: Memory consumed by paths during partitionwise join planning

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory consumed by paths during partitionwise join planning
Date: 2023-12-08 05:02:08
Message-ID: CAExHW5skDmFNV9ZNs-EpC7jf-HciZ9v=rZpq+c1v+Z9i9RqqVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 7, 2023 at 6:19 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 1 Dec 2023 at 19:59, Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > I am fine to work on this further if the community thinks it is
> > acceptable. It seems from your point of view the worth of this work is
> > as follows
> > a. memory savings - not worth it
> > b. getting rid of !IsA(old_path, IndexPath) - worth it
> > c. problem of same path being added to multiple relations - not worth
> > it since you have other solution
> >
> > Can't judge whether that means it's acceptable or not.
>
> I think it's worth trying to get this working and we can run some
> performance tests to see if it's cheap enough to use.
>
> I've not had enough time to fully process your patches, but on a quick
> look, I don't quite understand why link_path() does not have to
> recursively increment ref_counts in each of its subpaths. It seems
> you're doing the opposite in free_path(), so it seems like this would
> break for cases like a SortPath on say, a LimitPath over a SeqScan.
> When you call create_sort_path you'll only increment the refcount on
> the LimitPath. The SeqScan path then has a refcount 1 lower than it
> should. If something calls unlink_path on the SeqScan path that could
> put the refcount to 0 and break the SortPath by freeing a Path
> indirectly referenced by the sort.

Let me explain how linking and unlinking works. You may skip over it
if you are comfortable with this logic already.
refcounts count how many other paths or objects are referencing a
given path. E.g. we have three path chains as follows
1. joinpath_1->joinpath_2->seqpath_1,
2. joinpath_3->joinpath_4->seqpath_1,
3. joinpath_5->joinpath_2->seqpath_1

Please note that this is not full path tree/forest. It is difficult to
describe the whole path forest in text format. But this should be
sufficient to get the gist of how linking and unlinking works.

Let's say all these paths are listed in pathlists of respective
RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all
part of the topmost RelOptInfo. Then the refcounts will be as follows
joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo)
joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo)
joinpath_4 - 2 (joinpath_3, RelOptInfo)
seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)

If joinpath_3 is chosen finally to be converted into a plan,
joinpath_1, joinpath_5 will be unlinked, their refcount drops to 0 and
they will be freed.
unlinking and freeing them reduces refcount of joinpath 2 to 1. When
we will cleanse the pathlist of corresponding RelOptInfo, joinpath_2's
refcount will reduce to 0 and it will be freed.
freeing joinpath_2 will reduce refcount of seqpath to 2 (joinpath_4
and RelOptInfo).
joinpath_3 and joinpath_4 will have their refcounts unchanged.

This will leave the paths joinpath_3, joinpath_4 and seqpath_1 to be
converted to plans. Rest of the paths will be freed.

If we increment the refcount all the way down the path forest, they
won't really be refcounts anymore since they will count indirect
references as well. Then unlink_path also has to traverse the entire
tree resetting refcounts. Both of those are unnecessary. Please note
that it's only free_path which calls unlink_path on the referenced
paths not unlink_path itself.

In your example, the path chain looks like this
sort_path->limit_path->seqpath with refcounts 1, 1, 2 respectively. I
am assuming that seqpath is referenced by pathlist of its RelOptInfo
and limitpath. sortpath is referenced by its RelOptInfo or is chosen
as the final path. limitpath is not referenced in a RelOptInfo but is
referenced in sortpath.

Now the only things that can call unlink_path on seqpath are free_path
on limitpath OR pathlist cleaning of its RelOptInfo. In both the cases
the refcount will correctly report the number of objects referencing
it. So I don't see a problem here. If you can provide me a query that
does this, I will test it.

> Recursively incrementing the refcounts would slow the patch down a
> bit, so we'd need to test the performance of this. I think at the
> rate we create and free join paths in complex join problems we could
> end up bumping refcounts quite often.

This won't be necessary as explained above.

>
> Another way to fix the pfree issues was to add infrastructure to
> shallow copy paths. We could shallow copy paths whenever we borrow a
> Path from another RelOptInfo and then just reset the parent to the new
> RelOptInfo. That unfortunately makes your memory issue a bit worse.
> We discussed this a bit in [1]. It also seems there was some
> discussion about wrapping a Path up in a ProjectionPath in there. That
> unfortunately also takes us in the opposite direction in terms of your
> memory reduction goals.

Yeah.

>
> Maybe we can try to move forward with your refcount idea and see how
> the performance looks. If that's intolerable then that might help us
> decide on the next best alternative solution.

I will report performance numbers (planning time) with my current
patchset which has limitation listed in [1]. If performance gets
worse, we will abandon this idea. If not, I will work on completing
the patch. Does that work for you?

[1] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yura Sokolov 2023-12-08 05:10:29 Re: [PATCH] New [relation] option engine
Previous Message John Naylor 2023-12-08 04:51:15 Re: micro-optimizing json.c