Re: Commitfest 2023-03 starting tomorrow!

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Commitfest 2023-03 starting tomorrow!
Date: 2023-04-21 16:49:40
Message-ID: CAAKRu_aoh7LvULhcH0WGxN_YjMbiy6K0=5e3HqsEFbeivt5k4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 21, 2023 at 9:50 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> writes:
> > After catching up with this thread, where pending bugs are listed and discussed,
> > I wonder if the current patches trying to lower the HashJoin memory explosion[1]
> > could be added to the "Older bugs affecting stable branches" list of
> > https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items as I think they
> > deserve some discussion/triage for v16?
>
> They do not. That patch is clearly nowhere near ready to commit, and
> even if it was, I don't think we'd consider it post-feature-freeze.
> Any improvement in this space would be a feature, not a bug fix,
> despite anyone's attempts to label it a bug fix.

So, I think this may be a bit harsh. The second patch in the set only
moves hash join batch buffile creation into a more granular memory
context to make it easier to identify instances of this bug (which
causes OOMs). It is missing a parallel hash join implementation and a
bit more review. But it is not changing any behavior.

If using a separate memory context solely for the purpose of accounting
is considered an anti-pattern, we could use some arithmetic like
hash_agg_update_metrics() to calculate how much space is taken up by
these temporary file buffers. Ultimately, either method is a relatively
small change (both LOC and impact AFAICT).

Currently, it isn't possible for a user to understand what is consuming
so much memory when hash join batch file buffers substantially exceed
the size of the actual hashtable. This memory usage is not displayed in
EXPLAIN ANALYZE or anywhere else. I think adding a debugging message
with some advice for is a reasonable concession to the user. This may
not constitute a bug "fix", but I don't really see how this is a
feature.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-04-21 16:55:14 Re: Improve list manipulation in several places
Previous Message Peter Eisentraut 2023-04-21 16:49:04 Re: LLVM strip -x fails