Re: Memory leak from ExecutorState context?

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-05-17 17:10:08
Message-ID: 20230517191008.7bee6aa7@karst
Lists: pgsql-hackers

On Tue, 16 May 2023 16:00:52 -0400
Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:

> On Tue, May 16, 2023 at 04:00:51PM +0200, Jehan-Guillaume de Rorthais wrote:
>
> > From e5ecd466172b7bae2f1be294c1a5e70ce2b43ed8 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Thu, 30 Apr 2020 07:16:28 -0700
> > Subject: [PATCH v8 1/3] Describe hash join implementation
> >
> > This is just a draft to spark conversation on what a good comment might
> > be like in this file on how the hybrid hash join algorithm is
> > implemented in Postgres. I'm pretty sure this is the accepted term for
> > this algorithm https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
>
> I recommend changing the commit message to something like this:
>
> Describe hash join implementation
>
> Add details to the block comment in nodeHashjoin.c describing the
> hybrid hash join algorithm at a high level.
>
> Author: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Author: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> Reviewed-by: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> Discussion: https://postgr.es/m/20230516160051.4267a800%40karst

Done, but I assigned myself as a reviewer instead, as I don't remember having
authored anything in this patch beyond the reference to the hybrid hash join
page, which is questionable according to Tomas :)

> > diff --git a/src/backend/executor/nodeHashjoin.c
> > b/src/backend/executor/nodeHashjoin.c index 0a3f32f731..93a78f6f74 100644
> > --- a/src/backend/executor/nodeHashjoin.c
> > ...
> > + * TODO: should we discuss that tuples can only spill forward?
>
> I would just cut this for now since we haven't started on an agreed-upon
> wording.

Removed in v9.

> > From 309ad354b7a9e4dfa01b2985bd883829f5e0eba0 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > Date: Tue, 16 May 2023 15:42:14 +0200
> > Subject: [PATCH v8 2/3] Allocate hash batches related BufFile in a dedicated
> > context
>
> Here is a draft commit message for the second patch:
>
> ...

Thanks. Adopted with some minor rewording... hopefully it's OK.

> I recommend editing and adding links to the various discussions on this
> topic from your research.

Done in v9.

> As for the patch itself, I noticed that there are few things needing
> pgindenting.
>
> I usually do the following to run pgindent (in case you haven't done
> this recently).
>
> ...

Thank you for your recipe.

> There are some existing indentation issues in these files, but you can
> leave those or put them in a separate commit.

Reindented in v9.

I put existing indentation issues in a separate commit to keep the actual
patches clean from distractions.

> ...
> Add a period at the end of this comment.
>
> > + /*
> > + * Use hash join spill memory context to allocate accessors and
> > their
> > + * buffers
> > + */

Fixed in v9.

> Add a period at the end of this comment.
>
> > + /* Use hash join spill memory context to allocate accessors */

Fixed in v9.

> > diff --git a/src/backend/executor/nodeHashjoin.c
> > b/src/backend/executor/nodeHashjoin.c @@ -1313,21 +1314,30 @@
> > ExecParallelHashJoinNewBatch(HashJoinState *hjstate)
> > * The data recorded in the file for each tuple is its hash value,
> > * then the tuple in MinimalTuple format.
> > *
> > - * Note: it is important always to call this in the regular executor
> > - * context, not in a shorter-lived context; else the temp file buffers
> > - * will get messed up.
> > + * If this is the first write to the batch file, this function first
> > + * create it. The associated BufFile buffer is allocated in the given
> > + * context. It is important to always give the HashSpillContext
> > + * context. First to avoid a shorter-lived context, else the temp file
> > + * buffers will get messed up. Second, for a better accounting of the
> > + * spilling memory consumption.
> > + *
> > */
>
> Here is my suggested wording for this block comment:
>
> The batch file is lazily created. If this is the first tuple written to
> this batch, the batch file is created and its buffer is allocated in the
> given context. It is important to pass in a memory context which will
> live for the entirety of the lifespan of the batch.

Reworded. The context must actually outlive the batch itself, not merely last
for the lifespan of the batch.

> > void
> > ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
> > - BufFile **fileptr)
> > + BufFile **fileptr, MemoryContext
> > cxt) {

Note that I changed this to:

ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                      BufFile **fileptr, HashJoinTable hashtable)

As this function must allocate the BufFile buffer in spillCxt, I suppose we
should enforce that explicitly in the function code.

Plus, having the hashtable available locally could be useful for a later patch
to, e.g., track the allocated memory more precisely in spaceUsed.
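
For reference, the v9 function ends up looking roughly like this (a sketch of
the idea only; the exact hunk is in the attached patch):

  void
  ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                        BufFile **fileptr, HashJoinTable hashtable)
  {
      BufFile    *file = *fileptr;

      /*
       * The batch file is lazily created. On the first write to this batch,
       * create the file and allocate its BufFile buffer in the dedicated
       * spill context, which outlives any single batch.
       */
      if (file == NULL)
      {
          MemoryContext oldctx = MemoryContextSwitchTo(hashtable->spillCxt);

          file = BufFileCreateTemp(false);
          *fileptr = file;

          MemoryContextSwitchTo(oldctx);
      }

      /* Record the hash value, then the tuple in MinimalTuple format. */
      BufFileWrite(file, &hashvalue, sizeof(uint32));
      BufFileWrite(file, tuple, tuple->t_len);
  }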

> > diff --git a/src/include/executor/hashjoin.h
> > b/src/include/executor/hashjoin.h index 8ee59d2c71..ac27222d18 100644
> > --- a/src/include/executor/hashjoin.h
> > +++ b/src/include/executor/hashjoin.h
> > @@ -23,12 +23,12 @@
> > /* ----------------------------------------------------------------
> > * hash-join hash table structures
> > *
> > - * Each active hashjoin has a HashJoinTable control block, which is
> > - * palloc'd in the executor's per-query context. All other storage needed
> > - * for the hashjoin is kept in private memory contexts, two for each
> > hashjoin.
> > - * This makes it easy and fast to release the storage when we don't need it
> > - * anymore. (Exception: data associated with the temp files lives in the
> > - * per-query context too, since we always call buffile.c in that context.)
> > + * Each active hashjoin has a HashJoinTable structure, which is
>
> "Other storages" should be "Other storage"
>
> > + * palloc'd in the executor's per-query context. Other storages needed for

Fixed in v9.

> ...
>
> "mainly hash's meta datas" -> "mainly the hashtable's metadata"
>
> > + * "hashCxt" (mainly hash's meta datas). Also, the "hashCxt" context is the

Fixed in v9.

> Suggested alternative wording for the below:
>
> * Data associated with temp files is allocated in the "spillCxt" context
> * which lives for the duration of the entire join as batch files'
> * creation and usage may span batch execution. These files are
> * explicitly destroyed by calling BufFileClose() when the code is done
> * with them. The aim of this context is to help accounting for the
> * memory allocated for temp files and their buffers.

Adopted in v9.

> Suggested alternative wording for the below:
>
> * Finally, data used only during a single batch's execution is allocated
> * in the "batchCxt". By resetting the batchCxt at the end of each batch,
> * we free all the per-batch storage reliably and without tedium.

Adopted in v9.
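
To summarize the resulting layout, the hashtable contexts end up being created
more or less like this in v9 (a rough sketch; see the attached patch for the
exact code and parentage):

  /* Top-level private context for the whole hash join. */
  hashtable->hashCxt = AllocSetContextCreate(CurrentMemoryContext,
                                             "HashTableContext",
                                             ALLOCSET_DEFAULT_SIZES);

  /* Per-batch storage, reset at the end of each batch. */
  hashtable->batchCxt = AllocSetContextCreate(hashtable->hashCxt,
                                              "HashBatchContext",
                                              ALLOCSET_DEFAULT_SIZES);

  /* Batch files and their BufFile buffers, kept for the whole join. */
  hashtable->spillCxt = AllocSetContextCreate(hashtable->hashCxt,
                                              "HashSpillContext",
                                              ALLOCSET_DEFAULT_SIZES);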

Thank you for your review!

Regards,

Attachment Content-Type Size
v9-0001-Describe-hash-join-implementation.patch text/x-patch 3.0 KB
v9-0002-Dedicated-memory-context-for-hash-join-spill-buff.patch text/x-patch 13.6 KB
v9-0003-Run-pgindent-on-nodeHash.c-and-nodeHashjoin.c.patch text/x-patch 2.8 KB
