Re: logical decoding : exceeded maxAllocatedDescs for .spill files

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2019-12-06 11:30:10
Message-ID: CAJ3gD9dqmdASLR_Wra-0fMDHRTcYwn-SFj-gsvxEWNB-GN-msA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 6 Dec 2019 at 15:40, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Dec 5, 2019 at 4:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 3, 2019 at 11:10 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > >
> > >
> > > Done as stated above; attached v3 patch. I have verified that the file
> > > handles do get closed in PG_CATCH block via
> > > ReorderBufferIterTXNFinish().
> > >
> >
> > I couldn't reproduce the original problem (on HEAD) reported with the
> > test case in the patch. So, I can't verify the fix. I think it is
> > because of recent commits cec2edfa7859279f36d2374770ca920c59c73dd8 and
> > 9290ad198b15d6b986b855d2a58d087a54777e87. It seems you need to either
> > change the value of logical_decoding_work_mem or change the test in
> > some way so that the original problem can be reproduced.

Yeah, it does seem like the commit cec2edfa78592 must have caused the
test to not reproduce on your env, although the test does fail for me
still. Setting logical_decoding_work_mem to a low value does sound
like a good idea. Will work on it.

> >
>
> Few comments:
> ----------------------
>
> 1.
> + /* Now that the state fields are initialized, it is safe to return it. */
> + *iter_state = state;
> +
> /* allocate heap */
> state->heap =
> binaryheap_allocate(state->nr_txns,
> ReorderBufferIterCompare,
>
> Is there a reason for not initializing iter_state after
> binaryheap_allocate? If we do so, then we don't need additional check
> you have added in ReorderBufferIterTXNFinish.

If iter_state is initialized *after* binaryheap_allocate, then we
won't be able to close the vfds if binaryheap_allocate() ereports().

>
> 2.
> /* No harm in resetting the offset even in case of failure */
> file->curOffset = 0;
>
> The above comment is not clear because you are not setting it in case
> of error rather this is a success path.

I meant, even if PathNameOpenFile() failed, it is ok to set
file->curOffset to 0, so need not set it only in case of *fd >= 0. In
most of the cases, fd would be valid, so just set file->curOffset to 0
always.

>
> 3.
> + *
> + * Note: The iterator state is returned through iter_state parameter rather
> + * than the function's return value. This is because the state gets cleaned up
> + * in a PG_CATCH block, so we want to make sure the caller gets back the state
> + * even if this function throws an exception, so that the state resources can
> + * be cleaned up.
>
> How about changing it slightly as follows to make it more clear.
> "Note: The iterator state is returned through iter_state parameter
> rather than the function's return value. This is because the state
> gets cleaned up in a PG_CATCH block in the caller, so we want to make
> sure the caller gets back the state even if this function throws an
> exception."

Agreed. Will do that in the next patch version.

>
> 4. I think we should also check how much time increase will happen for
> test_decoding regression test after the test added by this patch?
Yeah, it currently takes noticeably longer compared to the others.
Let's see if setting logical_decoding_work_mem to a min value allows
us to reproduce the test with much lesser number of inserts.

>
> 5. One naive question about the usage of PathNameOpenFile(). When it
> reaches the max limit, it will automatically close one of the files,
> but how will that be reflected in the data structure (TXNEntryFile)
> you are managing. Basically, when PathNameOpenFile closes some file,
> how will the corresponding vfd in TXNEntryFile be changed. Because if
> it is not changed, then won't it start pointing to some wrong
> filehandle?

In PathNameOpenFile(), excess kernel fds could be closed
(ReleaseLruFiles). But with that, the vfds themselves don't get
invalidated. Only the underlying kernel fd gets closed, and the
vfd->fd is marked VFD_CLOSED. The vfd array element remains valid (a
non-null vfd->fileName means the vfd slot is valid; check
FileIsValid). So later, when FileRead(vfd1) is called and that vfd1
happens to be the one that had got it's kernel fd closed, it gets
opened again through FileAccess().

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-06 12:00:01 Re: segmentation fault when cassert enabled
Previous Message Amit Kapila 2019-12-06 10:10:20 Re: logical decoding : exceeded maxAllocatedDescs for .spill files