Re: logical decoding : exceeded maxAllocatedDescs for .spill files

From: Noah Misch <noah(at)leadboat(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2020-01-09 05:37:04
Message-ID: 20200109053704.GA2502006@rfd.leadboat.com
Lists: pgsql-hackers

On Wed, Jan 08, 2020 at 02:50:53PM +0530, Amit Khandekar wrote:
> On Sun, 5 Jan 2020 at 00:21, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > The buildfarm client can capture stack traces, but it currently doesn't do so
> > for TAP test suites (search the client code for get_stack_trace). If someone
> > feels like writing a fix for that, it would be a nice improvement. Perhaps,
> > rather than having the client code know all the locations where core files
> > might appear, failed runs should walk the test directory tree for core files?
>
> I think this might end up having the same code to walk the directory
> spread out on multiple files. Instead, I think in the build script, in
> get_stack_trace(), we can do an equivalent of "find <inputdir> -name
> "*core*" , as against the current way in which it looks for core files
> only in the specific data directory.

Agreed.
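
For illustration only, here is a minimal Python sketch of the "walk the tree for core files" idea. The buildfarm client itself is Perl, so this is not its code; the function name find_core_files and the default "*core*" pattern are assumptions chosen to mirror the find command quoted above.

```python
import fnmatch
import os


def find_core_files(root, pattern="*core*"):
    """Return sorted paths of non-directory entries under root whose
    name matches pattern.

    Roughly like `find root -name '*core*'`, except that directories
    whose names match are not reported, only descended into.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the glob pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A pattern as loose as "*core*" also matches unrelated files such as "scoreboard.txt", so a real implementation would probably want a tighter pattern or a check of the file type.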

> Noah, is it possible to run a patch'ed build script once I submit a
> patch, so that we can quickly get the stack trace ? I mean, can we do
> this before getting the patch committed ? I guess, we can run the
> build script with a single branch specified, right ?

Yes to all questions, but it would not have helped in this case. First, v10
deletes PostgresNode base directories at the end of this test file, despite
the failure[1]. Second, the stack trace was minimal:

(gdb) bt
#0 0xd011119c in extend_brk () from /usr/lib/libc.a(shr.o)

Even so, a web search for "extend_brk" led to the answer. By default, 32-bit
AIX binaries get only 256M of RAM for stack and sbrk. The new regression test
used more than that, hence this crash. Setting LDR_CNTRL=MAXDATA=0x80000000
in the environment cured the crash. I've put that in the buildfarm member
configuration and started a new run.
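
As a sketch of what that setting amounts to, the snippet below builds a child-process environment with LDR_CNTRL set. The helper name env_with_maxdata is hypothetical, and this is not how the buildfarm member is actually configured; only the variable name and value come from the text above.

```python
import os


def env_with_maxdata(base=None, maxdata="0x80000000"):
    """Return a copy of base (default: the current environment) with
    LDR_CNTRL set so that 32-bit AIX programs started with it get a
    larger data segment. Other platforms simply ignore the variable.
    """
    env = dict(os.environ if base is None else base)
    env["LDR_CNTRL"] = f"MAXDATA={maxdata}"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run() when launching the test run, which leaves the parent process's environment untouched.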

(The PostgreSQL documentation actually covers this problem:
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX)

[1] v10 does have the all_tests_passing() logic, which is meant to prevent
that deletion. I'm guessing it didn't help because the test file failed by
calling die "connection error: ...", not by reporting a failure to Test::More
via ok(0) or similar.
