From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> On Thu, Jan 9, 2020 at 11:15 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Noah Misch <noah(at)leadboat(dot)com> writes:
>>> Even so, a web search for "extend_brk" led to the answer. By default, 32-bit
>>> AIX binaries get only 256M of RAM for stack and sbrk. The new regression test
>>> used more than that, hence this crash.
>> Hm, so
>> (1) Why did we get a crash and not some more-decipherable out-of-resources
>> error? Can we improve that experience?
>> (2) Should we be dialing back the resource consumption of this test?
> In HEAD, we have a guc variable 'logical_decoding_work_mem' by which
> we can control the memory usage of changes and we have used that, but
> for back branches, we don't have such a control.
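
(For illustration only, a sketch of how that knob is used on HEAD; the
parameter name is real, but the chosen value here is arbitrary. Changes
accumulated per transaction beyond this limit get spilled to disk rather
than held in memory:

```sql
-- Cap the memory used to buffer decoded changes per logical decoding
-- session; excess is spilled to .spill files on disk.
ALTER SYSTEM SET logical_decoding_work_mem = '64MB';
SELECT pg_reload_conf();
```

Back branches lack this parameter, which is why the test's memory use
there is effectively unbounded.)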
I poked into this a bit more by running the src/test/recovery tests under
restrictive ulimit settings. I used
ulimit -s 1024
ulimit -v 250000
(At least on my 64-bit RHEL6 box, reducing ulimit -v much below this
causes initdb to fail, apparently because the post-bootstrap process
tries to load all our tsearch and encoding conversion shlibs at once,
and it hasn't got enough VM space to do so. Someday we may have to
rethink that.)

I did not manage to duplicate Noah's crash this way. What I see in
the v10 branch is that the new 006_logical_decoding.pl test fails,
but with a clean "out of memory" error. The memory map dump that
that produces fingers the culprit pretty unambiguously:
ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used
ReorderBufferByXid: 24576 total in 2 blocks; 11888 free (3 chunks); 12688 used
Slab: TXN: 8192 total in 1 blocks; 5208 free (21 chunks); 2984 used
Slab: Change: 2170880 total in 265 blocks; 2800 free (35 chunks); 2168080 used
Grand total: 226714720 bytes in 27327 blocks; 590888 free (785 chunks); 226123832 used
The test case is only inserting 50K fairly-short rows, so this seems
like an unreasonable amount of memory to be consuming for that; and
even if you think it's reasonable, it clearly isn't going to scale
to large production transactions.
Now, the good news is that v11 and later get through
006_logical_decoding.pl just fine under the same restriction.
So we did something in v11 to fix this excessive memory consumption.
However, unless we're willing to back-port whatever that was, this
test case is clearly consuming excessive resources for the v10 branch.
We're not out of the woods either. I also observe that v12 and HEAD
fall over, under these same test conditions, with a stack-overflow
error in the 012_subtransactions.pl test. This seems to be due to
somebody's decision to use a heavily recursive function to generate a
bunch of subtransactions. Is there a good reason for hs_subxids() to
use recursion instead of a loop? If there is, what's the value of
using 201 levels rather than, say, 10?
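
(To illustrate the alternative I have in mind, a hypothetical loop-based
version, not the actual test function: in plpgsql, every BEGIN ...
EXCEPTION block opens its own subtransaction, so a loop can consume
arbitrarily many subxids at constant stack depth. The subxids come out
as siblings rather than nested, which may or may not matter to what the
test is checking:

```sql
CREATE OR REPLACE FUNCTION gen_subxids(n integer) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. n LOOP
        -- Each exception block starts (and exits) one subtransaction,
        -- consuming one subxid per iteration without recursing.
        BEGIN
            PERFORM 1;
        EXCEPTION WHEN OTHERS THEN
            NULL;
        END;
    END LOOP;
END;
$$;
```

)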
Anyway it remains unclear why Noah's machine got a crash instead of
something more user-friendly. But the reason why it's only in the
v10 branch seems non-mysterious.
regards, tom lane