Re: logical decoding : exceeded maxAllocatedDescs for .spill files

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2020-01-12 03:39:30
Message-ID: 20200112033930.6cy3n6ybft6pp7gw@development
Lists: pgsql-hackers

On Thu, Jan 09, 2020 at 07:40:12PM -0500, Tom Lane wrote:
>I wrote:
>> ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used
>> The test case is only inserting 50K fairly-short rows, so this seems
>> like an unreasonable amount of memory to be consuming for that; and
>> even if you think it's reasonable, it clearly isn't going to scale
>> to large production transactions.
>> Now, the good news is that v11 and later get through
>> just fine under the same restriction.
>> So we did something in v11 to fix this excessive memory consumption.
>> However, unless we're willing to back-port whatever that was, this
>> test case is clearly consuming excessive resources for the v10 branch.
>I dug around a little in the git history for backend/replication/logical/,
>and while I find several commit messages mentioning memory leaks and
>faulty spill logic, they all claim to have been back-patched as far
>as 9.4.
>It seems reasonably likely to me that this result is telling us about
>an actual bug, ie, faulty back-patching of one or more of those fixes
>into v10 and perhaps earlier branches.
>I don't know this code well enough to take point on looking for the
>problem, though.

Well, one thing we did in 11 was introduce the Generation context.
In 10 we're still stashing all tuple data into the main AllocSet. I
wonder if backporting a4ccc1cef5a04cc054af83bc4582a045d5232cb3 and a
couple of follow-up fixes would make the issue go away.
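
For a sense of scale, the figures in the quoted ReorderBuffer line can be
turned into per-row numbers. This is only a back-of-the-envelope sketch: it
assumes "50K rows" means exactly 50,000, and the variable names are
illustrative, not anything from the PostgreSQL source.

```python
# Figures copied from the quoted ReorderBuffer memory-context line.
total_bytes = 223302560
blocks = 26995
used_bytes = 223295504

# Assumption: "50K rows" is taken as exactly 50,000 rows.
rows = 50_000

bytes_per_row = used_bytes / rows        # memory in use per inserted row
avg_block_size = total_bytes / blocks    # average size of an allocated block

print(f"used per row:  {bytes_per_row:.0f} bytes")
print(f"average block: {avg_block_size:.0f} bytes")
```

Under that assumption this works out to roughly 4.4 kB in use per row, held
in blocks of about 8 kB each, which is the consumption being called
unreasonable for fairly short rows.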


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
