Re: dsa_allocate() faliure

From: Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>
To: Rick Otten <rottenwindfish(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure
Date: 2018-05-23 04:10:02
Message-ID: CADrk5qM0RxkkfsQJaqu6C2JJULs1Ormw-wALjDv1bN7sjDv=iA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

>>dsa_allocate could not find 7 free pages
I just saw this error message again on all of my worker nodes (I am using
Citus 7.4 rel). The PG core is my own build of release_10_stable
(10.4) out of GitHub on Ubuntu.

What's the best way to debug this? I am running pre-production tests
for the next few days, so I could gather info if necessary (I cannot
pinpoint a query to repro this yet, as we have 10K queries running
concurrently).

On Mon, Jan 29, 2018 at 1:35 PM, Rick Otten <rottenwindfish(at)gmail(dot)com> wrote:
> If I do a "set max_parallel_workers_per_gather=0;" before I run the query in
> that session, it runs just fine.
> If I set it to 2, the query dies with the dsa_allocate error.
>
> I'll use that as a work around until 10.2 comes out. Thanks! I have
> something that will help.
>
>
> On Mon, Jan 29, 2018 at 3:52 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>
>> On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Rick Otten <rottenwindfish(at)gmail(dot)com> writes:
>> >> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> >> avoid these errors:
>> >
>> >> $ psql -f failing_query.sql
>> >> psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free
>> >> pages
>> >> CONTEXT: parallel worker
>> >
>> > Hmm. There's only one place in the source code that emits that message
>> > text:
>> >
>> >     /*
>> >      * Ask the free page manager for a run of pages.  This should always
>> >      * succeed, since both get_best_segment and make_new_segment should
>> >      * only return a non-NULL pointer if it actually contains enough
>> >      * contiguous freespace.  If it does fail, something in our backend
>> >      * private state is out of whack, so use FATAL to kill the process.
>> >      */
>> >     if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>> >         elog(FATAL,
>> >              "dsa_allocate could not find %zu free pages", npages);
>> >
>> > Now maybe that comment is being unreasonably optimistic, but it sure
>> > appears that this is supposed to be a can't-happen case, in which case
>> > you've found a bug.
>>
>> This is probably the bug fixed here:
>>
>>
>> https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org
>>
>> That was back patched, so 10.2 will contain the fix. The bug was not
>> in dsa.c itself, but in the parallel query code that mixed up DSA
>> areas, corrupting them. The problem comes up when the query plan has
>> multiple Gather nodes (and a particular execution pattern) -- is that
>> the case here, in the EXPLAIN output? That seems plausible given the
>> description of a 50-branch UNION. The only workaround until 10.2
>> would be to reduce max_parallel_workers_per_gather to 0 to prevent
>> parallelism completely for this query.
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
>
>

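Since the quoted explanation ties the 10.1 bug to plans with multiple Gather nodes, one way to narrow down candidates among many concurrent queries might be to scan saved EXPLAIN output for plans containing more than one Gather. The following is only a rough sketch under stated assumptions: it expects the plain-text EXPLAIN format, the helper names (count_gather_nodes, is_candidate) are hypothetical, and the sample plan is made up for illustration. It is a triage aid, not a diagnosis, and it says nothing about whether the same bug applies on 10.4.

```python
import re

# Matches "Gather" and "Gather Merge" node lines in text-format EXPLAIN
# output, e.g. "  ->  Gather  (cost=...)" or "Gather Merge  (cost=...)".
# Assumption: the plan was captured in the default text format.
_GATHER_RE = re.compile(
    r"^[ \t]*(?:->[ \t]*)?Gather(?: Merge)?[ \t]+\(", re.MULTILINE
)


def count_gather_nodes(explain_text: str) -> int:
    """Return the number of Gather / Gather Merge nodes in a text plan."""
    return len(_GATHER_RE.findall(explain_text))


def is_candidate(explain_text: str) -> bool:
    """Hypothetical triage rule: flag plans with more than one Gather node."""
    return count_gather_nodes(explain_text) > 1


if __name__ == "__main__":
    # Made-up plan fragment, for illustration only.
    sample = """\
Append  (cost=...)
  ->  Gather  (cost=1000.00..2000.00 rows=10 width=4)
        Workers Planned: 2
        ->  Parallel Seq Scan on t1  (cost=...)
  ->  Gather  (cost=1000.00..2000.00 rows=10 width=4)
        Workers Planned: 2
        ->  Parallel Seq Scan on t2  (cost=...)
"""
    print(count_gather_nodes(sample), is_candidate(sample))
```

Queries flagged this way could then be retried in a session with "set max_parallel_workers_per_gather=0;", the workaround described in the quoted messages, to see whether the error goes away.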