Re: dsa_allocate() faliure

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Rick Otten <rottenwindfish(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure
Date: 2018-01-29 20:52:43
Message-ID: CAEepm=0Q5P2jM9hdZ6vkoKKzXce-9Oi9GtCdWVPBYumC1G7+mw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Rick Otten <rottenwindfish(at)gmail(dot)com> writes:
>> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> avoid these errors:
>
>> $ psql -f failing_query.sql
>> psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free pages
>> CONTEXT: parallel worker
>
> Hmm. There's only one place in the source code that emits that message
> text:
>
>     /*
>      * Ask the free page manager for a run of pages.  This should always
>      * succeed, since both get_best_segment and make_new_segment should
>      * only return a non-NULL pointer if it actually contains enough
>      * contiguous freespace.  If it does fail, something in our backend
>      * private state is out of whack, so use FATAL to kill the process.
>      */
>     if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>         elog(FATAL,
>              "dsa_allocate could not find %zu free pages", npages);
>
> Now maybe that comment is being unreasonably optimistic, but it sure
> appears that this is supposed to be a can't-happen case, in which case
> you've found a bug.

This is probably the bug fixed here:

https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org

That fix was back-patched, so 10.2 will contain it. The bug was not
in dsa.c itself, but in the parallel query code, which mixed up DSA
areas and corrupted them. The problem comes up when the query plan has
multiple Gather nodes (and a particular execution pattern). Is that
the case here, in the EXPLAIN output? It seems plausible given the
description of a 50-branch UNION. The only workaround until 10.2
would be to set max_parallel_workers_per_gather to 0, which prevents
parallelism completely for this query.
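
A minimal sketch of that workaround, assuming the query is run from an
interactive psql session and that failing_query.sql (the file name from
the report) is in the current directory. The setting is changed for the
session only, so other queries on the server keep their parallel plans:

```sql
-- Sketch only: disable parallel query for this session (PG 10.1 workaround).
SET max_parallel_workers_per_gather = 0;

-- Run the query that was failing; \i is the psql meta-command for
-- executing a script file.
\i failing_query.sql

-- Restore the configured default for the rest of the session.
RESET max_parallel_workers_per_gather;
```

To check whether the plan has multiple Gather nodes before disabling
anything, prefix the query itself with EXPLAIN and look for more than
one "Gather" line in the output.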

--
Thomas Munro
http://www.enterprisedb.com
