RE: dsa_allocate() faliure

From:	Arne Roland <A(dot)Roland(at)index(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>
Cc:	Rick Otten <rottenwindfish(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	RE: dsa_allocate() faliure
Date:	2019-01-24 14:44:41
Message-ID:	6f3fe9fa5a984dc19e40e79fbef45edc@index.de
Lists:	pgsql-hackers pgsql-performance

Hello,

I'm not sure whether this is connected at all, but I'm facing the same error with a generated query on postgres 10.6.
It works with parallel query disabled and gives "dsa_allocate could not find 7 free pages" otherwise.

I've attached the query and the strace output. The table is partitioned on (o, date). The error does not depend on the precise lists I'm using, but it obviously does depend on the optimizer choosing a parallel query.

Regards
Arne Roland

-----Original Message-----
From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Sent: Friday, October 5, 2018 4:17 AM
To: Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>
Cc: Rick Otten <rottenwindfish(at)gmail(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-performance(at)lists(dot)postgresql(dot)org; Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure

On Wed, Aug 29, 2018 at 5:48 PM Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com> wrote:
> I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of the worker nodes. I anonymised the query text a bit. Interestingly, this time only one (the same one) of the nodes is crashing. Since this is a production environment, I cannot get the stack trace. Once I turned off parallel execution for this node, the whole query finished just fine. So the parallel query plan is from one of the nodes that did not crash; hopefully the same plan would have been executed on the crashed node. In theory, every worker node has the same bits and very similar data.

I wonder if this was a different symptom of the problem fixed here:

https://www.postgresql.org/message-id/flat/194c0706-c65b-7d81-ab32-2c248c3e2344%402ndquadrant.com

Can you still reproduce it on current master, REL_11_STABLE or REL_10_STABLE?

--
Thomas Munro
http://www.enterprisedb.com

Attachment	Content-Type	Size
strace.log	application/octet-stream	435.5 KB
query.sql	application/octet-stream	141.1 KB

In response to

Re: dsa_allocate() faliure at 2018-10-05 02:16:41 from Thomas Munro

Responses

RE: dsa_allocate() faliure at 2019-01-28 13:50:50 from Arne Roland
