Re: dsa_allocate() faliure

From:	Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Fabio Isabettini <fisabettini(at)voipfuture(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>, Rick Otten <rottenwindfish(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: dsa_allocate() faliure
Date:	2019-02-04 07:52:17
Message-ID:	CAJk1zg28tqx2021D0j-RqFtbLe+SPj4JKdmnc+K2aJZTUYk3eQ@mail.gmail.com
Lists:	pgsql-hackers pgsql-performance

Hi Thomas,
I was one of the reporters in early December last year.
I somehow dropped the ball and forgot about the issue.
Anyhow, I upgraded the clusters to PG11.1 and nothing changed. I also have
a rule in place to dump core, but no segfault happens while the error is
occurring.
I see the error showing up every night on 2 different servers. But it's a
bit of a heisenbug, because if I go there now it won't be reproducible.
Justin Pryzby suggested that I recompile the PG source with his patch,
which would cause a core dump.
But I don't feel comfortable doing this, especially if I would have to run
it with prod data.
My question is: can I do anything else, like increasing the logging level
or enabling some additional options?
It's a production server, but I'm willing to sacrifice a bit of its
performance if that would help.

--
regards,
pozdrawiam,
Jakub Glapa

On Wed, Jan 30, 2019 at 4:13 AM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
> <fisabettini(at)voipfuture(dot)com> wrote:
> > we are facing a similar issue on a Production system using a Postgresql 10.6:
> >
> > org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
> > run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages
> >
> > We would like not to stop the Production system and upgrade it to PG11.
> > And even though would this guarantee a permanent fix?
> > Any suggestion?
>
> Hi Fabio,
>
> Thanks for your report. Could you please also show the query plan
> that runs on the "remote" node (where the error occurred)?
>
> There is no indication that upgrading to PG11 would help here. It
> seems we have an undiagnosed bug (in 10 and 11), and so far no one has
> been able to reproduce it at will. I personally have chewed a lot of
> CPU time on several machines trying various plan shapes and not seen
> this or the possibly related symptom from bug #15585 even once. But
> we have about three reports of each of the two symptoms. One reporter
> wrote to me off-list to say that they'd seen #15585 twice, the second
> time by running the same query in a tight loop for 8 hours, and then
> not seen it again in the past 3 weeks. Clearly there is an issue needing
> a fix here, but I don't yet know what it is.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
>
