Re: dsa_allocate() faliure

From: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Fabio Isabettini <fisabettini(at)voipfuture(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>, Rick Otten <rottenwindfish(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure
Date: 2019-02-04 07:52:17
Message-ID: CAJk1zg28tqx2021D0j-RqFtbLe+SPj4JKdmnc+K2aJZTUYk3eQ@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

Hi Thomas,
I was one of the reporters in early December last year.
I somehow dropped the ball and forgot about the issue.
Anyhow, I upgraded the clusters to pg11.1 and nothing changed. I also have a
rule to coredump, but no segfault happens while this is occurring.
I see the error showing up every night on 2 different servers. But it's a
bit of a heisenbug, because if I go there now it won't be reproducible.
Justin Pryzby suggested that I recompile pg from source with his patch,
which would cause a coredump.
But I don't feel comfortable doing this, especially if I would have to run
it with prod data.
My question is: can I do anything like increasing the logging level or
enabling some additional options?
It's a production server, but I'm willing to sacrifice a bit of its
performance if that would help.

--
regards,
pozdrawiam,
Jakub Glapa

On Wed, Jan 30, 2019 at 4:13 AM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
> <fisabettini(at)voipfuture(dot)com> wrote:
> > we are facing a similar issue on a Production system using a Postgresql 10.6:
> >
> > org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
> > run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages
>
> > We would like not to stop the Production system and upgrade it to PG11.
> > And even though would this guarantee a permanent fix?
> > Any suggestion?
>
> Hi Fabio,
>
> Thanks for your report. Could you please also show the query plan
> that runs on the "remote" node (where the error occurred)?
>
> There is no indication that upgrading to PG11 would help here. It
> seems we have an undiagnosed bug (in 10 and 11), and so far no one has
> been able to reproduce it at will. I personally have chewed a lot of
> CPU time on several machines trying various plan shapes and not seen
> this or the possibly related symptom from bug #15585 even once. But
> we have about three reports of each of the two symptoms. One reporter
> wrote to me off-list to say that they'd seen #15585 twice, the second
> time by running the same query in a tight loop for 8 hours, and then
> not seen it again in the past 3 weeks. Clearly there is an issue needing
> a fix here, but I don't yet know what it is.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
>
