Re: dsa_allocate() faliure

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
Cc: Fabio Isabettini <fisabettini(at)voipfuture(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, Sand Stone <sand(dot)m(dot)stone(at)gmail(dot)com>, Rick Otten <rottenwindfish(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: dsa_allocate() faliure
Date: 2019-02-04 08:22:28
Message-ID: CAEepm=2aHnTfPJnPbeS3AxO-ENoUg5-akuD-7PWYbn8+-c9JmQ@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Mon, Feb 4, 2019 at 6:52 PM Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com> wrote:
> I see the error showing up every night on 2 different servers. But it's a bit of a heisenbug because if I go there now it won't be reproducible.

Huh. OK, well, that's a lot more frequent than I thought. Is it always
the same query? Any chance you can get the plan? Is anything else
going on on the server, such as concurrent parallel queries?

> It was suggested by Justin Pryzby that I recompile pg src with his patch that would cause a coredump.

Small correction to Justin's suggestion: don't call abort() after
elog(ERROR, ...). Since elog(ERROR) doesn't return, it'll never be reached.
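To illustrate why the abort() is dead code: elog(ERROR, ...) does not return to its caller, because PostgreSQL unwinds to the nearest error handler with siglongjmp(). The standalone model below is not PostgreSQL code. It assumes plain setjmp()/longjmp() as a stand-in for the server's sigsetjmp()/siglongjmp() machinery, and model_elog_error(), model_failed_allocation() and run_with_error_trap() are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the server's error recovery point
 * (PostgreSQL itself uses sigsetjmp/siglongjmp via PG_exception_stack). */
static jmp_buf error_recovery_point;

/* Counts how often code placed after the "ERROR" call actually runs. */
static int reached_after_error = 0;

/* Hypothetical model of elog(ERROR, ...): report the message, then jump
 * back to the recovery point instead of returning to the caller. */
static void model_elog_error(const char *msg)
{
    fprintf(stderr, "ERROR:  %s\n", msg);
    longjmp(error_recovery_point, 1);
}

/* Models the failing allocation site with an abort() added after the
 * error call, as in the suggested debugging patch. */
static void model_failed_allocation(void)
{
    model_elog_error("simulated allocation failure");
    reached_after_error++;  /* never executed */
    abort();                /* never executed, so no core dump */
}

/* Returns 1 when the error was caught at the recovery point. */
static int run_with_error_trap(void)
{
    if (setjmp(error_recovery_point) != 0) {
        /* Control lands here; the statements after the "ERROR" were skipped. */
        assert(reached_after_error == 0);
        return 1;
    }
    model_failed_allocation();
    return 0;               /* not reached */
}
```

Calling run_with_error_trap() from a main() of your own returns 1 after printing the ERROR line, and no core file appears. That is why raising the level to PANIC is the more useful change: at that level PostgreSQL's errfinish() calls abort() itself.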

> But I don't feel comfortable doing this especially if I would have to run this with prod data.
> My question is: can I do anything, like increasing the logging level or enabling some additional options?
> It's a production server but I'm willing to sacrifice a bit of its performance if that would help.

If you're able to run a throwaway copy of your production database on
another system that you don't have to worry about crashing, you could
just replace ERROR with PANIC and run a high-speed loop of the query
that crashed in production, or something. With core dumps enabled, the
PANIC would leave a core file whose backtrace might at least tell us
whether it reached that condition via something dereferencing a
dsa_pointer or via something manipulating the segment lists while
allocating/freeing.

In my own 100% unsuccessful attempts to reproduce this I was mostly
running the same query (based on my guess at what ingredients are
needed), but perhaps it takes a particular allocation pattern that
needs more randomness to reach... hmm.

--
Thomas Munro
http://www.enterprisedb.com
