Re: dsa_allocate() faliure

From: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
To: pryzby(at)telsasoft(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: dsa_allocate() faliure
Date: 2018-11-26 15:38:35
Message-ID: CAJk1zg3ZXhDsFg7tQGJ3ZD6N9dp+Q1_DU2N3=s3Ywb-u6Lhc5A@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

Sorry, the previous message was sent out too early.

So, the issue occurs only on the production db, and right now I cannot
reproduce it.
I had a look at dmesg and indeed I see something like:

postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030
error 4 in postgres[557833db7000+6d5000]

and AFAIR other sessions I had opened at that time were indeed disconnected.
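To pin the dmesg line above to a function, the faulting instruction pointer can be translated into an offset inside the postgres binary. This is a hedged sketch, not something from the thread: the IP and base address are taken from the dmesg output quoted above, while the binary path is a guess and addr2line only resolves a name if debug symbols are installed.

```shell
# Values copied from the dmesg line: "segfault at 0 ip 0000557834264b16 ...
# in postgres[557833db7000+6d5000]"
IP=0x0000557834264b16      # faulting instruction pointer
BASE=0x557833db7000        # base address of the postgres mapping
OFFSET=$(printf '0x%x' $(( IP - BASE )))
echo "$OFFSET"             # offset of the crash inside the postgres binary
# Hypothetical binary path; adjust for your install. Needs debug symbols:
# addr2line -e /usr/lib/postgresql/10/bin/postgres "$OFFSET"
```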

When it comes to the execution plan for max_parallel_workers=0, there is no
real difference. I guess max_parallel_workers has no effect here and
max_parallel_workers_per_gather should have been used instead.
Why it caused a server crash is unknown right now.
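For reference, the distinction between the two settings discussed above can be sketched as follows (the parameter names are real PostgreSQL GUCs; the session-level SET usage mirrors the psql command quoted later in the thread):

```sql
-- Disables parallel query for this session: the planner stops
-- producing Gather nodes entirely.
SET max_parallel_workers_per_gather = 0;

-- Caps the cluster-wide pool of parallel workers. With 0, plans may
-- still contain Gather nodes, but no workers launch at execution time.
SET max_parallel_workers = 0;
```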

I cannot really give a reproducible recipe.
My case is that I have a parent table with ~300 partitions.
And I initiate a select on ~100 of them with select [...] from fa where
client_id(<IDS>) and [filters].
I know this is not efficient. Every partition has several indexes, and this
query acquires a lot of locks, even for relations not used in the query.
PG11 should have a better partition pruning mechanism, but I'm not yet in a
position to upgrade.
Some of the partitions have millions of rows.
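In the spirit of Justin's generate_series() suggestion quoted below, a minimal synthetic setup might look like this. To be clear, this is a hypothetical sketch: the table name fa and column client_id come from the query shape above, but the partition count, column types, and data are invented and not the production schema.

```sql
-- Parent with many list partitions, one per client_id.
CREATE TABLE fa (client_id int NOT NULL, payload text)
  PARTITION BY LIST (client_id);

-- ~300 partitions, as in the report.
DO $$
BEGIN
  FOR i IN 1..300 LOOP
    EXECUTE format(
      'CREATE TABLE fa_p%s PARTITION OF fa FOR VALUES IN (%s)', i, i);
  END LOOP;
END $$;

-- Synthetic rows spread across the partitions.
INSERT INTO fa
SELECT (g % 300) + 1, md5(g::text)
FROM generate_series(1, 1000000) g;
ANALYZE fa;

-- Query shape from the report: a select touching ~100 partitions.
SELECT count(*) FROM fa WHERE client_id IN (SELECT generate_series(1, 100));
```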

I'll keep observing; maybe I'll find a pattern for when this occurs.

--
regards,
pozdrawiam,
Jakub Glapa

On Mon, Nov 26, 2018 at 4:26 PM Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com> wrote:

> So, the issue occurs only on production db an right now I cannot reproduce
> it.
> I had a look at dmesg and indeed I see something like:
>
>
> --
> regards,
> Jakub Glapa
>
>
> On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby <pryzby(at)telsasoft(dot)com>
> wrote:
>
>> On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
>> > Hi Justin, I've upgrade to 10.6 but the error still shows up:
>> >
>> > If I set it to max_parallel_workers=0 I also get and my connection is
>> being
>> > closed (but the server is alive):
>> >
>> > psql db(at)host as user => set max_parallel_workers=0;
>>
>> Can you show the plan (explain without analyze) for the nonparallel case?
>>
>> Also, it looks like the server crashed in that case (even if it restarted
>> itself quickly). Can you confirm ?
>>
>> For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
>> similar. And other clients would've been disconnected. (For example,
>> you'd
>> get an error in another, previously-connected session the next time you
>> run:
>> SELECT 1).
>>
>> In any case, could you try to find a minimal way to reproduce the problem
>> ? I
>> mean, is the dataset and query small and something you can publish, or
>> can you
>> reproduce with data generated from (for example) generate_series() ?
>>
>> Thanks,
>> Justin
>>
>
