Re: BUG #16833: postgresql 13.1 process crash every hour

From: Alex F <phoedos16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16833: postgresql 13.1 process crash every hour
Date: 2021-05-17 09:29:56
Message-ID: CAGbr_zXBK08XdeusNBJrF-sEP9tYSToc7o1wKphgSu2gWu+PaA@mail.gmail.com
Lists: pgsql-bugs

Dear Peter,
First of all, thank you for your input and for the upcoming fix. In any
case, the application shouldn't crash with a segfault; it should just log
an error.

Another point I should mention is the amcheck extension and the "magic"
query that helped us find the broken index. Without those queries it was
completely unclear why the application crashed.
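
For anyone who hits the same problem, here is a minimal sketch of that kind
of check. It is not the exact query used earlier in this thread. It assumes
the amcheck extension is already installed in the database (CREATE EXTENSION
amcheck) and that conn behaves like a psycopg2 connection; the helper name
find_broken_indexes and the default schema are hypothetical. Each index is
checked in its own statement, so one corrupt index does not stop the scan.

```python
# Sketch only: assumes amcheck is installed in the target database and that
# `conn` behaves like a psycopg2 connection (assumption, not from the thread).

# List every valid, permanent B-tree index in one schema, largest first.
LIST_BTREE_INDEXES_SQL = """
SELECT c.oid::regclass::text
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am ON am.oid = c.relam
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE am.amname = 'btree'
  AND n.nspname = %s
  AND c.relkind = 'i'
  AND c.relpersistence <> 't'
  AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC
"""

# amcheck's bt_index_check raises an ERROR when it finds a violated invariant.
CHECK_INDEX_SQL = (
    "SELECT bt_index_check(index => %s::regclass, heapallindexed => true)"
)


def find_broken_indexes(conn, schema="public"):
    """Return (index_name, error_text) for every index that fails the check."""
    # Autocommit keeps one failed check from aborting the checks that follow.
    conn.autocommit = True
    broken = []
    with conn.cursor() as cur:
        cur.execute(LIST_BTREE_INDEXES_SQL, (schema,))
        index_names = [row[0] for row in cur.fetchall()]
        for name in index_names:
            try:
                cur.execute(CHECK_INDEX_SQL, (name,))
            except Exception as exc:  # a real script would catch psycopg2.Error
                broken.append((name, str(exc).strip()))
    return broken
```

With a real connection this would be called as find_broken_indexes(conn,
"project"). Setting heapallindexed to true makes the check slower, because
it also verifies that every heap tuple has a matching index entry.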

Is it possible to extend the error log so that it helps the user understand
what exactly went wrong?
For example, if the error log looked like this:
2021-05-14 06:10:54 UTC [22258]: user=,db=,app=,client= LOG: server
process (PID 22273) was terminated by signal 11: Segmentation fault
2021-05-14 06:10:54 UTC [22258]: user=,db=,app=,client= DETAIL: Failed
process was running: REFRESH MATERIALIZED VIEW CONCURRENTLY
project.product_master_mv
***CAUSED BY violated for index "name_original_idx_s"***
i.e. the line marked with *** symbols would really help the user understand
the root cause of the issue and significantly reduce database recovery time.
In my case I had to create a separate VM, create a database from scratch
and restore it from a pg_dump. Unfortunately, these steps caused
significant downtime.

In a master-standby configuration, WAL replication does not protect
standby servers from broken objects (a broken index in the case described).
Please advise whether it is possible to use logical replication here. As I
understand it, logical replication shouldn't push broken objects to the
standby.

Thanks for your support!
On Sat, May 15, 2021 at 03:10, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Fri, May 14, 2021 at 1:13 PM Alex F <phoedos16(at)gmail(dot)com> wrote:
> > Thanks for your support!
>
> I just pushed a commit that adds hardening that will be sufficient to
> prevent this being a hard crash. Of course the index should not become
> corrupt in the first place, but at least in Postgres 13.4 the same
> scenario will result in an error rather than in a hard crash.
>
> Thanks
> --
> Peter Geoghegan
>
