Re: Cancelling parallel query leads to segfault

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cancelling parallel query leads to segfault
Date: 2018-02-14 18:56:51
Message-ID: 20180214185651.277g7o3xdzys624d@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-02-12 15:43:49 -0500, Peter Eisentraut wrote:
> On 2/6/18 12:06, Andres Freund wrote:
> > On 2018-02-06 12:01:08 -0500, Peter Eisentraut wrote:
> >> On 2/1/18 20:35, Andres Freund wrote:
> >>> On February 1, 2018 11:13:06 PM GMT+01:00, Peter Eisentraut
> >>> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> >>>> Here is a patch to implement that idea. Do you have a way to test it
> >>>> repeatedly, or do you just randomly cancel queries?
> >>>
> >>> For me cancelling the long running parallel queries I tried reliably
> >>> triggers the issue. I encountered it while cancelling tpch q1 during JIT
> >>> work.
> >>
> >> Why does canceling a query result in elog(FATAL)? It should just be
> >> elog(ERROR), which wouldn't trigger this issue.
> >
> > The workers are shut down.
>
> I have used the setup mentioned in
> <https://www.postgresql.org/message-id/6a909374-2602-7136-8c70-397330a418f3%402ndquadrant.com>
> to reproduce this, without success. I have tried statement_timeout and
> manual cancels. Any other ideas?
>
> I don't doubt that the issue exists, but it would be nice to be able to
> reproduce it.

With your example I can reliably trigger the issue if I shut down the
server while the query is running:

^C2018-02-14 10:54:06.786 PST [22261][] LOG: received fast shutdown request
2018-02-14 10:54:06.786 PST [22261][] LOG: aborting any active transactions
2018-02-14 10:54:06.786 PST [22275][4/3] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22275][4/3] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22274][5/3] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22274][5/3] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22271][3/2] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22271][3/2] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "logical replication launcher" (PID 22268) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "parallel worker" (PID 22274) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "parallel worker" (PID 22275) exited with exit code 1
2018-02-14 10:54:06.788 PST [22261][] LOG: server process (PID 22271) was terminated by signal 11: Segmentation fault
2018-02-14 10:54:06.788 PST [22261][] DETAIL: Failed process was running: select from t1 where a = 55;
2018-02-14 10:54:06.788 PST [22261][] LOG: terminating any other active server processes
2018-02-14 10:54:06.789 PST [22285][] FATAL: the database system is shutting down
2018-02-14 10:54:06.789 PST [22261][] LOG: abnormal database system shutdown
2018-02-14 10:54:06.790 PST [22261][] LOG: database system is shut down

but only if I don't use EXPLAIN ANALYZE. Not quite sure what that is
about.

Your patch appears to fix the issue.

Greetings,

Andres Freund
