Re: Elusive segfault with 9.3.5 & query cancel

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Elusive segfault with 9.3.5 & query cancel
Date: 2014-12-05 21:29:31
Message-ID: 548223BB.90206@agliodbs.com
Lists: pgsql-hackers

On 12/05/2014 12:54 PM, Josh Berkus wrote:
> Hackers,
>
> This is not a complete enough report for a diagnosis. I'm posting it
> here just in case someone else sees something like it, and having an
> additional report will help figure out the underlying issue.
>
> * 700GB database with around 5,000 writes per second
> * 8 replicas handling around 10,000 read queries per second each
> * replicas are slammed (40-70% utilization)
> * replication produces lots of replication query cancels
>
> In this scenario, a specific query against some of the less busy and
> fairly small tables would produce a segfault (signal 11) once every 1-4
> days randomly. This query could have hundreds of successful runs for every
> segfault. This was not reproducible manually, and the segfaults never
> happened on the master. Nor did we ever see a segfault based on any
> other query, including against the tables which were generally the
> source of the query cancels.
>
> In case it's relevant, the query included use of regexp_split_to_array()
> and ORDER BY random(), neither of which are generally used in the user's
> other queries.
>
> We made some changes which decreased query cancel (optimizing queries,
> turning on hot_standby_feedback) and we haven't seen a segfault since
> then. As far as the user is concerned, this solves the problem, so I'm
> never going to get a trace or a core dump file.

I forgot a major piece of evidence for why I think this is related to
query cancel: in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand. It is possible that
the backend running the query which segfaulted was the only
backend *not* cancelled by that concurrent query conflict.
On the other hand, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
