From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Elusive segfault with 9.3.5 & query cancel |
Date: | 2014-12-05 21:29:31 |
Message-ID: | 548223BB.90206@agliodbs.com |
Lists: | pgsql-hackers |
On 12/05/2014 12:54 PM, Josh Berkus wrote:
> Hackers,
>
> This is not a complete enough report for a diagnosis. I'm posting it
> here just in case someone else sees something like it, and having an
> additional report will help figure out the underlying issue.
>
> * 700GB database with around 5,000 writes per second
> * 8 replicas handling around 10,000 read queries per second each
> * replicas are slammed (40-70% utilization)
> * replication produces lots of replication query cancels
>
> In this scenario, a specific query against some of the less busy and
> fairly small tables would produce a segfault (signal 11) once every 1-4
> days randomly. This query could have hundreds of successful runs for every
> segfault. This was not reproducible manually, and the segfaults never
> happened on the master. Nor did we ever see a segfault based on any
> other query, including against the tables which were generally the
> source of the query cancels.
>
> In case it's relevant, the query included use of regexp_split_to_array()
> and ORDER BY random(), neither of which are generally used in the user's
> other queries.
>
> We made some changes which decreased query cancel (optimizing queries,
> turning on hot_standby_feedback) and we haven't seen a segfault since
> then. As far as the user is concerned, this solves the problem, so I'm
> never going to get a trace or a core dump file.
I forgot a major piece of evidence as to why I think this is related to
query cancel: in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand. It is possible that
the backend running the query which segfaulted was the only
backend *not* cancelled due to query conflict concurrently.
Contradicting this, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.
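For anyone who wants to check their own replica logs for the same pattern, here is a rough Python sketch of the correlation described above. It is only an illustration under assumptions: it presumes log_line_prefix begins with '%m ' (millisecond timestamps), and the function names and the 30ms default window are mine, not part of any tool. It counts recovery-conflict cancels logged shortly before each "terminated by signal 11" line. Note that the postmaster logs the crash when it reaps the backend, so the timing is approximate.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log line starts with a '%m'-style timestamp, e.g.
# 2014-12-05 12:00:00.120 UTC [1] LOG:  server process (PID 103) was terminated by signal 11
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
CANCEL_MSG = "canceling statement due to conflict with recovery"
CRASH_MSG = "was terminated by signal 11"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")


def cancels_before_crashes(lines, window_ms=30):
    """For each crash line, count cancels logged within window_ms before it.

    Returns a list of (crash_timestamp, cancel_count) tuples in log order.
    """
    window = timedelta(milliseconds=window_ms)
    cancels = []  # timestamps of recovery-conflict cancels seen so far
    results = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines (STATEMENT, DETAIL, ...) have no prefix match
        if CANCEL_MSG in line:
            cancels.append(ts)
        elif CRASH_MSG in line:
            count = sum(1 for c in cancels if timedelta(0) <= ts - c <= window)
            results.append((ts, count))
    return results
```

Feeding it the lines of a log file (for example `cancels_before_crashes(open(path))`, with `path` being whatever your log file is) gives one tuple per crash. A crash with a count of zero would argue against the query-cancel theory for that incident.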
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com