Re: BUG #6200: standby bad memory allocations on SELECT

From: Bridget Frey <bridget(dot)frey(at)redfin(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6200: standby bad memory allocations on SELECT
Date: 2012-01-23 20:22:14
Message-ID: CAHOc93mRXhU0CMko1jLvxwkKz42jTRZ63xRxDATsEw2t5ukOZQ@mail.gmail.com
Lists: pgsql-bugs

Hello,
We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing
an issue that seems very similar to the one reported as bug 6200. We see
approximately 2 dozen alloc errors per day across 3 slaves, and we are
getting one segfault approximately every 3 days. We did not experience
this issue before our upgrade (we were on version 8.4, and used skytools
for replication).

We are attempting to get a core dump on segfault (our last attempt did not
work due to a config issue for the core dump). We're also attempting to
repro the alloc errors on a test setup, but it seems like we may need quite
a bit of load to trigger the issue. We're not certain that the alloc
issues and the segfaults are "the same issue" - but it seems that they may
be, since the OP for bug 6200 sees the same behavior. We have seen no issues
on the master; all alloc errors and segfaults have been on the slaves.

We've seen the alloc errors on a few different tables, but most frequently
on logins. Rows are added to the logins table one-by-one, and updates
generally happen one row at a time. The table is pretty basic; it looks
like this:

CREATE TABLE logins
(
  login_id bigserial NOT NULL,
  <snip - a bunch of columns>
  CONSTRAINT logins_pkey PRIMARY KEY (login_id),
  <snip - some other constraints...>
)
WITH (
  FILLFACTOR=80,
  OIDS=FALSE
);

The queries that trigger the alloc error on this table look like this (we
use Hibernate, hence the funny underscoring):
select login0_.login_id as login1_468_0_, l... from logins login0_ where
login0_.login_id=$1

The alloc error in the logs looks like this:
-01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
invalid memory alloc request size 18446744073709551613
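
For anyone who wants to tally these errors from their own logs, here is a
minimal Python sketch. It assumes each error appears on a single line in the
server log and matches only on the message text shown above, so it ignores
whatever log line prefix is configured. The function name and the sample
lines are hypothetical, for illustration only.

```python
import re
from collections import Counter

# Matches only the error message text; the timestamp/pid prefix is ignored.
ALLOC_ERROR = re.compile(r"invalid memory alloc request size (\d+)")


def tally_alloc_sizes(lines):
    """Count how often each requested size appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ALLOC_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines modelled on the error message above.
    sample = [
        "ERROR: invalid memory alloc request size 18446744073709551613",
        "LOG: some unrelated message",
        "ERROR: invalid memory alloc request size 18446744073709551613",
    ]
    for size, count in tally_alloc_sizes(sample).items():
        print(size, count)
```

In practice the lines would come from iterating over an open log file rather
than a hard-coded list.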

The alloc error is nearly always for size 18446744073709551613 - though we
have seen one occurrence with a different size.
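
One note on that number: 18446744073709551613 equals 2^64 - 3, which is what
-3 looks like when read as an unsigned 64-bit value. That may hint that a
small negative length is being computed somewhere and then treated as an
allocation size, though this is only an inference from the value itself and
not something we have confirmed. A small Python sketch of the reinterpretation
(the helper name is hypothetical):

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


if __name__ == "__main__":
    print(as_signed_64(18446744073709551613))  # prints -3
```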

We have been in touch with the OP for bug 6200, who said he may have time
to help us out a bit on debugging this. It seems like what is being
suggested is getting a build of postgres that will capture a stack trace
for each alloc issue and/or simply dump core when that happens. As this is
a production system, we would prefer the former. As I mentioned above, we're
also trying to get a core dump for the segfault.

We are treating this as extremely high priority as it is currently causing
2 dozen failures for users of our site per day, as well as a few minutes of
downtime for the segfault every 3 days. I realize there may be little that
the postgres experts can do until we provide more information - but since
our use case is really not very complicated here (basic use of HS), and
another site is also experiencing it, I figured it would be worth posting
about what we're seeing.

Thanks,
-Bridget Frey
Redfin
