Re: BUG #6200: standby bad memory allocations on SELECT

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bridget Frey <bridget(dot)frey(at)redfin(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6200: standby bad memory allocations on SELECT
Date: 2012-01-27 14:31:43
Message-ID: CA+TgmoZ6S5e46xThsHKv6-vV58f==D4_TH_ECB2sQsZRngL+8Q@mail.gmail.com
Lists: pgsql-bugs

On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget(dot)frey(at)redfin(dot)com> wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200.  We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days.  We did not experience this
> issue before our upgrade (we were on version 8.4, and used skytools for
> replication).
>
> We are attempting to get a core dump on segfault (our last attempt did not
> work due to a config issue for the core dump).  We're also attempting to
> repro the alloc errors on a test setup, but it seems like we may need quite
> a bit of load to trigger the issue.  We're not certain that the alloc issues
> and the segfaults are "the same issue" - but it seems that it may be since
> the OP for bug 6200 sees the same behavior.  We have seen no issues on the
> master, all alloc errors and segfaults have been on the slaves.
>
> We've seen the alloc errors on a few different tables, but most frequently
> on logins.  Rows are added to the logins table one-by-one, and updates
> generally happen one row at a time.  The table is pretty basic, it looks
> like this...
>
> CREATE TABLE logins
> (
>   login_id bigserial NOT NULL,
>   <snip - a bunch of columns>
>   CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
>   <snip - some other constraints...>
> )
> WITH (
>   FILLFACTOR=80,
>   OIDS=FALSE
> );
>
> The queries that trigger the alloc error on this table look like this (we
> use hibernate hence the funny underscoring...)
> select login0_.login_id as login1_468_0_, l...  from logins login0_ where
> login0_.login_id=$1
>
> The alloc error in the logs looks like this:
> -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
> invalid memory alloc request size 18446744073709551613
>
> The alloc error is nearly always for size 18446744073709551613 - though we
> have seen one time where it was a different amount...

Hmm, that number in hex works out to 0xfffffffffffffffd, which makes
it sound an awful lot like the system (for some unknown reason)
attempted to allocate -3 bytes of memory. I've seen something like
this once before on a customer system running a modified version of
PostgreSQL. In that case, the problem turned out to be page
corruption. Circumstances didn't permit determination of the root
cause of the page corruption, however, nor was I able to figure out
exactly how the corruption I saw resulted in an allocation request
like this. It would be nice to figure out where in the code this is
happening and put in a higher-level guard so that we get a better
error message.
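
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It is not part of the original report, and the helper name as_signed_64 is made up for illustration. It reinterprets the size from the log line as a two's-complement signed 64-bit integer:

```python
import struct

# Size reported in the ERROR line quoted above.
reported = 18446744073709551613


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one.

    Hypothetical helper for illustration only; it packs the value as an
    unsigned 64-bit integer and unpacks the same bytes as a signed one.
    """
    return struct.unpack("<q", struct.pack("<Q", value))[0]


print(hex(reported))           # hexadecimal form of the reported size
print(as_signed_64(reported))  # the same bit pattern read as a signed value
```

Read as a signed value, the bit pattern is -3, which is consistent with a negative length being passed to the allocator and converted to an unsigned size along the way.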

You may want to compile a modified PostgreSQL executable that puts an
extremely long sleep (like a year) just before this error is reported.
Then, when the system hangs at that point, you can attach a debugger
and pull a stack backtrace. Or you could insert an abort() at that
point in the code and get a backtrace from the core dump.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
