Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date: 2005-11-02 18:45:44
Message-ID: 20051102184544.GI55520@pervasive.com
Lists: pgsql-hackers pgsql-patches

On Wed, Nov 02, 2005 at 07:03:57AM -0500, Greg Stark wrote:
> > I would bet that ninety percent of the Asserts in the existing code are on
> > conditions that could represent, at worst, corruption of backend-local or
> > even transaction-local data structures. Taking down the entire database
> > cluster for that is not something that sounds like a stability-enhancing
> > tradeoff to me.
>
> It may be minor corruption or it may be that the reason for the minor
> corruption comes from some larger bug. It may also be backend-local or
> transaction-local corruption at the time the assert catches it but cause major
> damage by the time it actually crashes a non-assert-enabled database.

Agreed. Personally I'd want to know about anything that corrupts my
data, no matter what the locality. I would also argue that if people are
seeing 'minor' asserts firing in production that there's a bug that
needs to be tracked down.

If it turns out that there are some asserts that can fire even though
nothing really bad is happening, they could always be relegated to a
second class of assert that's not normally turned on.
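
A rough sketch of what such a second class of assert could look like
follows. Every name in it (SketchAssert, SketchAssertMinor,
USE_MINOR_ASSERT_CHECKING, sketch_report) is hypothetical and made up for
illustration; none of them exist in the PostgreSQL source. To keep the
example harmless, a failed check is only counted and logged here instead of
aborting the process.

```c
#include <stdio.h>

#define USE_ASSERT_CHECKING              /* first-class checks: on */
/* #define USE_MINOR_ASSERT_CHECKING */  /* hypothetical second class: off */

static int sketch_failures = 0;

/* Hypothetical reporter: log and count instead of aborting. */
static void
sketch_report(const char *expr, const char *file, int line)
{
    fprintf(stderr, "assertion failed: %s (%s:%d)\n", expr, file, line);
    sketch_failures++;
}

#ifdef USE_ASSERT_CHECKING
#define SketchAssert(cond) \
    do { if (!(cond)) sketch_report(#cond, __FILE__, __LINE__); } while (0)
#else
#define SketchAssert(cond) ((void) 0)
#endif

/* Second-class asserts compile to nothing unless explicitly requested. */
#ifdef USE_MINOR_ASSERT_CHECKING
#define SketchAssertMinor(cond) SketchAssert(cond)
#else
#define SketchAssertMinor(cond) ((void) 0)
#endif

/* Returns the number of checks that were reported. */
static int
sketch_demo(void)
{
    sketch_failures = 0;
    SketchAssertMinor(1 == 2);  /* second class: compiled out by default */
    SketchAssert(1 == 2);       /* first class: reported */
    return sketch_failures;
}
```

With the switches as written, sketch_demo() reports only the first-class
failure. Defining USE_MINOR_ASSERT_CHECKING as well would make both checks
fire.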

BTW, that's a reversal from what I was originally arguing for, which was
due to the performance penalty associated with --enable-cassert. My
client is now running with Tom's suggestion of commenting out
CLOBBER_FREED_MEMORY and MEMORY_CONTEXT_CHECKING and performance is
good. It appears to be as good as it was with asserts disabled. So I
think it would definitely be good to break those options out from
--enable-cassert. That makes it viable to run with asserts in
production, at least from a performance standpoint.
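
For illustration, here is a minimal sketch of what breaking those options
out could look like, assuming each one becomes its own compile-time switch
instead of being implied by USE_ASSERT_CHECKING. The macro names are the
ones mentioned above; the helper functions (sketch_wipe_before_free,
sketch_asserts_enabled) are hypothetical and the layout is not the actual
header.

```c
#include <stddef.h>
#include <string.h>

/*
 * Independent build switches (a hypothetical layout, not the real header).
 * Cheap assertion checks stay on; the two expensive memory-debugging
 * options are no longer implied by them.
 */
#define USE_ASSERT_CHECKING
/* #define CLOBBER_FREED_MEMORY */
/* #define MEMORY_CONTEXT_CHECKING */

/* Hypothetical helper: what a free routine might do before releasing a chunk. */
static void
sketch_wipe_before_free(void *ptr, size_t len)
{
#ifdef CLOBBER_FREED_MEMORY
    memset(ptr, 0x7F, len);     /* overwrite so stale pointers get noticed */
#else
    (void) ptr;                 /* option off: no per-free cost */
    (void) len;
#endif
}

/* Reports whether assertion checks were compiled in. */
static int
sketch_asserts_enabled(void)
{
#ifdef USE_ASSERT_CHECKING
    return 1;
#else
    return 0;
#endif
}
```

With the switches set as above, asserts are compiled in but freed memory is
left untouched, which is roughly the configuration described here.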

BTW, they're also running with patch2 now. Previously, with asserts
turned on and without the patch, they were seeing assert failures at an
average of 2/day. So hopefully tomorrow we'll have an idea if the patch
fixed this or not.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
