Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
Date: 2012-05-31 14:16:41
Message-ID: CA+TgmobXSwaEe8qVxa+50=Fk4iJMkJUmjSqJ4bHX8bMM-b10dg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 31, 2012 at 9:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The one thing that still seems a little odd to me is that this caused
>> a pin count to get orphaned.  It seems reasonable that ignoring the
>> AccessExclusiveLock could result in not-found errors trying to open a
>> missing relation, and even fsync requests on a missing relation.  But
>> I don't see why that would cause the backend-local pin counts to get
>> messed up, which makes me wonder if there really is another bug here
>> somewhere.
>
> According to Heikki's log, the Assert was in the startup process itself,
> and it happened after an error:
>
>> 2012-05-26 10:44:28.587 CEST 10270 FATAL:  could not open file "base/21268/32994": No such file or directory
>> 2012-05-26 10:44:28.588 CEST 10270 CONTEXT:  writing block 2508 of relation base/21268/32994
>>          xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 tuples
>> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741)
>> 2012-05-26 10:44:31.131 CEST 10269 LOG:  startup process (PID 10270) was terminated by signal 6: Aborted
>
> I don't think that code is meant to recover from errors anyway, so
> the fact that it fails with a pin count held isn't exactly surprising.
> But it might be worth looking at exactly which on_proc_exit callbacks
> are installed in the startup process and what assumptions they make.

Which code isn't meant to recover from errors?

> As for where the error came from in the first place, it's easy to
> imagine somebody who's not got the word about the AccessExclusiveLock
> reading pages of the table into buffers that have already been scanned
> by the DROP.  So you'd end up with orphaned buffers belonging to a
> vanished table.  If somebody managed to dirty them by setting hint bits
> (we do allow that in HS mode no?) then later you'd have various processes
> trying to write the buffer before recycling it, which seems to fit the
> reported error.

Right, I understand the other errors. It's just the pin count that I
am a bit confused about.
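
To make the question concrete, here is a toy model of the scenario Tom
describes: an error raised while a buffer pin is held, followed by an
exit-time check that expects every backend-local pin count to be zero.
This is only a sketch and not PostgreSQL code. ToyBufferPool,
flush_buffer, and run_until_exit are hypothetical names, and the
assumption that nothing releases the pin on the error path is exactly
the part I am unsure about.

```python
class FatalError(Exception):
    """Stands in for an error that ends the process (hypothetical)."""


class ToyBufferPool:
    """Toy stand-in for per-process pin bookkeeping; not the real bufmgr."""

    def __init__(self, nbuffers):
        # One local reference count per buffer, all starting at zero.
        self.private_ref_count = [0] * nbuffers

    def pin(self, buf):
        self.private_ref_count[buf] += 1

    def unpin(self, buf):
        assert self.private_ref_count[buf] > 0, "unpin without a matching pin"
        self.private_ref_count[buf] -= 1

    def leaked_pins(self):
        # Buffers whose count is still non-zero; an exit-time check
        # would expect this list to be empty.
        return [i for i, n in enumerate(self.private_ref_count) if n != 0]


def flush_buffer(pool, buf, file_exists):
    """Pin a buffer, 'write' it, unpin it. The error path skips the unpin."""
    pool.pin(buf)
    if not file_exists:
        # Assumption: no cleanup runs between this error and process exit.
        raise FatalError("could not open file")
    pool.unpin(buf)


def run_until_exit(pool, buf, file_exists):
    """Run one flush, then report which pins are still held at 'exit'."""
    try:
        flush_buffer(pool, buf, file_exists)
    except FatalError:
        pass  # fall straight through to the exit-time check
    return pool.leaked_pins()
```

In this model the normal path leaves nothing pinned, while the error
path leaves buffer 0 with a non-zero count, which is the condition the
Assert complains about. Whether the real startup process has some
cleanup step that ought to run in between is the open question.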

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
