From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: standby server crashes hard on out-of-disk-space in HEAD |
Date: | 2017-06-12 19:21:15 |
Message-ID: | 20170612192115.pjh6sovzksyyptnt@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2017-06-12 15:12:23 -0400, Robert Haas wrote:
> On Mon, Jun 12, 2017 at 12:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > logfile from a standby server:
> >
> > 2017-06-12 11:43:46.450 EDT [13605] LOG: started streaming WAL from primary at 3/E6000000 on timeline 1
> > 2017-06-12 11:47:46.992 EDT [11261] FATAL: could not extend file "base/47578/54806": No space left on device
> > 2017-06-12 11:47:46.992 EDT [11261] HINT: Check free disk space.
> > 2017-06-12 11:47:46.992 EDT [11261] CONTEXT: WAL redo at 8/EC7E0CF8 for XLOG/FPI:
> > 2017-06-12 11:47:46.992 EDT [11261] WARNING: buffer refcount leak: [1243] (rel=base/47578/54806, blockNum=5249, flags=0x8a000000, refcount=1 1)
> > TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2523)
> > 2017-06-12 11:47:47.567 EDT [11259] LOG: startup process (PID 11261) was terminated by signal 6: Aborted
> > 2017-06-12 11:47:47.567 EDT [11259] LOG: terminating any other active server processes
> > 2017-06-12 11:47:47.584 EDT [11259] LOG: database system is shut down
> >
> > The FATAL is fine, but we shouldn't have that WARNING I think, and
> > certainly not the assertion failure.
Just for clarification: It's a WARNING so that we print every leaked pin,
rather than erroring/asserting at the first leak. We've long asserted
that not a single pin is leaked (in earlier releases we asserted out at
the first leak).
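To illustrate the warn-for-each-leak, assert-once-at-the-end pattern described above, here is a minimal standalone sketch. It is not the bufmgr.c code: `PinEntry`, `report_pin_leaks` and `check_for_pin_leaks` are hypothetical names invented for this example, and the real backend tracks pins in its own private refcount structures.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one backend-local buffer pin entry. */
typedef struct PinEntry
{
    int buffer_id;   /* which buffer this entry refers to */
    int refcount;    /* pins still held on that buffer */
} PinEntry;

/*
 * Print a warning for every buffer that is still pinned and return the
 * number of leaks found. Nothing aborts here, so all leaks get reported.
 */
int
report_pin_leaks(const PinEntry *pins, int npins)
{
    int leaks = 0;

    for (int i = 0; i < npins; i++)
    {
        if (pins[i].refcount != 0)
        {
            fprintf(stderr, "WARNING: buffer refcount leak: [%d] (refcount=%d)\n",
                    pins[i].buffer_id, pins[i].refcount);
            leaks++;
        }
    }
    return leaks;
}

/*
 * Assert only after the whole table has been scanned, so an assert-enabled
 * build still logs every leak before it aborts.
 */
void
check_for_pin_leaks(const PinEntry *pins, int npins)
{
    int leaks = report_pin_leaks(pins, npins);

    assert(leaks == 0);
    (void) leaks;    /* avoid an unused-variable warning under NDEBUG */
}
```

With this shape, a log like the one quoted above would show one WARNING line per leaked pin, followed by a single assertion failure.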
> Commit 4b4b680c3d6d8485155d4d4bf0a92d3a874b7a65 (Make backend local
> tracking of buffer pins memory efficient., vintage 2014) seems like a
> likely culprit here, but I haven't tested.
I'm not that sure. As written above, the Assert isn't new, and given
this hasn't been reported before, I'm a bit doubtful that it's a general
refcount tracking bug. The FPI code has been whacked around more
heavily, so it could well be a bug in it somewhere.
- Andres