Re: [BUGS] Breakage with VACUUM ANALYSE + partitions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions
Date: 2016-04-29 23:58:37
Message-ID: 20160429235837.b63wkjw4dduf56wt@alap3.anarazel.de
Lists: pgsql-bugs pgsql-hackers

On 2016-04-28 17:41:29 +0100, Thom Brown wrote:
> I've noticed another breakage, which I can reproduce consistently.

> 2016-04-28 17:36:08 BST [18108]: [47-1] user=,db=,client= DEBUG: could not
> fsync file "base/24581/24594.1" but retrying: No such file or directory
> 2016-04-28 17:36:08 BST [18108]: [48-1] user=,db=,client= ERROR: could not
> fsync file "base/24581/24594.1": No such file or directory
> 2016-04-28 17:36:08 BST [18605]: [17-1]
> user=thom,db=postgres,client=[local] ERROR: checkpoint request failed
> 2016-04-28 17:36:08 BST [18605]: [18-1]
> user=thom,db=postgres,client=[local] HINT: Consult recent messages in the
> server log for details.

Yuck. md.c is so crummy :(

Basically the reason for the problem is that mdsync() needs to access
"formally non-existent segments" (as in ones where previous segments are
< RELSEG_SIZE), because we queue (and there might be preexisting) fsync
requests via register_dirty_segment() in mdtruncate().
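
To make "formally non-existent" concrete, here is a small standalone sketch. It is not md.c code: the helper name reachable_segments() and the array-of-sizes representation are made up for illustration, and RELSEG_SIZE is set to the default build value (1 GB segments of 8 kB blocks). The idea shown is only the chain rule: a segment counts as existing when every segment before it is exactly RELSEG_SIZE blocks long.

```c
#include <stddef.h>

/* Blocks per segment file; 131072 is the default (1 GB / 8 kB blocks). */
#define RELSEG_SIZE 131072

/*
 * Hypothetical helper, not part of md.c.
 *
 * seg_blocks[i] is the on-disk length, in blocks, of segment file i.
 * Returns how many segments are reachable when walking the chain from
 * segment 0: the walk ends at the first segment shorter than RELSEG_SIZE.
 * Files past that point may still be on disk, but they are "formally
 * non-existent" as far as the chain is concerned.
 */
size_t
reachable_segments(const unsigned *seg_blocks, size_t nsegs)
{
    size_t      i;

    for (i = 0; i < nsegs; i++)
    {
        if (seg_blocks[i] < RELSEG_SIZE)
            return i + 1;       /* partial segment is the last reachable one */
    }
    return nsegs;               /* every segment is full */
}
```

With sizes {RELSEG_SIZE, 100, 0} the helper reports two reachable segments: the third file is past a partial segment, yet an fsync request queued for it would still have to be processed.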

I'm a bit at a loss as to how to reconcile that view with the original
issue in this thread. The best I can come up with at the moment is doing
a _mdfd_openseg() in mdsync() to open the truncated segment if
_mdfd_getseg() returned NULL. We don't normally want to use that in
either function because it implies a separate open() etc., which is
pretty expensive - but doing it in the fallback case would be kind of ok.
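
The shape of that fallback can be sketched outside of md.c as follows. This is only an illustration under stated assumptions: SegCache, cached_segment_fd() and segment_fd_for_sync() are hypothetical names, not the real _mdfd_getseg()/_mdfd_openseg() interfaces, and plain POSIX open() stands in for the server's file access layer. The only detail taken from the server is the naming scheme, where segment 0 is the bare relation path and segment N is "path.N".

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_CACHED_SEGS 4

/* Hypothetical stand-in for a relation's cache of open segment fds. */
typedef struct SegCache
{
    int         fds[MAX_CACHED_SEGS];   /* -1 means "not open" */
} SegCache;

/* Cheap path: return the cached fd for segno, or -1 if there is none. */
int
cached_segment_fd(const SegCache *cache, unsigned segno)
{
    if (segno >= MAX_CACHED_SEGS)
        return -1;
    return cache->fds[segno];
}

/*
 * Sync path: try the cache first; only on a miss pay for an explicit
 * open() of the segment file by name.  *opened_here tells the caller
 * whether it owns the returned fd and must close() it.  Returns -1
 * (with errno set by open()) if the file cannot be opened either.
 */
int
segment_fd_for_sync(const SegCache *cache, const char *relpath,
                    unsigned segno, int *opened_here)
{
    char        path[1024];
    int         fd = cached_segment_fd(cache, segno);

    *opened_here = 0;
    if (fd >= 0)
        return fd;

    /* Segment 0 is the bare path; later segments get a ".N" suffix. */
    if (segno == 0)
        snprintf(path, sizeof(path), "%s", relpath);
    else
        snprintf(path, sizeof(path), "%s.%u", relpath, segno);

    fd = open(path, O_RDWR);
    if (fd >= 0)
        *opened_here = 1;
    return fd;
}
```

The extra open() is confined to the miss branch, so the common path stays as cheap as before; a caller would fsync() the returned fd and close it only when *opened_here is set.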

Andres
