Re: Breakage with VACUUM ANALYSE + partitions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: Breakage with VACUUM ANALYSE + partitions
Date: 2016-03-25 16:49:27
Message-ID: 20160325164927.c522xd7nw2a74yu5@alap3.anarazel.de
Lists: pgsql-bugs pgsql-hackers

On 2016-03-25 12:02:05 -0400, Robert Haas wrote:
> On Fri, Mar 25, 2016 at 8:41 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Thu, Mar 24, 2016 at 9:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Thu, Mar 24, 2016 at 12:59 AM, Haribabu Kommi
> >> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> >>> So further operations on the table use the already constructed smgr relation
> >>> and assume that it holds RELSEG_SIZE blocks, and then try
> >>> to do the scan. But there are 0 pages in the table, so this produces the error.
> >>>
> >>> The issue doesn't occur from another session. That is why the error
> >>> did not occur when we ran only the vacuum operation.
> >>
> >> Yeah, I had a suspicion that this might have to do with invalidation
> >> messages based on Thom's description, but I think we still need to
> >> track down which commit is at fault.
> >
> > I could reproduce the failure on Linux, not on OSX, and bisecting the
> > failure, the first bad commit is this one:
> > commit: 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf
> > author: Andres Freund <andres(at)anarazel(dot)de>
> > date: Thu, 10 Mar 2016 17:04:34 -0800
> > Allow to trigger kernel writeback after a configurable number of writes.
> >
> > The failure is a little bit sporadic: in my tests, 1 or 2 runs out
> > of 10 could pass, so a commit was only recognized as good after
> > passing the SQL sequence sent by Thom 5 times in a row. I also did
> > some manual tests, and those point to this commit as well.
> >
> > I am adding Fabien and Andres in CC for some feedback.
>
> Gosh, that's surprising. I wonder if that just revealed an underlying
> issue rather than creating it.

I think that might be the case, but I'm not entirely sure yet. It
appears to me that the current backend - others don't show the problem -
still has the first segment of pgbench_accounts open (in the md.c
mdfdvec sense); likely because there were remaining flush
requests. Thus, when mdnblocks is called to get the size of the relation
we return the size of the first segment (131072) plus the size of the
second segment (0, doesn't exist). That then leads to this error.
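
To make the arithmetic concrete, here is a minimal sketch of how a relation size is obtained by summing per-segment sizes. It is not the md.c code: model_nblocks and the seg_blocks array are hypothetical stand-ins for the backend's view of its open segments, and RELSEG_SIZE is assumed to be the default 131072 blocks (1 GB segments of 8 kB blocks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default: 1 GB segment files of 8 kB blocks. */
#define RELSEG_SIZE 131072

typedef uint32_t BlockNumber;

/*
 * Hypothetical model, not PostgreSQL source.
 * seg_blocks[i] is the size (in blocks) that this backend believes
 * segment i has; a segment that does not exist is simply absent,
 * i.e. it contributes 0 blocks.
 */
BlockNumber
model_nblocks(const BlockNumber *seg_blocks, size_t nsegs)
{
    BlockNumber total = 0;

    for (size_t i = 0; i < nsegs; i++)
    {
        /* No segment may be larger than RELSEG_SIZE blocks. */
        assert(seg_blocks[i] <= RELSEG_SIZE);

        total += seg_blocks[i];

        /* A segment that is not full must be the last one. */
        if (seg_blocks[i] < RELSEG_SIZE)
            break;
    }
    return total;
}
```

In this model, a backend whose remembered first segment still counts as full gets model_nblocks((BlockNumber[]){RELSEG_SIZE}, 1) == 131072, while a backend that sees the relation as empty gets 0. That mismatch is the one described above.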

I don't really understand yet how the "open segment" thing happens.
