Re: Cache invalidation bug in RelationGetIndexAttrBitmap()

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cache invalidation bug in RelationGetIndexAttrBitmap()
Date: 2014-05-14 19:04:41
Message-ID: 5373BE49.3050406@fuzzy.cz
Lists: pgsql-hackers

On 14.5.2014 17:52, Andres Freund wrote:
> On 2014-05-14 15:17:39 +0200, Andres Freund wrote:
>> On 2014-05-14 15:08:08 +0200, Tomas Vondra wrote:
>>> Apparently there's something wrong with 'test-decoding-check':
>>
>> Man. I shouldn't have asked... My code. There's some output in there
>> that's probably triggered by the extraordinarily long runtimes, but
>> there's definitely something else wrong.
>> My gut feeling says it's in RelationGetIndexList().
>
> Nearly right. It's in RelationGetIndexAttrBitmap(). Fix attached.
>
> Tomas, thanks for that. I've never (and probably will never) run
> CLOBBER_CACHE_RECURSIVELY during development. Having a machine do that
> regularly is really helpful. How long does a single testrun take? It
> takes hundreds of seconds here to do a single UPDATE?

I don't know yet, as it fails at the beginning, but I suppose it will
take tens or possibly hundreds of hours. For example, these are the logs
from a regular build (no clobber etc.):

May 14 19:00 SCM-checkout.log
May 14 19:00 githead.log
May 14 19:00 configure.log
May 14 19:00 config.log
May 14 19:05 make.log
May 14 19:05 check.log
May 14 19:06 make-contrib.log
May 14 19:06 make-install.log
May 14 19:06 install-contrib.log
May 14 19:07 check-pg_upgrade.log
May 14 19:08 test-decoding-check.log

while these are the logs from recursive clobber:

May 14 00:19 SCM-checkout.log
May 14 00:20 configure.log
May 14 00:20 config.log
May 14 00:26 make.log
May 14 03:12 check.log
May 14 03:13 make-contrib.log
May 14 03:13 make-install.log
May 14 03:13 install-contrib.log
May 14 08:25 check-pg_upgrade.log
May 14 09:07 test-decoding-check.log
May 14 09:07 web-txn.data

So with the regular build it took <1 minute to do 'make check' and ~1
minute to test pg_upgrade, while with the recursive clobber those two
steps take ~3 hours and ~5 hours, respectively. That's a factor of ~300,
although it's a very rough estimate.

Without clobber the whole run (for a "C" locale) takes ~10 minutes, so
my estimate is ~50 hours for the recursive one. But I wouldn't be
surprised by 100 hours.
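
As a rough check of that arithmetic, here is a minimal sketch. It assumes
each log's mtime marks the end of that step and the previous log's mtime
marks its start; the timestamps only have minute resolution, so the
1-minute baseline is very coarse. The helper name minutes_between is made
up for illustration.

```python
from datetime import datetime

def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# pg_upgrade test: install-contrib.log -> check-pg_upgrade.log
regular_upgrade = minutes_between("19:06", "19:07")   # regular build
clobber_upgrade = minutes_between("03:13", "08:25")   # recursive clobber

# 'make check' under recursive clobber: make.log -> check.log
clobber_check = minutes_between("00:26", "03:12")

# Slowdown factor based on the pg_upgrade step only.
slowdown = clobber_upgrade / regular_upgrade

# Scale the ~10 minute regular run by the same factor (an assumption:
# other steps may slow down by a different amount).
full_run_hours = 10 * slowdown / 60

print(f"make check under clobber: {clobber_check:.0f} min")
print(f"pg_upgrade under clobber: {clobber_upgrade:.0f} min")
print(f"slowdown factor: ~{slowdown:.0f}x")
print(f"estimated full run: ~{full_run_hours:.0f} hours")
```

The result is dominated by the rounding of the regular-build step to one
minute, which is why the estimate could easily be off by a factor of two.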

>
> There were some more differences but those are all harmless and caused
> by the extraordinarily long runtime (autovacuums). I think we need to
> add a feature to test_decoding to suppress displaying transactions
> without changes. Ick.
>

I expect to hit more timing-related issues with the recursive clobber
tests - not necessarily in the code/tests themselves, but I guess the
buildfarm tooling doesn't really expect runs that long.

regards
Tomas
