From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Instability in the postgres_fdw regression test
Date: 2018-04-12 18:49:42
Message-ID: CA+TgmoYeJGBwYSwj=UdJSEJVbRMAz6bhEpGL2k8UzEyeuqxd4Q@mail.gmail.com
Lists: pgsql-hackers
On Thu, Apr 12, 2018 at 12:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We've been seeing $subject since commit 1bc0100d2, with little clue
> as to the cause. Previous attempts to fix it by preventing autovacuum
> from running on the relevant tables didn't seem to help.
>
> I have now managed to reproduce the issue reliably enough to study it.
> (It turns out that on longfin's host, running the buildfarm script *under
> cron* produces the failure a reasonable percentage of the time. The
> identical test case, launched from an interactive shell, never fails.
> I speculate that the kernel scheduler treats cron jobs differently.)
> It is in fact a timing issue, and what triggers it is there being an
> active autovacuum task in another database. The fact that the buildfarm
> script uses USE_MODULE_DB in the contrib tests greatly increases the
> surface area of the problem, since that creates a lot more databases that
> autovacuum could be active in. Once I realized that, I found that it can
> trivially be reproduced by hand, in a "make installcheck" test, simply by
> having an open transaction in another DB in the cluster. For instance,
> run "select pg_sleep(60);" in another database, and then run "make
> installcheck" in postgres_fdw.
>
> So now, drilling down to the specific problem: what we're seeing is that
> the plans for some queries change because the "remote" server reports
> a different rowcount estimate than usual for
>
> EXPLAIN SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
>
> Normally the estimate is one, but in the failure cases it's 12 or so.
> This estimate is coming out of ineq_histogram_selectivity, of course.
> Furthermore, the highest value actually present in the histogram is
> 1000, because that was the highest value of "C 1" when the test did
> ANALYZE up at the top. That means we'll invoke get_actual_variable_range
> to try to identify the actual current maximum. Most of the time, it
> returns 2010, which is correct, and we end up estimating that only
> about one row will exceed 2000.
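The size of that effect can be sketched with a toy model (this is not the
actual ineq_histogram_selectivity code; the bucket low bound, bucket
fraction, and row count below are illustrative assumptions):

```python
# Toy linear-interpolation model of the top histogram bucket, after
# get_actual_variable_range has replaced its upper bound with the actual
# current maximum.  Not PostgreSQL's real code: the histogram low bound
# (990), bucket fraction (1%), and row count (1000) are made up.

def est_rows_above(cutoff, actual_max, nrows=1000,
                   bucket_low=990, bucket_frac=0.01):
    """Estimated rows with value > cutoff, interpolating linearly
    inside the (stretched) top bucket."""
    if actual_max <= cutoff:
        return 0.0
    sel = bucket_frac * (actual_max - cutoff) / (actual_max - bucket_low)
    return sel * nrows

normal = est_rows_above(2000, 2010)    # true max 2010: well under 1 row
inflated = est_rows_above(2000, 9999)  # stale 9999: roughly 9 rows
```

The point is only that stretching the top bucket out to 9999 puts most
of that bucket above the 2000 cutoff, so the estimate jumps by an order
of magnitude even though the real row count never changed.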
>
> However, in the failure cases, what's getting extracted from the index
> is 9999. That's because we'd inserted and deleted that value further
> up in the test, and if there's been an open transaction holding back
> RecentGlobalXmin, then SnapshotNonVacuumable is going to consider that
> entry still good. This value is enough larger than 2000 to move the
> selectivity estimate appreciably, and then kaboom.
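That visibility rule can be modeled in a few lines (a toy sketch, not the
real SnapshotNonVacuumable implementation; the xids are illustrative):

```python
# Toy model of get_actual_variable_range under SnapshotNonVacuumable.
# Each index entry is (value, xmax), where xmax is the xid that deleted
# it (None = still live).  An entry counts as "still good" if it is live
# or its deleter is not yet older than RecentGlobalXmin, i.e. some
# snapshot might still need to see it.

def actual_max(entries, recent_global_xmin):
    nonvacuumable = [value for value, xmax in entries
                     if xmax is None or xmax >= recent_global_xmin]
    return max(nonvacuumable)

index = [(2010, None), (9999, 500)]  # 9999 was deleted by xid 500

# Horizon has advanced past the delete: 9999 is vacuumable, max is 2010.
assert actual_max(index, recent_global_xmin=600) == 2010
# An open transaction elsewhere holds the horizon back: 9999 resurfaces.
assert actual_max(index, recent_global_xmin=400) == 9999
```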
>
> This theory successfully explains the observed fact that some buildfarm
> failures show differences in two query plans, while others show a
> difference just in the first one. In the latter cases, the external
> transaction ended, so that RecentGlobalXmin could advance, in between.
>
> What I propose to do to fix the instability is to change the test
> stanza that uses 9999 as a key-chosen-at-random to use something less
> far away from the normal range of "C 1" values, so that whether it's
> still visible to get_actual_variable_range has less effect on this
> selectivity estimate. That's a hack, for sure, but I don't see any
> other fix that's much less of a hack.
Thanks for the detective work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company