Re: Parallel append plan instability/randomness

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel append plan instability/randomness
Date: 2018-01-09 04:05:30
Message-ID: CAA4eK1Jh+8VXDFaxUF7A4v10sHHzDM0XV8pimJKPVP+2GaBKGg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 9, 2018 at 12:48 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jan 7, 2018 at 11:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> One theory that can explain above failure is that the costs of
>>> scanning some of the sub-paths is very close due to which sometimes
>>> the results can vary. If that is the case, then probably using
>>> fuzz_factor in costs comparison (as is done in attached patch) can
>>> improve the situation, may be we have to consider some other factors
>>> like number of rows in each subpath.
>
>> This isn't an acceptable solution because sorting requires that the
>> comparison operator satisfy the transitive property; that is, if a = b
>> and b = c then a = c. With your proposed comparator, you could have a
>> = b and b = c but a < c. That will break stuff.
>
>> It seems like the obvious fix here is to use a query where the
>> contents of the partitions are such that the sorting always produces
>> the same result. We could do that either by changing the query or by
>> changing the data in the partitions or, maybe, by inserting ANALYZE
>> someplace.
>
> The foo_star tables are made in create_table.sql, filled in
> create_misc.sql, and not modified thereafter. The fact that we have
> accurate rowcounts for them in select_parallel.sql is because of the
> database-wide VACUUM that happens at the start of sanity_check.sql.
> Given the lack of any WHERE condition, the costs in this particular query
> depend only on the rowcount and physical table size, so inserting an
> ANALYZE shouldn't (and doesn't, for me) change anything. I would be
> concerned about side-effects on other queries anyway if we were to ANALYZE
> tables that have never been ANALYZEd in the regression tests before.
>

Fair point. This seems to indicate that wrong rowcounts are probably
not the reason for the failure. However, I think it might still be
better to use a separate set of tables (probably by creating new tables
with appropriate data for these queries) and to ANALYZE them explicitly
before running these queries, rather than relying on the execution
order of tests that are not directly related.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
