Re: Parallel append plan instability/randomness

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel append plan instability/randomness
Date: 2018-01-08 19:18:11
Message-ID: 12879.1515439091@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jan 7, 2018 at 11:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> One theory that can explain above failure is that the costs of
>> scanning some of the sub-paths is very close due to which sometimes
>> the results can vary. If that is the case, then probably using
>> fuzz_factor in costs comparison (as is done in attached patch) can
>> improve the situation, may be we have to consider some other factors
>> like number of rows in each subpath.

> This isn't an acceptable solution because sorting requires that the
> comparison operator satisfy the transitive property; that is, if a = b
> and b = c then a = c. With your proposed comparator, you could have a
> = b and b = c but a < c. That will break stuff.

> It seems like the obvious fix here is to use a query where the
> contents of the partitions are such that the sorting always produces
> the same result. We could do that either by changing the query or by
> changing the data in the partitions or, maybe, by inserting ANALYZE
> someplace.

The foo_star tables are made in create_table.sql, filled in
create_misc.sql, and not modified thereafter. The fact that we have
accurate rowcounts for them in select_parallel.sql is because of the
database-wide VACUUM that happens at the start of sanity_check.sql.
Given the lack of any WHERE condition, the costs in this particular query
depend only on the rowcount and physical table size, so inserting an
ANALYZE shouldn't (and doesn't, for me) change anything. I would be
concerned about side-effects on other queries anyway if we were to ANALYZE
tables that have never been ANALYZEd in the regression tests before.

I remain pretty much at a loss for exactly what happened on silverfish,
but that doesn't mean I have no ideas for improving things. Both
append_total_cost_compare and append_startup_cost_compare are seriously
deficient IMO, because they ignore the question of what to do when total
costs (resp. startup costs) are exactly equal, leaving the outcome to
the none too tender mercies of qsort(), which you'll recall is not a
stable algorithm.

(For fewer than 7 items, our implementation of qsort falls back to an
insertion sort, which *is* stable, perhaps explaining why we've not
seen failures in this test case many times before. So this line of
thought doesn't explain what happened on silverfish. But there's
surely room for improvement here.)

The normal rule in the planner, as embodied in places like
compare_path_costs, is to break ties on total cost by comparing startup
cost, and vice versa, which these two functions fail to do; so my first
point is that they ought to be using compare_path_costs rather than
inventing their own approach. This would especially affect
append_startup_cost_compare, because of the frequency with which plans
have identical startup costs (ie 0). Right now I'm afraid we're getting
basically random ordering of the partial subpaths in most cases.

In the regression test case at hand, the startup costs are all zero so
this change wouldn't improve the test case's stability. So I'm thinking
that in addition, it would be a good idea for these functions to break
exact compare_path_costs ties in some arbitrary but deterministic way,
rather than letting qsort() have the final say on what happens. If the
subplans were all simple relation scans we could order them by planner
relid, but I'm not sure what to do if they're not.

BTW, I had not looked closely at list_qsort before, but now that I have
it seems seriously bizarre. Why is it allocating a new List header
rather than reusing the old one? Because it stomps on the next-links
of the list cells, the old header is effectively corrupted and can't
possibly be useful afterwards, so we might as well re-use it and save
a palloc cycle. (By the same token, it's pretty ludicrous that it
claims the input List is "const". Not stomping on the list header
is a pretty poor definition of not modifying the list.)

regards, tom lane
