Re: POC: converting Lists into arrays

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: converting Lists into arrays
Date: 2019-07-17 03:02:52
Message-ID: CAKJS1f9-p4R_b2__ugSpiJR3h_TRTw4pb+q8jk=yhEhnT5AMYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 17 Jul 2019 at 11:06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 0002 changes some additional places where it's maybe a bit less safe,
> ie there's a potential for user-visible behavioral change because
> processing will occur in a different order. In particular, the proposed
> change in execExpr.c causes aggregates and window functions that are in
> the same plan node to be executed in a different order than before ---
> but it seems to me that this order is saner. (Note the change in the
> expected regression results, in a test that's intentionally sensitive to
> the execution order.) And anyway when did we guarantee anything about
> that?

I've only looked at 0002. Here are my thoughts:

get_tables_to_cluster:
Looks fine. It's a heap scan. Any previous order was accidental, so if
it causes issues then we might need to think of using a more
well-defined order for CLUSTER;

get_rels_with_domain:
This is a static function. Changing the order of the list seems to
only really affect the error message that a failed domain constraint
validation could emit. Perhaps this might break someone else's tests,
but they should just be able to update their expected results.

ExecInitExprRec:
As you mention, the order of aggregate evaluation is reversed. I agree
that the new order is saner. I can't think why we'd be doing it in
backwards order beforehand.

get_relation_statistics:
RelationGetStatExtList does not seem to pay much attention to the
order it returns its results, so I don't think the order we apply
extended statistics was that well defined before. We always attempt to
use the stats with the most matching columns in
choose_best_statistics(), so I think
for people to be affected they'd either multiple stats with the same
sets of columns or a complex clause that equally well matches two sets
of stats, and in that case the other columns would be matched to the
other stats later... I'd better check that... erm... actually that's
not true. I see statext_mcv_clauselist_selectivity() makes no attempt
to match the clause list to another set of stats after finding the
first best match. I think it likely should do that.
estimate_multivariate_ndistinct() seems to have an XXX comment
mentioning thoughts about the stability of which stats are used, but
nothing is done.

parseCheckAggregates:
I can't see any user-visible change to this one. Not even in error messages.

estimate_num_groups:
Similar to get_relation_statistics(), I see that
estimate_multivariate_ndistinct() is only called once and we don't
attempt to match up the remaining clauses with more stats. I can't
imagine swapping lcons for lappend here will upset anyone. The
behaviour does not look well defined already. I think we should likely
change the "if (estimate_multivariate_ndistinct(root, rel,
&relvarinfos," to "while ...", then drop the else. Not for this patch
though...

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-17 03:04:32 Re: refactoring - share str2*int64 functions
Previous Message Thomas Munro 2019-07-17 03:01:47 Re: Adding SMGR discriminator to buffer tags