Re: WITH clause in CREATE STATISTICS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: WITH clause in CREATE STATISTICS
Date: 2017-05-12 02:46:45
Message-ID: 22676.1494557205@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Hmm, yeah, makes sense. Here's a patch for this approach.

I did some code review on this patch. The attached update makes the
following hopefully-uncontroversial changes:

* fix inconsistencies in the set of fields in CreateStatsStmt

* get rid of struct CreateStatsArgument, which seems to be a leftover

* tighten up parse checks in CreateStatistics(), and improve comments

* add missing check for ownership of target table; the documentation
says this is checked, and surely it is a security bug that it's not

* adjust regression test to compensate for the above, because it
fails otherwise

* change end of regression test to suppress cascade drop messages;
it's astonishing that this test hasn't caused intermittent buildfarm
failures due to unstable drop order

I did not review the rest of the regression test changes, nor the
tab-complete changes. I can however report that DROP STATISTICS <tab>
doesn't successfully complete any stats object names.

Also, while reading the docs changes, I got rather ill from the
inconsistent and not very grammatical handling of "statistics" as a
noun, particularly the inability to decide from one sentence to the next
if that is singular or plural. In the attached, I fixed this in the
ref/*_statistics.sgml files by always saying "statistics object" instead.
If people agree that that reads better, we should make an effort to
propagate that terminology elsewhere in the docs, and into error messages,
objectaddress.c output, etc.

Although I've not done anything about it here, I'm not happy about the
handling of dependencies for stats objects. I do not think that cloning
RemoveStatistics as RemoveStatisticsExt is sane at all. The former is
meant to deal with cleaning up pg_statistic rows that we know will be
there, so there's no real need to expend pg_depend overhead to track them.
For objects that are only loosely connected, the right thing is to use
the dependency system; in particular, this design has no real chance of
working well with cross-table stats. Also, it's really silly to have
*both* this hard-wired mechanism and a pg_depend entry; the latter is
surely redundant if you have the former. IMO we should revert
RemoveStatisticsExt and instead handle things by making stats objects
auto-dependent on the individual column(s) they reference (not the whole
table).

I'm also of the opinion that having an AUTO dependency, rather than
a NORMAL dependency, on the stats object's schema is the wrong semantics.
There isn't any other case where you can drop a non-empty schema without
saying CASCADE, and I'm mystified why this case should act that way.

Lastly, I tried the example given in the CREATE STATISTICS reference page,
and it doesn't seem to work. Without the stats object, the two queries
are both estimated at one matching row, whereas they really produce 100
and 0 rows respectively. With the stats object, they seem to both get
estimated at ~100 rows, which is a considerable improvement for one case
but a considerable disimprovement for the other. If this is not broken,
I'd like to know why not. What's the point of an expensive extended-
stats mechanism if it can't get this right?

regards, tom lane

Attachment Content-Type Size
createstats-with-4.patch text/x-diff 70.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-05-12 03:00:10 Re: UPDATE of partition key
Previous Message Masahiko Sawada 2017-05-12 02:44:19 Re: Race conditions with WAL sender PID lookups