Re: WITH clause in CREATE STATISTICS

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: WITH clause in CREATE STATISTICS
Date: 2017-05-12 11:13:27
Message-ID: b6b1f3a1-281a-20bc-d961-60604046c73d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5/12/17 4:46 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> Hmm, yeah, makes sense. Here's a patch for this approach.
>
> ...
>
> Also, while reading the docs changes, I got rather ill from the
> inconsistent and not very grammatical handling of "statistics" as a
> noun, particularly the inability to decide from one sentence to the next
> if that is singular or plural. In the attached, I fixed this in the
> ref/*_statistics.sgml files by always saying "statistics object" instead.
> If people agree that that reads better, we should make an effort to
> propagate that terminology elsewhere in the docs, and into error messages,
> objectaddress.c output, etc.
>

I'm fine with the 'statistics object' wording. I've been struggling with
this bit while working on the patch, and I agree it sounds better and
getting it consistent is definitely worthwhile.

>
> Although I've not done anything about it here, I'm not happy about the
> handling of dependencies for stats objects. I do not think that cloning
> RemoveStatistics as RemoveStatisticsExt is sane at all. The former is
> meant to deal with cleaning up pg_statistic rows that we know will be
> there, so there's no real need to expend pg_depend overhead to track them.
> For objects that are only loosely connected, the right thing is to use
> the dependency system; in particular, this design has no real chance of
> working well with cross-table stats. Also, it's really silly to have
> *both* this hard-wired mechanism and a pg_depend entry; the latter is
> surely redundant if you have the former. IMO we should revert
> RemoveStatisticsExt and instead handle things by making stats objects
> auto-dependent on the individual column(s) they reference (not the whole
> table).
>
> I'm also of the opinion that having an AUTO dependency, rather than
> a NORMAL dependency, on the stats object's schema is the wrong semantics.
> There isn't any other case where you can drop a non-empty schema without
> saying CASCADE, and I'm mystified why this case should act that way.
>

Yeah, it's a bit frankensteinian ...

>
> Lastly, I tried the example given in the CREATE STATISTICS reference page,
> and it doesn't seem to work. Without the stats object, the two queries
> are both estimated at one matching row, whereas they really produce 100
> and 0 rows respectively. With the stats object, they seem to both get
> estimated at ~100 rows, which is a considerable improvement for one case
> but a considerable disimprovement for the other. If this is not broken,
> I'd like to know why not. What's the point of an expensive extended-
> stats mechanism if it can't get this right?
>

I assume you're talking about the functional dependencies and in that
case that's expected behavior, because f. dependencies do assume the
conditions are "consistent" with the functional dependencies.

This is an inherent limitation of functional dependencies, and perhaps a
price for the simplicity of this statistics type. If your application
executes queries that are likely not consistent with this, don't use
functional dependencies.

The simplicity is why dependencies were implemented first, mostly to
introduce all the infrastructure. The other statistics types (MCV,
histograms) don't have this limitation, but those did not make it into pg10.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rahila Syed 2017-05-12 12:00:17 Re: Adding support for Default partition in partitioning
Previous Message Alexander Kuzmenkov 2017-05-12 11:09:32 PoC: full merge join on comparison clause