Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-11 18:20:36
Message-ID: CADkLM=d3XhRSySPX+CHL7KmBCgt7ZpqL-0NLxdSWNYHx=MBjag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
>
>
> Having thought about it a bit more, I generally like the idea of being
> able to just update one stat instead of having to update all of them at
> once (and therefore having to go look up what the other values currently
> are...). That said, per below, perhaps making it strict is the better
> plan.
>

v8 has it as strict.

>
> > > Also, in some cases we allow the function to be called with a
> > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the
> > > OID ends up being NULL).
> >
> > Thoughts on it emitting a WARN or NOTICE before returning false?
>
> Eh, I don't think so?
>
> Where this is coming from is that we can often end up with functions
> like these being called inside of larger queries, and having them spit
> out WARN or NOTICE will just make them noisy.
>
> That leads to my general feeling of just returning NULL if called with a
> NULL OID, as we would get with setting the function strict.
>

In which case we're failing nearly silently, yes, there is a null returned,
but we have no idea why there is a null returned. If I were using this
function manually I'd want to know what I did wrong, what parameter I
skipped, etc.

> Well, that code is for pg_statistic while I was looking at pg_class (in
> vacuum.c:1428-1443, where we track if we're actually changing anything
> and only make the pg_class change if there's actually something
> different):
>

I can do that, especially since it's only 3 tuples of known types, but my
reservations are summed up in the next comment.

> Not sure why we don't treat both the same way though ... although it's
> probably the case that it's much less likely to have an entire
> pg_statistic row be identical than the few values in pg_class.
>

That would also involve comparing ANYARRAY values, yuk. Also, a matched
record will never be the case when used in primary purpose of the function
(upgrades), and not a big deal in the other future cases (if we use it in
ANALYZE on foreign tables instead of remote table samples, users
experimenting with tuning queries under hypothetical workloads).

> Hmm, that's a valid point, so a NULL passed in would need to set that
> value actually to NULL, presumably. Perhaps then we should have
> pg_set_relation_stats() be strict and have pg_set_attribute_stats()
> handles NULLs passed in appropriately, and return NULL if the relation
> itself or attname, or other required (not NULL'able) argument passed in
> cause the function to return NULL.
>

That's how I have relstats done in v8, and could make it do that for attr
stats.

(What I'm trying to drive at here is a consistent interface for these
> functions, but one which does a no-op instead of returning an ERROR on
> values being passed in which aren't allowable; it can be quite
> frustrating trying to get a query to work where one of the functions
> decides to return ERROR instead of just ignoring things passed in which
> aren't valid.)
>

I like the symmetry of a consistent interface, but we've already got an
asymmetry in that the pg_class update is done non-transactionally (like
ANALYZE does).

One persistent problem is that there is no _safe equivalent to ARRAY_IN, so
that can always fail on us, though it should only do so if the string
passed in wasn't a valid array input format, or the values in the array
can't coerce to the attribute's basetype.

I should also point out that we've lost the ability to check if the export
values were of a type, and if the destination column is also of that type.
That's a non-issue in binary upgrades, but of course if a field changed
from integers to text the histograms would now be highly misleading.
Thoughts on adding a typname parameter that the function uses as a cheap
validity check?

v8 attached, incorporating these suggestions plus those of Bharath and
Bertrand. Still no pg_dump.

As for pg_dump, I'm currently leading toward the TOC entry having either a
series of commands:

SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ...

Or one compound command

SELECT pg_set_relation_stats(t.oid, ...)
pg_set_attribute_stats(t.oid, 'id'::name, ...),
pg_set_attribute_stats(t.oid, 'last_name'::name, ...),
...
FROM (VALUES('foo.bar'::regclass)) AS t(oid);

The second one has the feature that if any one attribute fails, then the
whole update fails, except, of course, for the in-place update of pg_class.
This avoids having an explicit transaction block, but we could get that
back by having restore wrap the list of commands in a transaction block
(and adding the explicit lock commands) when it is safe to do so.

Attachment Content-Type Size
v8-0001-Create-pg_set_relation_stats-pg_set_attribute_sta.patch text/x-patch 41.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-03-11 18:27:43 Re: UUID v7
Previous Message Michał Kłeczek 2024-03-11 17:58:54 Alternative SAOP support for GiST