Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-11 20:08:05
Message-ID: CADkLM=eo4rmN8fFnxNycQtzTnsKwBxdXbAJMTvMC_A1_7HzRdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> > In which case we're failing nearly silently, yes, there is a null
> returned,
> > but we have no idea why there is a null returned. If I were using this
> > function manually I'd want to know what I did wrong, what parameter I
> > skipped, etc.
>
> I can see it both ways and don't feel super strongly about it ... I just
> know that I've had some cases where we returned an ERROR or otherwise
> were a bit noisy on NULL values getting passed into a function and it
> was much more on the annoying side than on the helpful side; to the
> point where we've gone back and pulled out ereport(ERROR) calls from
> functions before because they were causing issues in otherwise pretty
> reasonable queries (consider things like functions getting pushed down
> to below WHERE clauses and such...).
>

I don't have strong feelings either. I think we should get more input on
this. Regardless, it's easy to change...for now.

>
> Sure. Not a huge deal either way, was just pointing out the difference.
> I do think it'd be good to match what ANALYZE does here, so checking if
> the values in pg_class are different and only updating if they are,
> while keeping the code for pg_statistic where it'll just always update.
>

I agree that mirroring ANALYZE wherever possible is the ideal.

> > I like the symmetry of a consistent interface, but we've already got an
> > asymmetry in that the pg_class update is done non-transactionally (like
> > ANALYZE does).
>
> Don't know that I really consider that to be the same kind of thing when
> it comes to talking about the interface as the other aspects we're
> discussing ...
>

Fair.

>
> > One persistent problem is that there is no _safe equivalent to ARRAY_IN,
> so
> > that can always fail on us, though it should only do so if the string
> > passed in wasn't a valid array input format, or the values in the array
> > can't coerce to the attribute's basetype.
>
> That would happen before we even get to being called and there's not
> much to do about it anyway.
>

Not sure I follow you here. the ARRAY_IN function calls happen once for
every non-null stavaluesN parameter, and it's done inside the function
because the result type could be the base type for a domain/array type, or
could be the type itself. I suppose we could move that determination to the
caller, but then we'd need to call get_base_element_type() inside a client,
and that seems wrong if it's even possible.

> > I should also point out that we've lost the ability to check if the
> export
> > values were of a type, and if the destination column is also of that
> type.
> > That's a non-issue in binary upgrades, but of course if a field changed
> > from integers to text the histograms would now be highly misleading.
> > Thoughts on adding a typname parameter that the function uses as a cheap
> > validity check?
>
> Seems reasonable to me.
>

I'd like to hear what Tomas thinks about this, as he was the initial
advocate for it.

> > As for pg_dump, I'm currently leading toward the TOC entry having either
> a
> > series of commands:
> >
> > SELECT pg_set_relation_stats('foo.bar'::regclass, ...);
> > pg_set_attribute_stats('foo.bar'::regclass, 'id'::name, ...); ...
>
> I'm guessing the above was intended to be SELECT ..; SELECT ..;
>

Yes.

>
> > Or one compound command
> >
> > SELECT pg_set_relation_stats(t.oid, ...)
> > pg_set_attribute_stats(t.oid, 'id'::name, ...),
> > pg_set_attribute_stats(t.oid, 'last_name'::name, ...),
> > ...
> > FROM (VALUES('foo.bar'::regclass)) AS t(oid);
> >
> > The second one has the feature that if any one attribute fails, then the
> > whole update fails, except, of course, for the in-place update of
> pg_class.
> > This avoids having an explicit transaction block, but we could get that
> > back by having restore wrap the list of commands in a transaction block
> > (and adding the explicit lock commands) when it is safe to do so.
>
> Hm, I like this approach as it should essentially give us the
> transaction block we had been talking about wanting but without needing
> to explicitly do a begin/commit, which would add in some annoying
> complications. This would hopefully also reduce the locking concern
> mentioned previously, since we'd get the lock needed in the first
> function call and then the others would be able to just see that we've
> already got the lock pretty quickly.
>

True, we'd get the lock needed in the first function call, but wouldn't we
also release that very lock before the subsequent call? Obviously we'd be
shrinking the window in which another process could get in line and take a
superior lock, and the universe of other processes that would even want a
lock that blocks us is nil in the case of an upgrade, identical to existing
behavior in the case of an FDW ANALYZE, and perfectly fine in the case of
someone tinkering with stats.

>
> > Subject: [PATCH v8] Create pg_set_relation_stats, pg_set_attribute_stats.
>
> [...]
>
> > +Datum
> > +pg_set_relation_stats(PG_FUNCTION_ARGS)
>
> [...]
>
> > + ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
> > + if (!HeapTupleIsValid(ctup))
> > + elog(ERROR, "pg_class entry for relid %u vanished during
> statistics import",
> > + relid);
>
> Maybe drop the 'during statistics import' part of this message? Also
> wonder if maybe we should make it a regular ereport() instead, since it
> might be possible for a user to end up seeing this?
>

Agreed and agreed. It was copypasta from ANALYZE.

>
> This comment doesn't seem quite right. Maybe it would be better if it
> was in the positive, eg: Only update pg_class if there is a meaningful
> change.
>

+1

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-03-11 20:37:49 Reports on obsolete Postgres versions
Previous Message Shruthi Gowda 2024-03-11 19:07:34 Re: remaining sql/json patches