Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-08 06:35:40
Message-ID: CADkLM=dB-qbU81LuAQVSQxAabYXghXe6ptaatTYmQAzVT44V_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
>
> Having some discussion around that would be useful. Is it better to
> have a situation where there are stats for some columns but no stats for
> other columns? There would be a good chance that this would lead to a
> set of queries that were properly planned out and a set which end up
> with unexpected and likely poor query plans due to lack of stats.
> Arguably that's better overall, but either way an ANALYZE needs to be
> done to address the lack of stats for those columns and then that
> ANALYZE is going to blow away whatever stats got loaded previously
> anyway and all we did with a partial stats load was maybe have a subset
> of queries have better plans in the interim, after having expended the
> cost to try and individually load the stats and dealing with the case of
> some of them succeeding and some failing.
>

It is my (incomplete and entirely second-hand) understanding is that
pg_upgrade doesn't STOP autovacuum, but sets a delay to a very long value
and then resets it on completion, presumably because analyzing a table
before its data is loaded and indexes are created would just be a waste of
time.

>
> Overall, I'd suggest we wait to see what Corey comes up with in terms of
> doing the stats load for all attributes in a single function call,
> perhaps using the VALUES construct as you suggested up-thread, and then
> we can contemplate if that's clean enough to work or if it's so grotty
> that the better plan would be to do per-attribute function calls. If it
> ends up being the latter, then we can revisit this discussion and try to
> answer some of the questions raised above.
>

In the patch below, I ended up doing per-attribute function calls, mostly
because it allowed me to avoid creating a custom data type for the portable
version of pg_statistic. This comes at the cost of a very high number of
parameters, but that's the breaks.

I am a bit concerned about the number of locks on pg_statistic and the
relation itself, doing CatalogOpenIndexes/CatalogCloseIndexes once per
attribute rather than once per relation. But I also see that this will
mostly get used at a time when no other traffic is on the machine, and
whatever it costs, it's still faster than the smallest table sample (insert
joke about "don't have to be faster than the bear" here).

This raises questions about whether a failure in one attribute update
statement should cause the others in that relation to roll back or not, and
I can see situations where both would be desirable.

I'm putting this out there ahead of the pg_dump / fe_utils work, mostly
because what I do there heavily depends on how this is received.

Also, I'm still seeking confirmation that I can create a pg_dump TOC entry
with a chain of commands (e.g. BEGIN; ... COMMIT; ) or if I have to fan
them out into multiple entries.

Anyway, here's v7. Eagerly awaiting feedback.

Attachment Content-Type Size
v7-0001-Create-pg_set_relation_stats-pg_set_attribute_sta.patch text/x-patch 40.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2024-03-08 07:03:00 Re: speed up a logical replica setup
Previous Message Michael Paquier 2024-03-08 06:32:22 Re: meson: Specify -Wformat as a common warning flag for extensions