Re: Statistics Import and Export

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-02-29 23:17:14
Message-ID: CAOuzzgp4vbs5wOUFyE9Ts+kc7+BU-_5oPNs2dgQ5cHhJBUvmFg@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Thu, Feb 29, 2024 at 17:48 Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:

> Having looked through this thread and discussed a bit with Corey
>> off-line, the approach that Tom laid out up-thread seems like it would
>> make the most sense overall- that is, eliminate the JSON bits and the
>> SPI and instead export the stats data by running queries from the new
>> version of pg_dump/server (in the FDW case) against the old server
>> with the intelligence of how to transform the data into the format
>> needed for the current pg_dump/server to accept, through function calls
>> where the function calls generally map up to the rows/information being
>> updated- a call to update the information in pg_class for each relation
>> and then a call for each attribute to update the information in
>> pg_statistic.
>>
>
> Thanks for the excellent summary of our conversation, though I would add
> that we discussed a problem with per-attribute functions: each function
> would be acquiring locks on both the relation (so it doesn't go away) and
> pg_statistic, and that lock thrashing would add up. Whether that overhead
> is judged significant or not is up for discussion. If it is significant,
> it makes sense to package up all the attributes into one call, passing in
> an array of some new pg_statistic-esque special type... the very issue
> that sent me down the JSON path.
>
> I certainly see the flexibility in having per-attribute functions, but am
> concerned about non-binary-upgrade situations where the attnums won't
> line up, and if we're passing them by name then the function has to dig
> around looking for the right matching attnum, and that's overhead too. In
> the whole-table approach, we just iterate over the attributes that exist
> and find the matching parameter row.
>
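
(To make the two shapes concrete- the per-attribute flavor from the
summary above would have pg_dump emitting roughly the following; the
function names and signatures are entirely made up for illustration,
nothing here is a settled API:)

    -- hypothetical: one call per relation to set the pg_class-level stats
    SELECT pg_set_relation_stats('public.mytable'::regclass,
                                 relpages => 42,
                                 reltuples => 10000,
                                 relallvisible => 40);

    -- hypothetical: one call per attribute to set the pg_statistic rows
    SELECT pg_set_attribute_stats('public.mytable'::regclass,
                                  attname => 'id',
                                  null_frac => 0.0,
                                  avg_width => 4,
                                  n_distinct => -1.0);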

That’s certainly a fair point, and my initial reaction (which could
certainly be wrong) is that it’s unlikely to be an issue- but also, if you
feel you could make it work with an array, passing all the attribute info
in with one call (which I suspect would be possible, just a bit more
complex to build), then sure, go for it. If it ends up being overly
unwieldy then perhaps the per-attribute call would be better, and we could
perhaps acquire the lock before the function calls? Doing a check to see
if we have already locked it would be cheaper than trying to acquire a new
lock, I’m fairly sure.
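
To sketch what I mean by the single-call variant (again, the composite
type and function name below are hypothetical, just to show the shape):

    -- hypothetical: all attribute stats for a relation in one call,
    -- packaged as an array of some new pg_statistic-esque type
    SELECT pg_set_table_stats('public.mytable'::regclass,
           ARRAY[
               ROW('id',   0.0,  4, -1.0),
               ROW('name', 0.01, 32, 5000.0)
           ]::attribute_stats_input[]);

That would take the relation and pg_statistic locks once per table rather
than once per attribute.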

Also, per our prior discussion- this makes sense to include in the
post-data section, imv, in part because then we have the indexes we may
wish to load stats for, but further because that also means it’ll be in
the parallelizable part of the process, making me a bit less concerned
overall about the individual timing.
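
Concretely (and assuming nothing beyond the existing pg_dump/pg_restore
options), that would put the stats calls alongside index and constraint
creation, so

    pg_dump --section=post-data mydb

would emit them, and a parallel restore with pg_restore -j N could spread
those calls across workers.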

Thanks!

Stephen
