From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
Date: | 2025-08-14 18:34:43 |
Message-ID: | CADkLM=feCJrEDWCOpoxxg58J_6zX6nv+BnXufUOEZuP7bYTCbg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>
>
> This isn't a full review. I looked at the patches mainly to find out
> how does it fit into the current method of analysing a foreign table.
>
Any degree of review is welcome. We're chasing views, reviews, etc.
> Right now, do_analyze_rel() is called with FDW specific acquireFunc,
> which collects a sample of rows. The sample is passed to attribute
> specific compute_stats which fills VacAttrStats for that attribute.
> VacAttrStats for all the attributes is passed to update_attstats(),
> which updates pg_statistics. The patch changes that to fetch the
> statistics from the foreign server and call pg_restore_attribute_stats
> for each attribute.
That recap is accurate.
> Instead I was expecting that after fetching the
> stats from the foreign server, it would construct VacAttrStats and
> call update_attstats(). That might be marginally faster since it
> avoids SPI call and updates stats for all the attributes. Did you
> consider this alternate approach and why it was discarded?
>
It might be marginally faster, but it would duplicate a lot of the
pair-checking (must have a most-common-freqs with a most-common-vals, etc)
and type-checking logic (the vals in a most-common vals must all input
coerce to the correct datatype for the _destination_ column, etc), and
we've already got that in pg_restore_attribute_stats. There used to be a
non-fcinfo function that took a long list of Datums and isnull boolean
pairs, but that wasn't pretty to look at and was replaced with the
positional fcinfo version we have today. This use case might be a reason to
bring that back, or expose the existing positional fcinfo function
(presently static) if we want to avoid SPI badly enough. As it is, the SPI
code is fairly future proof in that it isn't required to add new stat types
as they are created. My first attempt at this patch attempted to make a
FunctionCallInvoke() on the variadic pg_restore_attribute_stats, but that
would require a filled out fn_expr, and to get that we'd have to duplicate
a lot of logic from the parser, so the infrastructure isn't available to
easily call a variadic function.
>
> If a foreign table points to an inheritance parent on the foreign
> server, we will receive two rows for that table - one with inherited =
> false and other with true in that order. I think the stats with
> inherited=true are relevant to the local server since querying the
> parent will fetch rows from children. Since that stats is applied
> last, the pg_statistics will retain the intended statistics. But why
> to fetch two rows in the first place and waste computing cycles?
>
Glad you agree that we're fetching the right statistics.
That was the only way I could think of to do one client fetch and still get
exactly one row back.
Anything else involved fetching the inh=true first, and if that failed
fetching the inh=false statistics. That adds extra round-trips especially
given that inherited statistics are more rare than non-inherited
statistics. Moreoever, we're making decisions (analyze yes/no, fallback to
sampling yes/no) based on whether or not we got statistics back from the
foreign server at all, and having to consider the result of two fetches
instead of one makes that logic more complicated.
If, however, you were referring to the work we're handing the remote
server, I'm open to queries that you think would be more lightweight.
However, the pg_stats view is a security barrier view, so we reduce
overhead by passing through that barrier as few times as possible.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2025-08-14 18:44:44 | Re: index prefetching |
Previous Message | Andrew Dunstan | 2025-08-14 18:29:53 | Re: Retail DDL |