Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2025-12-12 13:45:46
Message-ID: CAPmGK16ZVmxuHGn+rvvbsvat8osDWkcdNeF5dP61eKiC9D7Pig@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 27, 2025 at 9:46 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Sat, Nov 22, 2025 at 6:31 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> >> Other initial comments:
> >>
> >> The commit message says:
> >>
> >> This is managed via two new options, fetch_stats and remote_analyze,
> >> both are available at the server level and table level. If fetch_stats
> >> is true, then the ANALYZE command will first attempt to fetch statistics
> >> from the remote table and import those statistics locally.
> >>
> >> If remote_analyze is true, and if the first attempt to fetch remote
> >> statistics found no attribute statistics, then an attempt will be made
> >> to ANALYZE the remote table before a second and final attempt to fetch
> >> remote statistics.
> >>
> >> If no statistics are found, then ANALYZE will fall back to the normal
> >> behavior of sampling and local analysis.
> >>
> >> I think the first step assumes that the remote stats are up-to-date;
> >> if they aren't, it would cause a regression. (If the remote relation
> >> is a plain table, they are likely to be up-to-date, but for example,
> >> if it is a foreign table, it's possible that they are stale.) So how
> >> about making it the user's responsibility to make them up-to-date? If
> >> doing so, we wouldn't need to do the second and third steps anymore,
> >> making the patch simple.
>
> > Obviously there is no way to know the quality/freshness of remote stats if they are found.
> >
> > The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.
>
> Actually, I have some concerns about the ANALYZE and fall-back
> options. As for the former, if the remote user didn't have the
> MAINTAIN privilege on the remote table, remote ANALYZE would be just a
> waste effort. As for the latter, imagine the situation where a user
> ANALYZEs a foreign table whose remote table is significantly large.
> When the previous attempts fail, the user might want to re-try to
> import remote stats after ANALYZEing the remote table in the remote
> side in some way, rather than postgres_fdw automatically falling back
> to the normal lengthy processing. I think just throwing an error if
> the first attempt fails would make the system not only simple but
> reliable while providing some flexibility to users.

I think I was narrow-minded here; as for the ANALYZE option, if the
remote user has the privilege, it would work well, so +1 for it, but I
don't think it's a must-have option, so I'd vote for making it a
separate patch. As for the fall-back behavior, however, sorry, I
still think it reduces flexibility.

Best regards,
Etsuro Fujita

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2025-12-12 13:52:33 Re: Add WALRCV_CONNECTING state to walreceiver
Previous Message tushar 2025-12-12 13:40:02 Re: Non-text mode for pg_dumpall