Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2025-11-27 12:46:09
Message-ID: CAPmGK14EkE-LaGqpu1aMfc_bOZ54zubk2Wbx1TEm6TFYiOu26Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 22, 2025 at 6:31 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>> Other initial comments:
>>
>> The commit message says:
>>
>> This is managed via two new options, fetch_stats and remote_analyze,
>> both are available at the server level and table level. If fetch_stats
>> is true, then the ANALYZE command will first attempt to fetch statistics
>> from the remote table and import those statistics locally.
>>
>> If remote_analyze is true, and if the first attempt to fetch remote
>> statistics found no attribute statistics, then an attempt will be made
>> to ANALYZE the remote table before a second and final attempt to fetch
>> remote statistics.
>>
>> If no statistics are found, then ANALYZE will fall back to the normal
>> behavior of sampling and local analysis.
>>
>> I think the first step assumes that the remote stats are up-to-date;
>> if they aren't, it would cause a regression. (If the remote relation
>> is a plain table, they are likely to be up-to-date, but for example,
>> if it is a foreign table, it's possible that they are stale.) So how
>> about making it the user's responsibility to make them up-to-date? If
>> doing so, we wouldn't need to do the second and third steps anymore,
>> making the patch simple.

> Obviously there is no way to know the quality/freshness of remote stats if they are found.
>
> The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.

Actually, I have some concerns about the ANALYZE and fall-back
options. As for the former, if the remote user didn't have the
MAINTAIN privilege on the remote table, remote ANALYZE would be just a
waste effort. As for the latter, imagine the situation where a user
ANALYZEs a foreign table whose remote table is significantly large.
When the previous attempts fail, the user might want to re-try to
import remote stats after ANALYZEing the remote table in the remote
side in some way, rather than postgres_fdw automatically falling back
to the normal lengthy processing. I think just throwing an error if
the first attempt fails would make the system not only simple but
reliable while providing some flexibility to users.

>> On the other hand:
>>
>> This operation will only work on remote relations that can have stored
>> statistics: tables, partitioned tables, and materialized views. If the
>> remote relation is a view then remote fetching/analyzing is just wasted
>> effort and the user is better of setting fetch_stats to false for that
>> table.
>>
>> I'm not sure the waste effort is acceptable; IMO, if the remote table
>> is a view, I think that the system should detect that in some way, and
>> then just do the normal ANALYZE processing.
>
>
> The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.

Good idea! I would vote for throwing an error if the relkind is view,
making the user set fetch_stats to false for the foreign table.

Best regards,
Etsuro Fujita

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2025-11-27 12:48:34 Re: Import Statistics in postgres_fdw before resorting to sampling.
Previous Message Aleksander Alekseev 2025-11-27 12:45:15 Re: [PATCH] Refactor bytea_sortsupport(), take two