Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2025-11-21 21:31:23
Message-ID: CADkLM=foU7yK1rfMzqTPiQtfEOGANAS9dHE5-Js6hEXmZiKfow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Other initial comments:
>
> The commit message says:
>
> This is managed via two new options, fetch_stats and remote_analyze,
> both are available at the server level and table level. If fetch_stats
> is true, then the ANALYZE command will first attempt to fetch
> statistics
> from the remote table and import those statistics locally.
>
> If remote_analyze is true, and if the first attempt to fetch remote
> statistics found no attribute statistics, then an attempt will be made
> to ANALYZE the remote table before a second and final attempt to fetch
> remote statistics.
>
> If no statistics are found, then ANALYZE will fall back to the normal
> behavior of sampling and local analysis.
>
> I think the first step assumes that the remote stats are up-to-date;
> if they aren't, it would cause a regression. (If the remote relation
> is a plain table, they are likely to be up-to-date, but for example,
> if it is a foreign table, it's possible that they are stale.) So how
> about making it the user's responsibility to make them up-to-date? If
> doing so, we wouldn't need to do the second and third steps anymore,
> making the patch simple.
>

Obviously there is no way to know the quality/freshness of remote stats if
they are found.

The analyze option was borne of feedback from other postgres hackers while
brainstorming on what this option might look like. I don't think we *need*
this extra option for the feature to be a success, but it's relative
simplicity did make me want to put it out there to see who else liked it.

>
> On the other hand:
>
> This operation will only work on remote relations that can have stored
> statistics: tables, partitioned tables, and materialized views. If the
> remote relation is a view then remote fetching/analyzing is just wasted
> effort and the user is better of setting fetch_stats to false for that
> table.
>
> I'm not sure the waste effort is acceptable; IMO, if the remote table
> is a view, I think that the system should detect that in some way, and
> then just do the normal ANALYZE processing.
>

The stats fetch query is pretty light, but I can see fetching the relkind
along with the relstats, and making decisions on whether to continue from
there, only applying the relstats after attrstats have been successfully
applied.

> That's it for now.
>

I'll see what I can do to make that work.

> My apologies for the delayed response.
>

Valuable responses are worth waiting for.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2025-11-21 21:39:12 Re: [PATCH] Reorganize pqcomm.h a bit
Previous Message Bruce Momjian 2025-11-21 20:47:46 Re: should we have a fast-path planning for OLTP starjoins?