Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2026-01-16 07:10:50
Message-ID: CAExHW5sXi9gEQkB_7bk+3R6mnEm3oYO2vRVOS9_ZSY13+g1Bzg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 15, 2026 at 11:27 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>
>>
>> Or else suddenly all ANALYZE on foreign
>> server will be costly since they are running a couple queries before
>> falling back to sampling.
>
>
> In all scenarios, the remote_analyze setting defaults to false, and the remote database should continue to have stats going forward from that point. I can see value in having a more insistent NOTICE/WARNING message for situations where a remote ANALYZE fails or the table continues to have inadequate stats after the remote ANALYZE.
>
>>
>> Why do we need remote_analyze as a flag? I might have missed some
>> conversation which led to including this flag. We fetch relkind, so we
>> know whether the object on the foreign server supports statistics or
>> not.
>
>
> Well, sorta. We know the database on the other end talks libpq and has the table pg_class. There are more than a few forks, Redshift and Vertica come to mind, where this check will succeed, but the design diverges after that and doesn't have pg_statistic or similar. We have to do deeper checks of the remote system (matching pg_statistic rows to all the va_cols we expect, etc) to verify that we _probably_ have good stats, but we don't actually truly know if the stats will actually be there until we fetch and import them, and at that point we're done. And that is part of my discomfort with StatisticsAreImportable in that it can identify some of the cases where we _can't_ import stats, but to identify whether we _can_ import the stats we have just moved all of the functionality of ImportStatistics into StatisticsAreImportable.
>
>>
>> If it supports statistics, we run ANALYZE automatically. I mean
>> the user wants us to ANALYZE the foreign table and has also indicated
>> that the remote object is capable of storing stats. We should document
>> the fact that we analyze if we do not find existing statistics. Why
>> would users want to disable running ANALYZE? One possibility is they
>> don't want to disturb query plans because of suddenly changing
>> statistics. But then they will need to switch auto-analysis off. So
>> maybe there is a way to know when it's not acceptable to run ANALYZE
>> on the remote server. I feel this flag will put another burden on the
>> user who already has to set so many flags in postgres_fdw. That is
>> especially true, when the flag is useful only the first time when the
>> remote object doesn't have stats - usually there will be stats. If at
>> all we have to introduce remote_analyze, let it default to the value
>> of fetch_stats.
>
>
> I may have answered this question in the above responses, but remote_analyze does not default to ON, nor do I envision making it the default. I think the primary purpose for the flag is situations where the remote database is often "swapped out" with one that has a fresher set of the data (via DNS swapping, port swapping, etc) and the new database was put online before vacuum got around to those tables. I would agree that running the remote analyze is pulling a big lever, hence only trying it if we failed to find sufficient existing stats.
>

Assuming following conditions to be true
1. object on the other side usually has statistics
2. it didn't when we queried.

The reason for that situation is that the object was not analyzed
before for the reasons you mention. Then why not just run ANALYZE and
instantiate the statistics. That will happen only rarely. Why do we
need a table and server option to control that behaviour? Maybe you
have already explained and I am not able to understand your answer.
Sorry.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-01-16 07:18:53 Re: [PATCH] Fix incorrect parser comment
Previous Message Chao Li 2026-01-16 06:54:40 Re: fix a spelling mistake