Re: Import Statistics in postgres_fdw before resorting to sampling.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com
Subject: Re: Import Statistics in postgres_fdw before resorting to sampling.
Date: 2025-12-12 19:45:42
Message-ID: CADkLM=fb=FsBLqNc14-y4na57ypywZxYb4q-zrxcaY6qQW0B2Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

CCing Jonathan Katz and Nathan Bossart as they had been sounding boards for
me when I started designing this feature.

> Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the
> foreign table is an inheritance parent, we would fail to do inherited
> stats.
>

Perhaps I'm not understanding completely, but I believe what we're doing
now should be ok.

The local table of type 'f' can be a member of a partition, but can't be a
partition itself, so whatever stats we get for it, we're storing them as
inh = false.

On the remote side, the table could be an inheritance parent, in which case
we ONLY want the inh=true stats, but we will still store them locally as
inh = false.

The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of
the query ensures that we get inh=true stats if they're there in preference
to the inh=false steps.

I will grant you that in an old-style inheritance (i.e. not formal
partitions) situation we'd probably want some weighted mix of the inherited
and non-inherited stats, but 1) very few people use old-style inheritance
anymore, 2) few of those export tables via a FDW, and 3) there's no way to
do that weighting so we should fall back to sampling anyway.

None of this takes away from your suggestions down below.

> IIUC, the FDW needs to do two things within the ImportStatistics
> callback function: check the importability, and if ok, do the work. I
> think that would make the API complicated. To avoid that, how about
> 1) splitting the callback function into the two functions shown below
> and then 2) rewriting the above to something like the attached? The
> attached addresses the returning issue mentioned above as well.
>
> bool
> StatisticsAreImportable(Relation relation)
>
> Checks the importability, and if ok, returns true, in which case the
> following callback function is called. In the postgres_fdw case, we
> could implement this to check if the fetch_stats flag for the foreign
> table is set to true, and if so, return true.
>

+1

> void
> ImportStatistics(Relation relation, List *va_cols, int elevel)
>
> Imports the stats for the foreign table from the foreign server. As
> mentioned in previous emails, I don't think it's a good idea to fall
> back to the normal processing when the attempt to import the stats
> fails, in which case I think we should just throw an error (or
> warning) so that the user can re-try to import the stats after fixing
> the foreign side in some way. So I re-defined this as a void
> function. Note that this re-definition removes the concern mentioned
> in the comment starting with "XXX:". In the postgres_fdw case, as
> mentioned in a previous email, I think it would be good to implement
> this so that it checks whether the remote table is a view or not when
> importing the relation stats from the remote server, and if so, just
> throws an error (or warning), making the user reset the fetch_stats
> flag.
>

I think I'm ok with this design as the decision, as it still leaves open
the fdw-specific options of how to handle initially finding no remote stats.

I can still see a situation where a local table expects the remote table to
eventually have proper statistics on it, but until that happens it will
fall back to table samples. This design decision means that either the user
lives without any statistics for a while, or alters the foreign table
options and hopefully remembers to set them back. While I understand the
desire to first implement something very simple, I think that adding the
durability that fallback allows for will be harder to implement if we don't
build it in from the start. I'm interested to hear with Nathan and/or
Jonathan have to say about that.

> I added two arguments to the callback function: va_cols, for
> supporting the column list in the ANALYZE command, and elevel, for
> proper logging. I'm not sure if VaccumParams should be added as well.
>

Good catch, I forgot about that one.

Going to think some more on this before I work incorporating your

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-12-12 19:49:27 Re: Solaris versus our NLS files
Previous Message Nathan Bossart 2025-12-12 19:41:30 Re: Proposal: Add a callback data parameter to GetNamedDSMSegment