| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2025-12-13 19:12:16 |
| Message-ID: | CADkLM=dfuT8YgXOyoXW7nuGdOFw-sYh2uR3hQS-N66Gj_k_ytw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Dec 12, 2025 at 2:45 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:
> CCing Jonathan Katz and Nathan Bossart as they had been sounding boards
> for me when I started designing this feature.
>
>
>> Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the
>> foreign table is an inheritance parent, we would fail to do inherited
>> stats.
>>
>
> Perhaps I'm not understanding completely, but I believe what we're doing
> now should be ok.
>
> The local table of type 'f' can be a member of a partition, but can't be a
> partition itself, so whatever stats we get for it, we're storing them as
> inh = false.
>
> On the remote side, the table could be an inheritance parent, in which
> case we ONLY want the inh=true stats, but we will still store them locally
> as inh = false.
>
> The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of
> the query ensures that we get inh=true stats if they're there in preference
> to the inh=false steps.
>
> I will grant you that in an old-style inheritance (i.e. not formal
> partitions) situation we'd probably want some weighted mix of the inherited
> and non-inherited stats, but 1) very few people use old-style inheritance
> anymore, 2) few of those export tables via a FDW, and 3) there's no way to
> do that weighting so we should fall back to sampling anyway.
>
> None of this takes away from your suggestions down below.
>
>
>> IIUC, the FDW needs to do two things within the ImportStatistics
>> callback function: check the importability, and if ok, do the work. I
>> think that would make the API complicated. To avoid that, how about
>> 1) splitting the callback function into the two functions shown below
>> and then 2) rewriting the above to something like the attached? The
>> attached addresses the returning issue mentioned above as well.
>>
>> bool
>> StatisticsAreImportable(Relation relation)
>>
>> Checks the importability, and if ok, returns true, in which case the
>> following callback function is called. In the postgres_fdw case, we
>> could implement this to check if the fetch_stats flag for the foreign
>> table is set to true, and if so, return true.
>>
>
> +1
>
>
>> void
>> ImportStatistics(Relation relation, List *va_cols, int elevel)
>>
>> Imports the stats for the foreign table from the foreign server. As
>> mentioned in previous emails, I don't think it's a good idea to fall
>> back to the normal processing when the attempt to import the stats
>> fails, in which case I think we should just throw an error (or
>> warning) so that the user can re-try to import the stats after fixing
>> the foreign side in some way. So I re-defined this as a void
>> function. Note that this re-definition removes the concern mentioned
>> in the comment starting with "XXX:". In the postgres_fdw case, as
>> mentioned in a previous email, I think it would be good to implement
>> this so that it checks whether the remote table is a view or not when
>> importing the relation stats from the remote server, and if so, just
>> throws an error (or warning), making the user reset the fetch_stats
>> flag.
>>
>
> I think I'm ok with this design as the decision, as it still leaves open
> the fdw-specific options of how to handle initially finding no remote stats.
>
> I can still see a situation where a local table expects the remote table
> to eventually have proper statistics on it, but until that happens it will
> fall back to table samples. This design decision means that either the user
> lives without any statistics for a while, or alters the foreign table
> options and hopefully remembers to set them back. While I understand the
> desire to first implement something very simple, I think that adding the
> durability that fallback allows for will be harder to implement if we don't
> build it in from the start. I'm interested to hear with Nathan and/or
> Jonathan have to say about that.
>
>
>> I added two arguments to the callback function: va_cols, for
>> supporting the column list in the ANALYZE command, and elevel, for
>> proper logging. I'm not sure if VaccumParams should be added as well.
>>
>
> Good catch, I forgot about that one.
>
> Going to think some more on this before I work incorporating your
>
Heh, the word "changes" got cut off.
Anyway, here's v6 incorporating both threads of feedback.
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-Add-remote-statistics-fetching-to-postgres_fdw.patch | text/x-patch | 38.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2025-12-13 19:23:01 | Re: Type assertions without GCC builtins |
| Previous Message | Mihail Nikalayeu | 2025-12-13 19:01:52 | Re: Adding REPACK [concurrently] |