| From: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org, jkatz(at)postgresql(dot)org, nathandbossart(at)gmail(dot)com |
| Subject: | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date: | 2026-03-14 11:40:52 |
| Message-ID: | CAPmGK17+FS99X7t8_T2pL-seSOP_dYoQi7jOhshf8LKFzJgrwg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sat, Mar 14, 2026 at 12:36 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> On Fri, Mar 13, 2026 at 8:17 AM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>> On Fri, Mar 13, 2026 at 4:42 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>> >>
>> >> By the errors-to-warnings change, even when those libpq functions
>> >> fail, we fall back to the normal ANALYZE processing, but I don't think
>> >> that that is OK, because if those functions fail for some reasons
>> >> (network-related issues like network disconnection or memory-related
>> >> issues like out of memory), then libpq functions called later for the
>> >> normal processing would also be likely to fail for the same reasons,
>> >> causing the same failure again, which I don't think is good. To avoid
>> >> that, when libpq functions in fetch_relstats() and fetch_attstats()
>> >> fail, shouldn't we just throw an error, as before?
>>
>> > Right now the code doesn't differentiate based on the kind of error encountered. If it was a network error, then not only do we expect fallback to also fail, but simple fdw queries will fail as well. If, however, it were a permission issue (no SELECT permission on pg_stats, for instance, or pg_stats doesn't exist because the remote database is not a regular Postgres, like Redshift or something), then I definitely want to fall back. Currently, the code makes no judgement on the details of the error; it just trusts that fallback-analyze will either succeed (because the failure was permissions-related) or quickly run into the same insurmountable problem, and one extra PQsendQuery isn't much overhead.
>>
>> I'm not sure that's really true; if the error is one that doesn't take
>> long to detect, the extra function call probably wouldn't be a
>> problem, but if it is one that takes a long time to detect, the extra
>> call would make the situation worse.
>
> I'm struggling to think of what kind of error would take a long time to detect. Network timeouts could be one, but such a situation would affect normal query operations as well, not just ANALYZE. If you have a specific error in mind, please let me know.
I too was thinking of network timeouts (in such a case, I thought that
even if PQsendQuery succeeded, the following PQgetResult would have to
block for a long time). And yes, that situation can occur for normal
queries, but for such a query the extra function calls are made only
when aborting the remote transaction, in a way that does not require
libpq functions to wait out a network timeout.
>> > If you think that inspecting the error that we get and matching against a list of errors that should skip the retry or skip the fallback, I'd be in favor of that. It's probably easier to start with a list of errorcodes that we feel are hopeless and should remain ERRORs not WARNINGs.
>>
>> That's an improvement, but I think that that's nice-to-have, not
>> must-have. As there's not much time left until the feature freeze,
>> I'd like to propose to make the first cut as simple as possible and
>> thus leave that for future work, if this is intended for v19. If no
>> objections from you (or anyone else), I'd like to update the patch as
>> I proposed.
>
> I'd like to get this into 19 as well, so I'm likely to agree to your proposal.
Ok, will update the patch.
>> Another thing I noticed is related to this comment in fetch_relstats():
>>
>> + /*
>> + * Before v14, a reltuples value of 0 was ambiguous: it could either mean
>> + * the relation is empty, or it could mean that it hadn't yet been
>> + * vacuumed or analyzed. (Newer versions use -1 for the latter case).
>> + *
>> + * We can ignore this change, because if the remote table wasn't analyzed,
>> + * then it would have no attribute stats, and thus we wouldn't have stats
>> + * that we would try to import. So we can take the reltuples value as-is.
>> + */
>>
>> I might have misread the patch, but for v14 or later the patch tries
>> to import remote stats even when we have reltuples = -1. Is that
>> intentional? In that case the remote table has never yet been
>> vacuumed/analyzed, so I think we should just fall back to the normal
>> processing.
>
> For cases where we affirmatively have -1, it makes sense to immediately go to the ANALYZE option (if available) and fall back if not. I'll make that change.
>
>> Also, for v13 or earlier it also tries to import remote
>> stats even when we have reltuples = 0, but in that case the remote
>> table might not have been vacuumed/analyzed, so to avoid possibly
>> useless processing, I think we should just fall back to it in that
>> case as well.
>
> +1
I forgot to mention this, but when we have reltuples = 0 for v14 or
later, the patch tries to import both relstats and attstats; in that
case I think it would be OK to import only the former, for consistency
with the normal processing. What do you think about that?
Best regards,
Etsuro Fujita