| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Reject Foreign Tables from MIN/MAX indexscan Optimization? |
| Date: | 2026-01-06 04:04:26 |
| Message-ID: | 1497200.1767672266@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
"David E. Wheeler" <david(at)justatheory(dot)com> writes:
> On Jan 5, 2026, at 19:41, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, it looks a little
>> odd to ship the query in this form rather than as a MIN()/MAX(),
>> but I'm unconvinced that it's harmful per se.
> It would be less harmful if the LIMIT pushed down; otherwise it’s pulling the entire table back into Postgres, no?
Yeah, there might be some gold to be mined there. IIRC we already
have some ability for LIMIT to tell an immediately-below Sort that
only a limited number of rows will be retrieved. Could be useful
if ForeignScan could be in on that.
>> It would be good to look under the hood a little more and understand
>> why the MinMaxAgg path is winning, if it's not actually fast.
> Where does one start?
I made the attached test case (based on postgres_fdw.sql, but with
a few more rows in the table). The idea is to check the behavior
for both an indexed column (c1) and an unindexed one (c2). To
eliminate cacheing/startup effects, I ran each test case three times
but considered only the lowest execution time of the three (often
but not always the last run).
The first attached results are from current HEAD; the second are
after applying your patch. What I find interesting is that
with use_remote_estimate enabled, it always goes for shipping the
MIN as-is. Your patch changes the behavior with use_remote_estimate
disabled, and we can see that the reason is that that mode estimates
the query cost at 100.24 with the minmax optimization and 146.78 when
shipping the aggregate. I've not dug into where we are getting
those numbers without use_remote_estimate, but perhaps that could
use refinement. On the other hand, there is not that much difference
between the actual runtimes, so perhaps it doesn't matter for
postgres_fdw.
I can't speak to your results with ClickHouse, since I have no idea
how the cost estimates are derived for that.
regards, tom lane
| Attachment | Content-Type | Size |
|---|---|---|
| mmtest.sql | text/plain | 1.8 KB |
| mmtest-head.out | text/plain | 4.2 KB |
| mmtest-patched.out | text/plain | 3.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jacob Jackson | 2026-01-06 04:07:58 | Re: Checking join outer relation uniqueness to prevent unnecessary memoization |
| Previous Message | Michael Paquier | 2026-01-06 03:27:29 | Re: Typos in the code and README |