From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Make TID Scans recalculate the TIDs less often |
Date: | 2025-09-17 04:59:51 |
Message-ID: | CAApHDvoLMuLXakcAAsfjW=aKd_iFpqzf7k4G4wnQnym9RdSMsA@mail.gmail.com |
Lists: | pgsql-hackers |
Over on [1], there was some concern about having to recalculate the
TID Scan's TidList on every recheck call. This work entails
evaluating all the TID expressions, sorting the resulting list of TIDs
and deduplicating it (so that we don't scan the same TID twice).
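To make the sort-and-deduplicate step concrete, here is a minimal standalone sketch. It is not the executor code: `DemoTid`, `demo_tid_cmp` and `demo_sort_and_dedup` are hypothetical names invented for illustration, and a TID is modelled as a plain (block, offset) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TID: block number plus offset in the block. */
typedef struct DemoTid
{
    uint32_t block;
    uint16_t offset;
} DemoTid;

/* qsort comparator: order by block first, then by offset. */
int
demo_tid_cmp(const void *a, const void *b)
{
    const DemoTid *x = a;
    const DemoTid *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/*
 * Sort the array in place, then squeeze out adjacent duplicates so each
 * TID would be visited only once.  Returns the number of entries kept.
 */
size_t
demo_sort_and_dedup(DemoTid *tids, size_t n)
{
    size_t kept;

    if (n < 2)
        return n;

    qsort(tids, n, sizeof(DemoTid), demo_tid_cmp);

    kept = 1;
    for (size_t i = 1; i < n; i++)
    {
        /* Keep an entry only if it differs from the last one kept. */
        if (demo_tid_cmp(&tids[i], &tids[kept - 1]) != 0)
            tids[kept++] = tids[i];
    }
    return kept;
}
```

The cost is one O(n log n) sort plus a linear pass, which is the per-rescan work being discussed here.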
As it turns out, that work can take up quite a bit of time, and having
to do it as often as every rescan *could* be noticeable *if* the list
is big enough and the rescans are frequent enough. The following case
demonstrates this:
set max_parallel_workers_per_gather=0;
set enable_seqscan=0;
set enable_material=0;
set jit=0;
select sum(c) from million m left join lateral (select count(*) c from
empty where ctid in
('(1,1)','(1,2)','(1,3)','(1,4)','(1,5)','(1,6)','(1,7)','(1,8)','(1,9)','(1,10)','(1,11)','(1,12)','(1,13)','(1,14)','(1,15)','(1,16)','(1,17)','(1,18)','(1,19)','(1,20)','(1,21)','(1,22)','(1,23)','(1,24)','(1,25)','(1,26)','(1,27)','(1,28)','(1,29)','(1,30)','(1,31)','(1,32)','(1,33)','(1,34)','(1,35)','(1,36)','(1,37)','(1,38)','(1,39)','(1,40)','(1,41)','(1,42)','(1,43)','(1,44)','(1,45)','(1,46)','(1,47)','(1,48)','(1,49)','(1,50)','(1,51)','(1,52)','(1,53)','(1,54)','(1,55)','(1,56)','(1,57)','(1,58)','(1,59)','(1,60)','(1,61)','(1,62)','(1,63)','(1,64)','(1,65)','(1,66)','(1,67)','(1,68)','(1,69)','(1,70)','(1,71)','(1,72)','(1,73)','(1,74)','(1,75)','(1,76)','(1,77)','(1,78)','(1,79)','(1,80)','(1,81)','(1,82)','(1,83)','(1,84)','(1,85)','(1,86)','(1,87)','(1,88)','(1,89)','(1,90)','(1,91)','(1,92)','(1,93)','(1,94)','(1,95)','(1,96)','(1,97)','(1,98)','(1,99)','(1,100)'))
on 1=1;
master:
Time: 613.541 ms
Time: 621.037 ms
Time: 623.430 ms
patched:
Time: 298.863 ms
Time: 298.015 ms
Time: 297.172 ms
The part I don't know is whether anyone is at all likely to hit this.
We've added TID scan, so filtering on ctid must be common enough to
warrant having that code (yes, I know it's required for WHERE CURRENT
OF too). I just don't know how common rescans are in those queries.
The patch optimises the recalc by changing things so the recalc is
only done when a parameter that's mentioned somewhere in the TID quals
has changed. If no such parameter has changed, we use the same list as
we did on the last scan.
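The rule described above can be sketched as a small standalone model. This is only an illustration under assumptions, not the attached patch: `DemoTidScan`, `demo_rescan` and the other names are hypothetical, and parameter sets are modelled as 64-bit masks rather than the executor's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scan state; none of these names come from PostgreSQL. */
typedef struct DemoTidScan
{
    uint64_t qual_params;   /* bit i set: the TID quals mention parameter i */
    bool     list_valid;    /* can the cached TID list be reused? */
    int      recalc_count;  /* how many times the list has been rebuilt */
} DemoTidScan;

/* Stand-in for the expensive step: evaluate, sort and deduplicate. */
void
demo_build_tid_list(DemoTidScan *scan)
{
    scan->recalc_count++;
    scan->list_valid = true;
}

/*
 * Rescan: throw the cached list away only if at least one changed
 * parameter is also mentioned in the TID quals.
 */
void
demo_rescan(DemoTidScan *scan, uint64_t changed_params)
{
    if ((scan->qual_params & changed_params) != 0)
        scan->list_valid = false;
}

/* Start (or restart) fetching: rebuild the list only when it is stale. */
void
demo_start_scan(DemoTidScan *scan)
{
    if (!scan->list_valid)
        demo_build_tid_list(scan);
}
```

In this model, a scan whose quals mention no parameters at all (`qual_params == 0`) builds its list once and reuses it on every later rescan. The IN list in the test query holds only constants, so it would fall into that case.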
Does anyone think this is worth pursuing further?
Patch attached.
David
[1] https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Reduce-rescan-overheads-in-TID-Range-Scan.patch | application/octet-stream | 10.3 KB |