| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2026-02-16 23:33:21 |
| Message-ID: | 833fb173-e59b-47d2-929d-5712987d3781@vondra.me |
| Lists: | pgsql-hackers |
On 2/17/26 00:05, Alexandre Felipe wrote:
> Hi guys,
>
> There seems to be some very interesting stuff here, I have to try to
> catch up with your analysis Andres.
>
> In the meantime.
>
> I am sharing the results I have got on a well behaved Linux system.
>
Can you share how the system / Postgres is configured? It's good
practice to provide all the information others might need to reproduce
your results.

In particular, what is shared_buffers set to? Are you still using
io_method=worker? With how many I/O workers?
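
One way to collect these settings in a reproducible form is to query pg_settings. The helper below is only a sketch: the function name is hypothetical, and the list of settings goes beyond the ones asked about above (effective_io_concurrency is added on the assumption that it matters for prefetching).

```python
# Settings worth reporting alongside benchmark results. The first three are
# the ones asked about above; the last one is an assumed addition.
SETTINGS = ["shared_buffers", "io_method", "io_workers",
            "effective_io_concurrency"]


def settings_query(names):
    """Build a query that lists the given settings from pg_settings."""
    # Double any single quotes so each name is a valid SQL string literal.
    quoted = ", ".join("'" + n.replace("'", "''") + "'" for n in names)
    return ("SELECT name, setting, unit FROM pg_settings "
            f"WHERE name IN ({quoted}) ORDER BY name;")


if __name__ == "__main__":
    print(settings_query(SETTINGS))
```

The printed query can be passed to psql with -c and the output pasted into the report.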
> No sophisticated algorithm here but evicting OS cache helps to verify
> the benefit of prefetching at a much smaller scale, and I think this is
> useful
> % gcc drop_cache.c -o drop_cache;
> % sudo chown root:root drop_cache;
> % sudo chmod 4755 drop_cache;
>
> I was executing like this
> python3 .../run_regression_test.py --port 5433 --iterations 10 \
> --columns sequential,random --workers 0 --evict os,off \
> --payload-size 50 \
> --rows 10000 \
> --reset \
> --ntables 5
>
> 1 table: significant benefit with HDD cold, SSD random cold access.
> 5 tables: significant benefit for random cold access. Somewhat
> detrimental for sequential cold access, and random hot access.
> 10 tables: significant benefit for random cold access. Slightly better
> than 5 tables for cold sequential access, and somewhat detrimental for
> random hot access.
>
> These results are hard to explain, but maybe Andres has the answer:
>> I think this specific issue is a bit different, because today you get
>> drastically different behaviour if you have
>>
>> a) [miss, (miss, hit)+]
>> vs
>> b) [(miss, hit)+]
>
What's the distance in those cases? You may need to add some logging to
read_stream to show that. If the distance is not ~1.0 then it's not the
issue described by Andres, I think.

There are other ways to look at the issued I/Os too, using either iostat
or tools like perf-trace.
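
As a rough illustration of why the two quoted patterns can diverge, here is a toy model of a look-ahead distance that doubles on a miss and shrinks by one on a hit. That rule, the cap of 64, and the function name are assumptions made for this sketch; the actual heuristics in read_stream are more involved, so logging the real distance is still the way to confirm.

```python
def simulate_distance(pattern, max_distance=64, start=1):
    """Return the look-ahead distance after each event in the pattern.

    Toy rule (an assumption, not the real read_stream logic):
    a miss doubles the distance, a hit decreases it by one.
    """
    distance = start
    trace = []
    for event in pattern:
        if event == "miss":
            distance = min(distance * 2, max_distance)
        elif event == "hit":
            distance = max(distance - 1, 1)
        else:
            raise ValueError(f"unknown event: {event!r}")
        trace.append(distance)
    return trace


# a) [miss, (miss, hit)+]: the extra leading miss lets the distance grow.
pattern_a = ["miss"] + ["miss", "hit"] * 4
# b) [(miss, hit)+]: every hit undoes the preceding miss.
pattern_b = ["miss", "hit"] * 4

if __name__ == "__main__":
    print("a:", simulate_distance(pattern_a))  # [2, 4, 3, 6, 5, 10, 9, 18, 17]
    print("b:", simulate_distance(pattern_b))  # [2, 1, 2, 1, 2, 1, 2, 1]
```

In this model pattern b) stays pinned between 1 and 2, while pattern a) keeps ramping up, which is the kind of difference the distance logging should make visible.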
>
> Tomas said
>> I think a "proper" solution would require some sort of cost model for
>> the I/O part, so that we can schedule the I/Os just so that the I/O
>> completes right before we actually need the page.
>
> I dare to ask
> Why not use this on a feedback loop?
>
> while (!current_buffer.ready && reasonable to prefetch) {
> fetch next index tuple.
> if necessary prefetch one more buffer
> }
>
What does "reasonable to prefetch" mean in practice, and how do you
determine it at runtime, before initiating the buffer prefetch?
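
To make the question concrete, here is a toy simulation of the quoted loop in which "reasonable to prefetch" is taken to mean "fewer than max_in_flight reads outstanding". That interpretation, the fixed I/O latency, the one-tick costs, and all names are assumptions for illustration only; nothing here reflects the actual patch.

```python
from collections import deque


def scan_with_feedback(n_blocks, io_latency, max_in_flight):
    """Simulate a scan; return (total_ticks, peak_reads_in_flight)."""
    clock = 0
    in_flight = deque()  # completion times of issued reads, in block order
    next_to_issue = 0
    peak = 0
    for block in range(n_blocks):
        # The block we need must have been requested at least now.
        if next_to_issue <= block:
            in_flight.append(clock + io_latency)
            next_to_issue += 1
        ready_at = in_flight[0]
        # Feedback loop: while the current buffer is not ready and it is
        # "reasonable" (assumed: under the cap), fetch the next index
        # tuple (one tick) and prefetch one more block.
        while (clock < ready_at and len(in_flight) < max_in_flight
               and next_to_issue < n_blocks):
            in_flight.append(clock + io_latency)
            next_to_issue += 1
            clock += 1
        peak = max(peak, len(in_flight))
        clock = max(clock, ready_at)  # wait out any remaining latency
        in_flight.popleft()
        clock += 1  # process the block's tuples
    return clock, peak


if __name__ == "__main__":
    print(scan_with_feedback(4, 3, 1))  # (16, 1): no look-ahead
    print(scan_with_feedback(4, 3, 4))  # (7, 4): reads overlap
```

The sketch sidesteps the hard part: a fixed cap is known up front, whereas a real predicate would have to be evaluated before it is known whether the next block is cached or how long its read will take.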
> I also dare to ask
> Is it possible to skip an unavailable buffer and gain time processing
> the rows that will be needed afterwards?
> This could also help by releasing buffers more quickly if they need to
> be recycled.
>
Not at the moment, AFAIK. And for most index-only scans that would not
really work anyway, because those need to produce sorted output.
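
A small sketch of the ordering problem, using a made-up list of (key, heap block) pairs and a hypothetical helper that processes ready blocks first and comes back to the rest later:

```python
def scan_skipping_unready(index_tuples, ready_blocks):
    """Emit keys whose block is ready first, deferred keys afterwards.

    index_tuples is a list of (key, block) pairs in index (sorted) order.
    This is a hypothetical illustration, not existing executor behavior.
    """
    emitted, deferred = [], []
    for key, block in index_tuples:
        if block in ready_blocks:
            emitted.append(key)
        else:
            deferred.append(key)  # buffer not available yet, skip for now
    return emitted + deferred


# Made-up data: key 2 lives on block 11, which is still being read.
index_tuples = [(1, 10), (2, 11), (3, 10), (4, 12)]
result = scan_skipping_unready(index_tuples, ready_blocks={10, 12})

if __name__ == "__main__":
    print(result)                    # [1, 3, 4, 2]
    print(result == sorted(result))  # False
```

Key 2 comes out after keys 3 and 4, so a consumer that relies on index order would see wrong results unless the output were re-sorted.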
regards
--
Tomas Vondra