From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kirill Reshke <reshke(at)double(dot)cloud>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use fadvise in wal replay
Date: 2023-01-19 21:19:10
Message-ID: 0e9c1643-c620-47e7-b30e-6840b0abbe2e@enterprisedb.com
Lists: pgsql-hackers
Hi,
I looked at this patch today. The change is fairly simple, so I decided
to do a benchmark. To prepare, I created a cluster with a 1GB database,
created a backup, and ran a 1h UPDATE workload with WAL archiving. Then
the actual benchmark does this:
1. restore the datadir backup
2. copy the WAL from archive
3. drop caches
4. start the cluster, measure time until end of recovery
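The four steps above can be sketched as a shell function. This is only an illustration, not the script I actually used; the paths, the assumption that archived WAL is simply copied into pg_wal, and the use of pg_ctl -w (which returns once the server accepts connections, approximating end of recovery) are all mine, and the drop-caches step requires root:

```shell
#!/bin/bash
# Sketch of one benchmark iteration (hypothetical paths, requires root
# for the drop_caches write).
run_recovery_benchmark() {
    local datadir=$1      # e.g. /mnt/data/pgdata
    local backup=$2       # e.g. /mnt/backup/pgdata
    local archive=$3      # e.g. /mnt/backup/wal_archive

    # 1. restore the datadir backup
    rm -rf "$datadir"
    cp -r "$backup" "$datadir"

    # 2. copy the WAL from the archive
    cp "$archive"/* "$datadir/pg_wal/"

    # 3. drop the page cache so recovery starts cold
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # 4. start the cluster, measure time until end of recovery
    local start=$SECONDS
    pg_ctl -D "$datadir" -w start
    echo "recovery took $((SECONDS - start)) seconds"
}
```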
I did this with master/patched, and with/without Linux readahead. And I
did this on two different machines - both have SSD storage, one (i5) has
a RAID of SATA SSD devices, the other one (xeon) has a single NVMe SSD.
The results (in seconds) look like this (the amount of WAL is different
on each machine, so the timings are not comparable).
  host     ra    master   patched
  --------------------------------------
  i5        0      615       513
          256      392       396
  --------------------------------------
  xeon      0     2113      1436
          256     1487      1460
On i5 (the smaller machine with a RAID of 6 x SATA SSDs), with read-ahead
enabled it takes ~390 seconds, and the patch makes no difference.
Without read-ahead it takes ~615 seconds, and the patch does help a
bit, but it's hardly competitive with read-ahead.
Note: 256 is read-ahead per physical device, the read-ahead value for
the whole RAID is 6x that, i.e. 1536. I was speculating that maybe the
hard-coded 128kB RACHUNK is not sufficient, so I tried with 512kB. But
that actually made it worse, and the timing deteriorated to ~640s (that
is, slower than master without read-ahead).
On the xeon (with NVMe SSD), it's different - the patch seems about as
effective as regular read-ahead. So that's good.
So I'm a bit unsure about this patch. It doesn't seem like it can perform
better than read-ahead (although perhaps it does on a different storage
system).
With disabled read-ahead it helps (at least a bit), although I'm not
really convinced there are good reasons to run without read-ahead. The
reason for doing that was described like this:
> Because database should know better than OS which data needs to be
> prefetched and which should not. Big OS readahead affects index scan
> performance.
I don't recall seeing such an issue, and I can't find anything like that
in our mailing list archives either. Sure, that doesn't mean it can't
happen, and read-ahead is a heuristic, so it can do weird stuff. But in
my experience it tends to work fairly well. The issues I've seen are
generally in the opposite direction, i.e. read-ahead not kicking in.
Anyway, it's not my intent to prevent this patch getting committed, if
someone wishes to do that. But I'm not quite convinced it actually helps
with a practical issue.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company