From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Seq scans status update
Date: 2007-05-17 16:32:05
Message-ID: 464C8385.7030409@enterprisedb.com
Lists: pgsql-patches

Attached is a new version of Simon's "scan-resistant buffer manager"
patch. It's not ready for committing yet because of a small issue I
found this morning (* see bottom), but here's a status update.

To recap, the basic idea is to use a small ring of buffers for large
scans like VACUUM, COPY and seq-scans. Changes to the original patch:

- a different sized ring is used for VACUUM and seq scans than for COPY.
VACUUM and seq scans use a ring of 32 buffers, and COPY uses a ring of
4096 buffers in the default configuration. See the README changes in the
patch for the rationale.

- for queries with large seq scans, the buffer ring is only used for
reads issued by the seq scan itself, not for any other reads in the
query. A typical scenario where this matters is a large seq scan with a
nested loop join to a smaller table: you don't want the index lookups
inside the nested loop to go through the buffer ring.

- for seq scans, buffers that would need a WAL flush to reuse are
dropped from the ring. That makes bulk updates behave roughly as they do
without the patch, instead of forcing a WAL flush every 32 pages (see
the sketch below).
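
To illustrate the idea, here's a minimal sketch in C. It is not code
from the patch; the names (BufferRing, ring_get_buffer,
buffer_needs_wal_flush) are made up. The ring is a small circular array
of buffers that the scan keeps reusing, and a buffer that would require
a WAL flush to reuse is simply dropped from the ring:

#include <stdbool.h>

#define RING_SIZE_VACUUM  32    /* VACUUM and seq scans */
#define RING_SIZE_COPY    4096  /* COPY */

typedef struct BufferRing
{
    int     nbuffers;   /* ring size */
    int     current;    /* slot to reuse next */
    int    *buffers;    /* buffer ids; -1 = slot not filled yet */
} BufferRing;

/* Hypothetical check: would reusing this buffer force a WAL flush? */
static bool buffer_needs_wal_flush(int bufid);

/*
 * Return the buffer to reuse for the next page of the scan, or -1 to
 * tell the caller to allocate through the normal replacement policy
 * (and then remember the new buffer in the ring).
 */
static int
ring_get_buffer(BufferRing *ring)
{
    int     slot = ring->current;
    int     bufid = ring->buffers[slot];

    ring->current = (slot + 1) % ring->nbuffers;

    if (bufid == -1)
        return -1;      /* ring not fully populated yet */

    /*
     * Reusing this buffer would force a WAL flush: drop it from the
     * ring instead, so a bulk update doesn't stall on a flush every
     * 32 pages.
     */
    if (buffer_needs_wal_flush(bufid))
    {
        ring->buffers[slot] = -1;
        return -1;
    }

    return bufid;
}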

I've spent a lot of time thinking about solutions to that last point.
The obvious solution would be to not use the buffer ring for updating
scans at all. The difficulty is that in heapam.c, where the hint to use
the buffer ring is set, we don't know whether the scan is read-only.
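
For context, that hint is decided once, when the scan is initialized; at
that point we know how big the relation is, but not what the scan will
be used for. A rough sketch (again with made-up names, not the patch's
actual fields):

#include <stdbool.h>

typedef struct ScanDescSketch
{
    long    nblocks;    /* size of the relation being scanned */
    bool    use_ring;   /* route this scan's reads through the ring? */
} ScanDescSketch;

static void
init_scan_sketch(ScanDescSketch *scan, long nblocks, long threshold)
{
    scan->nblocks = nblocks;

    /*
     * Use the ring only for scans larger than some threshold (say, a
     * fraction of shared_buffers).  Nothing available at this point
     * says whether the scan is read-only or will feed an UPDATE or
     * DELETE; that's the difficulty described above.
     */
    scan->use_ring = (nblocks > threshold);
}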

I've completed a set of performance tests on a test server. The server
has 4 GB of RAM, of which 1 GB is used for shared_buffers.

Results for a 10 GB table:

head-copy-bigtable | 00:10:09.07016
head-copy-bigtable | 00:10:20.507357
head-copy-bigtable | 00:10:21.857677
head-copy_nowal-bigtable | 00:05:18.232956
head-copy_nowal-bigtable | 00:03:24.109047
head-copy_nowal-bigtable | 00:05:31.019643
head-select-bigtable | 00:03:47.102731
head-select-bigtable | 00:01:08.314719
head-select-bigtable | 00:01:08.238509
head-select-bigtable | 00:01:08.208563
head-select-bigtable | 00:01:08.28347
head-select-bigtable | 00:01:08.308671
head-vacuum_clean-bigtable | 00:01:04.227832
head-vacuum_clean-bigtable | 00:01:04.232258
head-vacuum_clean-bigtable | 00:01:04.294621
head-vacuum_clean-bigtable | 00:01:04.280677
head-vacuum_hintbits-bigtable | 00:04:01.123924
head-vacuum_hintbits-bigtable | 00:03:58.253175
head-vacuum_hintbits-bigtable | 00:04:26.318159
head-vacuum_hintbits-bigtable | 00:04:37.512965
patched-copy-bigtable | 00:09:52.776754
patched-copy-bigtable | 00:10:18.185826
patched-copy-bigtable | 00:10:16.975482
patched-copy_nowal-bigtable | 00:03:14.882366
patched-copy_nowal-bigtable | 00:04:01.04648
patched-copy_nowal-bigtable | 00:03:56.062272
patched-select-bigtable | 00:03:47.704154
patched-select-bigtable | 00:01:08.460326
patched-select-bigtable | 00:01:10.441544
patched-select-bigtable | 00:01:11.916221
patched-select-bigtable | 00:01:13.848038
patched-select-bigtable | 00:01:10.956133
patched-vacuum_clean-bigtable | 00:01:10.315439
patched-vacuum_clean-bigtable | 00:01:12.210537
patched-vacuum_clean-bigtable | 00:01:15.202114
patched-vacuum_clean-bigtable | 00:01:10.712235
patched-vacuum_hintbits-bigtable | 00:03:42.279201
patched-vacuum_hintbits-bigtable | 00:04:02.057778
patched-vacuum_hintbits-bigtable | 00:04:26.805822
patched-vacuum_hintbits-bigtable | 00:04:28.911184

In other words, the patch has no significant effect here, as expected.
Select times did go up by a couple of seconds, though, which I didn't
expect. One theory is that unused shared_buffers pages get swapped out
during the tests and the bgwriter pulls them back in. I'll set
swappiness to 0 and try again at some point.

Results for a 2 GB table:

copy-medsize-unpatched | 00:02:18.23246
copy-medsize-unpatched | 00:02:22.347194
copy-medsize-unpatched | 00:02:23.875874
copy_nowal-medsize-unpatched | 00:01:27.606334
copy_nowal-medsize-unpatched | 00:01:17.491243
copy_nowal-medsize-unpatched | 00:01:31.902719
select-medsize-unpatched | 00:00:03.786031
select-medsize-unpatched | 00:00:02.678069
select-medsize-unpatched | 00:00:02.666103
select-medsize-unpatched | 00:00:02.673494
select-medsize-unpatched | 00:00:02.669645
select-medsize-unpatched | 00:00:02.666278
vacuum_clean-medsize-unpatched | 00:00:01.091356
vacuum_clean-medsize-unpatched | 00:00:01.923138
vacuum_clean-medsize-unpatched | 00:00:01.917213
vacuum_clean-medsize-unpatched | 00:00:01.917333
vacuum_hintbits-medsize-unpatched | 00:00:01.683718
vacuum_hintbits-medsize-unpatched | 00:00:01.864003
vacuum_hintbits-medsize-unpatched | 00:00:03.186596
vacuum_hintbits-medsize-unpatched | 00:00:02.16494
copy-medsize-patched | 00:02:35.113501
copy-medsize-patched | 00:02:25.269866
copy-medsize-patched | 00:02:31.881089
copy_nowal-medsize-patched | 00:01:00.254633
copy_nowal-medsize-patched | 00:01:04.630687
copy_nowal-medsize-patched | 00:01:03.729128
select-medsize-patched | 00:00:03.201837
select-medsize-patched | 00:00:01.332975
select-medsize-patched | 00:00:01.33014
select-medsize-patched | 00:00:01.332392
select-medsize-patched | 00:00:01.333498
select-medsize-patched | 00:00:01.332692
vacuum_clean-medsize-patched | 00:00:01.140189
vacuum_clean-medsize-patched | 00:00:01.062762
vacuum_clean-medsize-patched | 00:00:01.062402
vacuum_clean-medsize-patched | 00:00:01.07113
vacuum_hintbits-medsize-patched | 00:00:17.865446
vacuum_hintbits-medsize-patched | 00:00:15.162064
vacuum_hintbits-medsize-patched | 00:00:01.704651
vacuum_hintbits-medsize-patched | 00:00:02.671651

This looks good to me, except for a glitch in the last set of
vacuum_hintbits tests. Selects and vacuums benefit significantly, as
does non-WAL-logged COPY.

Not shown here, but I ran tests earlier with vacuum on a table that
actually had dead tuples to remove. In that test the patched version
really shone, reducing the runtime to about 1/6th. That was the original
motivation for this patch: not having to do a WAL flush on every page in
the second phase of vacuum.

Test script attached. To use it:

1. Edit testscript.sh and set BIGTABLESIZE.
2. Start the postmaster.
3. Run the script, giving a test label as the argument. For example:
"./testscript.sh bigtable-patched"

Attached is also the patch I used for the tests.

I would appreciate it if people would download the patch and the script
and repeat the tests on different hardware. I'm particularly interested
in testing on a box with good I/O hardware where selects on unpatched
PostgreSQL are bottlenecked by CPU.

Barring any surprises, I'm going to fix the remaining issue and submit a
final patch, probably over the weekend.

(*) The issue with this patch is that if the buffer cache is completely
filled with dirty buffers that need a WAL flush to evict, the buffer
ring code gets into an infinite loop trying to find one that doesn't
need a flush. It should be simple to fix; a sketch of one possible fix
follows.
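
One straightforward fix, sketched below with the same made-up names as
before, is to bound the search to a single pass over the ring and fall
back to a normal (WAL-flushing) eviction if no flush-free buffer turns
up:

/* Hypothetical fallback: flush WAL and evict through the normal path. */
static int evict_with_wal_flush(void);

static int
get_reusable_buffer(BufferRing *ring)
{
    int     tries;

    /* One bounded pass over the ring, not an unbounded retry loop. */
    for (tries = 0; tries < ring->nbuffers; tries++)
    {
        int     bufid = ring_get_buffer(ring);  /* sketched earlier */

        if (bufid != -1)
            return bufid;   /* reusable without a WAL flush */
    }

    /*
     * Nothing in the ring is reusable without a flush (in the
     * pathological case, the whole buffer cache is dirty): flush WAL
     * once and evict normally instead of spinning forever.
     */
    return evict_with_wal_flush();
}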

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
testscript.sh application/x-shellscript 2.6 KB
sr-fixed-7.patch text/x-diff 29.1 KB
