From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: optimizing vacuum truncation scans
Date: 2015-04-20 02:09:11
Message-ID: CAMkU=1wK_At0JfE4Esm8Mgrfrr8mLTxpEYnvpVggPawKv6VC0A@mail.gmail.com

After a large bulk load aborted near the end, I decided to vacuum the main
table, so as not to leave a huge chunk of free space in the middle of it,
before re-running the bulk load. This vacuum took a frustratingly long
time, as the backwards scan over the table to truncate the free space did
not trigger read-ahead (or read-behind) in the file system (which was ext4).

In hindsight, the best thing to do would have been to open a psql session
and take an access-share lock on the main table, so that the truncation
could not occur, leaving the cleaned, reusable space at the end of the
table to be refilled on the next (successful) run of the bulk load. But it
would be nice if the system could do some optimization for me, as hindsight
never comes in time.

I did literally the simplest thing I could think of as a proof-of-concept
patch, to see if it would actually fix things: I just jumped back a certain
number of blocks occasionally, prefetched them forward, and then resumed
the regular backward scan. The patch and driving script are attached.
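
To make the shape of the idea concrete, here is a minimal sketch (not the
attached patch itself) of what the backward scan in
count_nondeletable_pages() in vacuumlazy.c could look like; prefetch_size
stands in for the JJFETCH knob and is assumed to be > 0:

    /*
     * Sketch only, assuming the structure of count_nondeletable_pages():
     * walk backward from the end of the relation, but on entering each
     * prefetch_size-block chunk, prefetch that chunk in forward order so
     * the kernel's sequential read-ahead can kick in.
     */
    BlockNumber blkno = vacrelstats->rel_pages;

    while (blkno > vacrelstats->nonempty_pages)
    {
        Buffer      buf;

        blkno--;

        /* topmost block of a new chunk (or the very first iteration)? */
        if (blkno % prefetch_size == prefetch_size - 1 ||
            blkno == vacrelstats->rel_pages - 1)
        {
            BlockNumber pblkno = (blkno / prefetch_size) * prefetch_size;

            /* prefetch the chunk bottom-up, i.e. in forward order */
            for (; pblkno <= blkno; pblkno++)
                PrefetchBuffer(onerel, MAIN_FORKNUM, pblkno);
        }

        buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vac_strategy);
        /* ... inspect the page for removable tuples, as before ... */
        ReleaseBuffer(buf);
    }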

It was very successful in this test. I report the fastest time to do the
truncation for each setting of the prefetch size, out of about 10 runs
each, after removing a handful of cases where buffer pins prevented it from
doing a full-size truncation. (I used the minimum because the server has
other duties that would occasionally kick in and cause a spike in time for
a few runs.) A setting of 0 disables the feature.

    JJFETCH    trunc_time(s)
          0           355.78
          1           350.5
          2           277.61
          4           213.9
          8           172.79
         16           138.87
         32           133.02
         64            76.25
        128            47.63
        256            35.28
        512            20.44
       1024            18.6
       2048            15.63
       4096            15.34
       8192            14.72
      16384            13.45
      32768            13.13

For context, the forward-scan part of this vacuum took about 80 sec (IQR 75
to 88).

Do we want something like this at all? If so, what would we have to do to
implement it for real? I thought that maybe the back-off amount could
start at one block and double each time until it hits a maximum, and that
the maximum would be either seg_size or 1/8 of effective_cache_size,
whichever is lower. (I don't think we would want to prefetch more than a
fraction of effective_cache_size, or else the data could already have been
displaced by the time the scan gets back to it.)
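
Sketched out, the growth rule might look like this (prefetch_dist and
max_prefetch_dist are made-up names; RELSEG_SIZE and effective_cache_size
are both expressed in blocks):

    /*
     * Illustrative only: double the prefetch distance after each chunk,
     * capped at min(segment size, effective_cache_size / 8).
     */
    BlockNumber prefetch_dist = 1;
    BlockNumber max_prefetch_dist =
        Min((BlockNumber) RELSEG_SIZE,
            (BlockNumber) (effective_cache_size / 8));

    /* ... then, after prefetching each chunk during the backward scan: */
    prefetch_dist = Min(prefetch_dist * 2, max_prefetch_dist);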

Also, what would be the best way to drill a hole through bufmgr.c into md.c
so that the prefetch could specify an entire range, rather than looping
over each individual block?
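
One hypothetical shape for that hole (not an existing API, just an
illustration): a range variant of PrefetchBuffer that bufmgr.c could hand
down to smgr/md.c, where it might become a single
posix_fadvise(POSIX_FADV_WILLNEED) call covering the whole range:

    /*
     * Hypothetical, for discussion only: prefetch nblocks consecutive
     * blocks starting at firstBlock with one call into the storage
     * layer, instead of looping over PrefetchBuffer() per block.
     */
    extern void PrefetchBufferRange(Relation reln, ForkNumber forkNum,
                                    BlockNumber firstBlock,
                                    BlockNumber nblocks);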

What would have to be done to detect people running on SSD and disable the
feature, if anything?

I'll add this to the next commitfest as a WIP patch.

Cheers,

Jeff

Attachment Content-Type Size
jjprefetch.patch application/octet-stream 1.8 KB
jjprefetch.sh application/x-sh 374 bytes
