Re: finding changed blocks using WAL scanning

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-20 04:42:38
Message-ID: 20190420044238.GX6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > While I do think we should at least be thinking about the load caused
> > from scanning the WAL to generate a list of blocks that are changed, the
> > load I was more concerned with in the other thread is the effort
> > required to actually merge all of those changes together over a large
> > amount of WAL. I'm also not saying that we couldn't have either of
> > those pieces done as a background worker, just that it'd be really nice
> > to have an external tool (or library) that can be used on an independent
> > system to do that work.
>
> Oh. Well, I already explained my algorithm for doing that upthread,
> which I believe would be quite cheap.
>
> 1. When you generate the .modblock files, stick all the block
> references into a buffer. qsort(). Dedup. Write out in sorted
> order.
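
Just to check that I'm picturing that step the same way, here is a rough
sketch (the BlockRef layout is purely an assumption on my part, not the
actual .modblock format):

/*
 * Rough sketch of the sort/dedup step; the BlockRef layout is just an
 * assumption, not the actual .modblock format.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct BlockRef
{
    uint32_t    spcnode;        /* tablespace */
    uint32_t    dbnode;         /* database */
    uint32_t    relnode;        /* relation */
    uint32_t    forknum;        /* fork */
    uint32_t    blkno;          /* block number */
} BlockRef;

static int
blockref_cmp(const void *a, const void *b)
{
    const BlockRef *x = a;
    const BlockRef *y = b;

#define CMP_FIELD(f) do { if (x->f != y->f) return (x->f < y->f) ? -1 : 1; } while (0)
    CMP_FIELD(spcnode);
    CMP_FIELD(dbnode);
    CMP_FIELD(relnode);
    CMP_FIELD(forknum);
    CMP_FIELD(blkno);
#undef CMP_FIELD
    return 0;
}

/* Sort the buffered references and squeeze out duplicates in place. */
static size_t
sort_and_dedup(BlockRef *refs, size_t nrefs)
{
    size_t      nkept = 0;

    if (nrefs == 0)
        return 0;

    qsort(refs, nrefs, sizeof(BlockRef), blockref_cmp);

    for (size_t i = 1; i < nrefs; i++)
    {
        if (blockref_cmp(&refs[i], &refs[nkept]) != 0)
            refs[++nkept] = refs[i];
    }
    return nkept + 1;           /* distinct refs, in sorted order, ready to write out */
}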

Having all of the block references in sorted order does seem like it
would help, but it would also potentially make those files quite a bit
larger than necessary (I had some thoughts elsewhere in this discussion
about making them smaller).  That might be worth it though.  I suppose
it might also be possible to line up the bitmaps suggested elsewhere
and do essentially a BitmapOr of them to identify the changed blocks
(effectively de-duping at the same time).
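
To make that bitmap idea a bit more concrete, something along these
lines (keying one bitmap per relation fork is just my assumption here):

/*
 * Very rough sketch of OR-ing per-segment block bitmaps to get the
 * distinct set of changed blocks: one bit per block, one bitmap per
 * relation fork (that per-fork keying is an assumption of mine).
 */
#include <stdint.h>
#include <stddef.h>

#define BITS_PER_WORD   64

typedef struct BlockBitmap
{
    size_t      nwords;
    uint64_t   *words;          /* bit N set => block N was modified */
} BlockBitmap;

/* dst |= src; a real version would grow dst to cover a larger src. */
static void
bitmap_union(BlockBitmap *dst, const BlockBitmap *src)
{
    size_t      n = (src->nwords < dst->nwords) ? src->nwords : dst->nwords;

    for (size_t i = 0; i < n; i++)
        dst->words[i] |= src->words[i];
}

/* Walk the set bits, i.e. the de-duplicated list of changed blocks. */
static void
bitmap_foreach_block(const BlockBitmap *bm, void (*cb)(uint32_t blkno))
{
    for (size_t w = 0; w < bm->nwords; w++)
    {
        uint64_t    word = bm->words[w];

        while (word != 0)
        {
            int         bit = __builtin_ctzll(word);

            cb((uint32_t) (w * BITS_PER_WORD + bit));
            word &= word - 1;   /* clear lowest set bit */
        }
    }
}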

> 2. When you want to use a bunch of .modblock files, do the same thing
> MergeAppend does, or what merge-sort does when it does a merge pass.
> Read the first 1MB of each file (or whatever amount). Repeatedly pull
> an item from whichever file has the lowest remaining value, using a
> binary heap. When no buffered data remains for a particular file,
> read another chunk from that file.

Sure, if I'm following what you're suggesting here, that's essentially a
MergeAppend/merge sort with a GroupAgg on top to get the distinct set.
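
Something like the following is what I have in mind for that merge, with
the per-file 1MB buffered reads elided and each file standing in as a
sorted array (a sketch only, not tied to any actual .modblock format):

/*
 * Sketch of the MergeAppend-style merge: a binary min-heap of per-file
 * cursors, pop the smallest block reference each time, and skip
 * duplicates to get the distinct set.  The per-file buffered reads are
 * elided; all cursors are assumed non-empty to start with.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct Cursor
{
    const uint64_t *items;      /* sorted block references from one .modblock file */
    size_t      pos;
    size_t      len;
} Cursor;

static void
heap_sift_down(Cursor **heap, size_t nheap, size_t i)
{
    for (;;)
    {
        size_t      l = 2 * i + 1;
        size_t      r = 2 * i + 2;
        size_t      s = i;

        if (l < nheap && heap[l]->items[heap[l]->pos] < heap[s]->items[heap[s]->pos])
            s = l;
        if (r < nheap && heap[r]->items[heap[r]->pos] < heap[s]->items[heap[s]->pos])
            s = r;
        if (s == i)
            return;
        Cursor     *tmp = heap[i];

        heap[i] = heap[s];
        heap[s] = tmp;
        i = s;
    }
}

/* Emit each distinct value once, in ascending order. */
static void
merge_distinct(Cursor **heap, size_t nheap, void (*emit)(uint64_t))
{
    uint64_t    last = 0;
    int         have_last = 0;

    /* heapify */
    for (size_t i = nheap / 2; i-- > 0;)
        heap_sift_down(heap, nheap, i);

    while (nheap > 0)
    {
        Cursor     *top = heap[0];
        uint64_t    val = top->items[top->pos];

        if (!have_last || val != last)
        {
            emit(val);
            last = val;
            have_last = 1;
        }

        if (++top->pos == top->len)     /* file exhausted (or: refill its buffer) */
            heap[0] = heap[--nheap];
        heap_sift_down(heap, nheap, 0);
    }
}

That's the same shape as MergeAppend: the heap always points at the file
with the smallest remaining value, and the duplicate check on the way out
gives the GroupAgg-style distinct set.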

> If each .modblock file covers 1GB of WAL, you could merge the data from
> across 1TB of WAL using only 1GB of memory, and that's assuming you
> have a 1MB buffer for each .modblock file. You probably don't need
> such a large buffer. If you use, say, a 128kB buffer, you could merge
> the data from across 8TB of WAL using 1GB of memory. And if you have
> 8TB of WAL and you can't spare 1GB for the task of computing which
> blocks need to be included in your incremental backup, it's time for a
> hardware upgrade.
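
The arithmetic there certainly works out:

    1 TB of WAL / 1 GB per .modblock file = 1024 files
    1024 files * 1 MB read buffer         = 1 GB of memory
    8 TB of WAL / 1 GB per file           = 8192 files
    8192 files * 128 kB read buffer       = 1 GB of memory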

How much additional work is it going to take to sort/dedup the
references for each 1GB of WAL though, and how large do the resulting
files end up?  I'm specifically thinking about some of the very high
WAL-generation-rate systems that I've seen, with TBs of WAL per day.  I
get that once you've got nicely sorted inputs it isn't too bad to
generate a distinct set from them, but it seems like that moves the
problem to the sorting side rather than eliminating it, and it may make
other things less efficient, particularly for workloads that modify long
runs of consecutive buffers, which we could capture more efficiently.
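
To make that last point concrete, what I'm imagining is collapsing runs
of consecutive block numbers into (start, count) ranges, something like
this sketch (not a concrete format proposal):

/*
 * Sketch of collapsing runs of consecutive modified blocks into
 * (start, count) ranges, to illustrate the "runs of modified buffers"
 * point above.
 */
#include <stdint.h>
#include <stddef.h>

typedef struct BlockRange
{
    uint32_t    start;          /* first block in the run */
    uint32_t    count;          /* number of consecutive blocks */
} BlockRange;

/*
 * Input: sorted, de-duplicated block numbers for one relation fork.
 * Output: number of ranges written into 'ranges' (caller provides space
 * for at least nblocks entries, the worst case).
 */
static size_t
encode_ranges(const uint32_t *blocks, size_t nblocks, BlockRange *ranges)
{
    size_t      nranges = 0;

    for (size_t i = 0; i < nblocks; i++)
    {
        if (nranges > 0 &&
            blocks[i] == ranges[nranges - 1].start + ranges[nranges - 1].count)
            ranges[nranges - 1].count++;        /* extends the current run */
        else
        {
            ranges[nranges].start = blocks[i];
            ranges[nranges].count = 1;
            nranges++;
        }
    }
    return nranges;
}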

Thanks,

Stephen
