Re: finding changed blocks using WAL scanning

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-22 23:44:45
Message-ID: 20190422234445.s7mxt6xwfmumzlge@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote:
> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
> > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > > Oh. Well, I already explained my algorithm for doing that upthread,
> > > > which I believe would be quite cheap.
> > > >
> > > > 1. When you generate the .modblock files, stick all the block
> > > > references into a buffer. qsort(). Dedup. Write out in sorted
> > > > order.
> > >
> > > Having all of the block references in a sorted order does seem like it
> > > would help, but would also make those potentially quite a bit larger
> > > than necessary (I had some thoughts about making them smaller elsewhere
> > > in this discussion). That might be worth it though. I suppose it might
> > > also be possible to line up the bitmaps suggested elsewhere to do
> > > essentially a BitmapOr of them to identify the blocks changed (while
> > > effectively de-duping at the same time).
> >
> > I don't see why this would make them bigger than necessary. If you
> > sort by relfilenode/fork/blocknumber and dedup, then references to
> > nearby blocks will be adjacent in the file. You can then decide what
> > format will represent that most efficiently on output. Whether or not
> > a bitmap is better idea than a list of block numbers or something else
> > depends on what percentage of blocks are modified and how clustered
> > they are.
> >
>
> Not sure I understand correctly - do you suggest to deduplicate and sort
> the data before writing them into the .modblock files? Because that the
> the sorting would make this information mostly useless for the recovery
> prefetching use case I mentioned elsewhere. For that to work we need
> information about both the LSN and block, in the LSN order.
>
> So if we want to allow that use case to leverage this infrastructure, we
> need to write the .modfiles kinda "raw" and do this processing in some
> later step.
>
> Now, maybe the incremental backup use case is so much more important the
> right thing to do is ignore this other use case, and I'm OK with that -
> as long as it's a conscious choice.

I think the concern is that the more graunular the modblock files are
(with less de-duping), the larger they will be.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikhail Bautin 2019-04-22 23:50:25 memory leak checking
Previous Message Tomas Vondra 2019-04-22 23:21:27 Re: finding changed blocks using WAL scanning