From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: finding changed blocks using WAL scanning |
Date: | 2019-04-22 23:44:45 |
Message-ID: | 20190422234445.s7mxt6xwfmumzlge@momjian.us |
Lists: | pgsql-hackers |
On Tue, Apr 23, 2019 at 01:21:27AM +0200, Tomas Vondra wrote:
> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
> > On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > > Oh. Well, I already explained my algorithm for doing that upthread,
> > > > which I believe would be quite cheap.
> > > >
> > > > 1. When you generate the .modblock files, stick all the block
> > > > references into a buffer. qsort(). Dedup. Write out in sorted
> > > > order.
> > >
> > > Having all of the block references in a sorted order does seem like it
> > > would help, but would also make those potentially quite a bit larger
> > > than necessary (I had some thoughts about making them smaller elsewhere
> > > in this discussion). That might be worth it though. I suppose it might
> > > also be possible to line up the bitmaps suggested elsewhere to do
> > > essentially a BitmapOr of them to identify the blocks changed (while
> > > effectively de-duping at the same time).
> >
> > I don't see why this would make them bigger than necessary. If you
> > sort by relfilenode/fork/blocknumber and dedup, then references to
> > nearby blocks will be adjacent in the file. You can then decide what
> > format will represent that most efficiently on output. Whether or not
> > a bitmap is a better idea than a list of block numbers or something else
> > depends on what percentage of blocks are modified and how clustered
> > they are.
> >
>
> Not sure I understand correctly - do you suggest to deduplicate and sort
> the data before writing them into the .modblock files? Because the
> sorting would make this information mostly useless for the recovery
> prefetching use case I mentioned elsewhere. For that to work we need
> information about both the LSN and block, in the LSN order.
>
> So if we want to allow that use case to leverage this infrastructure, we
> need to write the .modblock files kinda "raw" and do this processing in some
> later step.
>
> Now, maybe the incremental backup use case is so much more important that
> the right thing to do is ignore this other use case, and I'm OK with that -
> as long as it's a conscious choice.
I think the concern is that the more granular the modblock files are
(with less de-duping), the larger they will be.
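
To make the size trade-off concrete, here is a rough Python sketch (not
anything from the proposed patch) comparing the two layouts discussed above:
a "raw" list that keeps every block reference in LSN order, and a list sorted
by relfilenode/fork/blocknumber with duplicates removed. The BlockRef type,
the function names, and the sample LSNs and relfilenodes are all made up for
illustration; the real .modblock format is still undecided in this thread.

```python
from typing import List, NamedTuple, Tuple


class BlockRef(NamedTuple):
    """Hypothetical block reference; field order doubles as the sort key."""
    relfilenode: int
    fork: int
    block: int


def raw_refs(wal: List[Tuple[int, BlockRef]]) -> List[Tuple[int, BlockRef]]:
    """Keep every (lsn, ref) pair in LSN order, with no de-duping."""
    return sorted(wal, key=lambda rec: rec[0])


def sorted_dedup_refs(wal: List[Tuple[int, BlockRef]]) -> List[BlockRef]:
    """Drop the LSNs, de-duplicate, and sort by relfilenode/fork/block."""
    return sorted({ref for _, ref in wal})


# Made-up WAL stream: a few hot blocks touched repeatedly.
wal = [
    (100, BlockRef(16384, 0, 7)),
    (108, BlockRef(16384, 0, 3)),
    (116, BlockRef(16384, 0, 7)),
    (124, BlockRef(16390, 0, 1)),
    (132, BlockRef(16384, 0, 3)),
    (140, BlockRef(16384, 0, 7)),
]

raw = raw_refs(wal)
compact = sorted_dedup_refs(wal)

print(len(raw), "raw entries vs", len(compact), "de-duplicated entries")
```

With this sample input the raw list has 6 entries and the de-duplicated list
has 3, and the gap grows with how often the same blocks are rewritten. The
compact form has lost the LSN ordering, though, which is exactly what the
prefetching use case would need.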
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +