Re: finding changed blocks using WAL scanning

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-15 20:31:14
Message-ID: 20190415203114.pb4e2vgbtbhopcdw@momjian.us
Lists: pgsql-hackers

On Wed, Apr 10, 2019 at 08:11:11PM -0400, Robert Haas wrote:
> On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > There is one thing that does worry me about the file-per-LSN-range
> > approach, and that is memory consumption when trying to consume the
> > information. Suppose you have a really high velocity system. I don't
> > know exactly what the busiest systems around are doing in terms of
> > data churn these days, but let's say just for kicks that we are
> > dirtying 100GB/hour. That means, roughly 12.5 million block
> > references per hour. If each block reference takes 12 bytes, that's
> > maybe 150MB/hour in block reference files. If you run a daily
> > incremental backup, you've got to load all the block references for
> > the last 24 hours and deduplicate them, which means you're going to
> > need about 3.6GB of memory. If you run a weekly incremental backup,
> > you're going to need about 25GB of memory. That is not ideal. One
> > can keep the memory consumption to a more reasonable level by using
> > temporary files. For instance, say you realize you're going to need
> > 25GB of memory to store all the block references you have, but you
> > only have 1GB of memory that you're allowed to use. Well, just
> > hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> > writing each batch to a separate temporary file, and then process each
> > of those 32 files separately. That does add some additional I/O, but
> > it's not crazily complicated and doesn't seem too terrible, at least
> > to me. Still, it's something not to like.
>
> Oh, I'm being dumb. We should just have the process that writes out
> these files sort the records first. Then when we read them back in to
> use them, we can just do a merge pass like MergeAppend would do. Then
> you never need very much memory at all.
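
For illustration, here is a minimal sketch of that merge pass on the
consumer side, assuming each block-reference file is written in sorted
order; the 12-byte record layout and the field choice are invented, not
a proposal:

import heapq
import struct

# Hypothetical 12-byte record: dboid, relfilenode, block number.  The real
# layout would be whatever the WAL-scanning worker writes; this is only
# for illustration.
REC = struct.Struct("<III")

def read_refs(path):
    """Yield block references from one file, assumed to be sorted."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(REC.size)
            if len(chunk) < REC.size:
                break
            yield REC.unpack(chunk)

def merged_unique_refs(paths):
    """Stream-merge any number of sorted files, dropping duplicates as we
    go, the way a MergeAppend-style pass would.  Memory use is
    proportional to the number of files, not the number of references."""
    prev = None
    for ref in heapq.merge(*(read_refs(p) for p in paths)):
        if ref != prev:
            yield ref
            prev = ref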

Can I throw out a simple idea? What if, when we finish writing a WAL
file, we create a new file 000000010000000000000001.modblock which
lists all the heap/index files and block numbers modified in that WAL
file? How much does that help with the list I posted earlier?
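
To make that concrete, a tool could already build something like a
modblock file today by post-processing pg_waldump output for a finished
segment; the text format below is purely illustrative, and the regex
assumes pg_waldump's current blkref output:

import re
import subprocess
import sys

# Matches pg_waldump block references, e.g.
#   "blkref #0: rel 1663/16384/16385 blk 7"  or  "... fork fsm blk 2"
# (adjust if your pg_waldump version prints these differently).
BLKREF = re.compile(
    r"blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork (\w+))? blk (\d+)")

def write_modblock(segment_path, out_path):
    """Summarize one finished WAL segment into a text .modblock file
    listing the distinct blocks it touches."""
    out = subprocess.run(["pg_waldump", segment_path],
                         capture_output=True, text=True, check=True)
    refs = set()
    for ts, db, rel, fork, blk in BLKREF.findall(out.stdout):
        refs.add((int(ts), int(db), int(rel), fork or "main", int(blk)))
    with open(out_path, "w") as f:
        for ts, db, rel, fork, blk in sorted(refs):
            f.write(f"{ts}/{db}/{rel} {fork} {blk}\n")

if __name__ == "__main__":
    # e.g.: modblock.py pg_wal/000000010000000000000001 \
    #                   000000010000000000000001.modblock
    write_modblock(sys.argv[1], sys.argv[2])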

I think there is some interesting complexity brought up in this thread:
which options will minimize storage I/O and network I/O, impose only
background overhead, allow parallel operation, and integrate with
pg_basebackup?  Eventually we will need to evaluate the incremental
backup options against these criteria.

I am thinking tools could retain modblock files along with the WAL, and
could pull full-page writes either from the WAL or from PGDATA.  This
avoids the need to scan 16MB WAL files, and the WAL files and modblock
files could be expired independently.
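
As a rough sketch of the PGDATA side, assuming the default block and
segment sizes and the default tablespace only (all names below are
invented):

import os

BLCKSZ = 8192          # default PostgreSQL block size
RELSEG_SIZE = 131072   # blocks per 1GB relation segment file

def copy_block(pgdata, dboid, relfilenode, blkno, out_dir):
    """Copy one modified block of a main-fork relation file in the
    default tablespace.  Real tooling would also handle other
    tablespaces and forks, and relations truncated or dropped after the
    WAL was written."""
    segno, seg_block = divmod(blkno, RELSEG_SIZE)
    fname = str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"
    src = os.path.join(pgdata, "base", str(dboid), fname)
    with open(src, "rb") as f:
        f.seek(seg_block * BLCKSZ)
        data = f.read(BLCKSZ)
    dst = os.path.join(out_dir, f"{dboid}_{relfilenode}_{blkno}")
    with open(dst, "wb") as f:
        f.write(data)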

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
