Re: finding changed blocks using WAL scanning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-11 00:11:11
Message-ID: CA+TgmobvLUuu75QQQSsAe=+beB_GBQm1faY96iyqSBPeokp9EQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 10, 2019 at 5:49 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> There is one thing that does worry me about the file-per-LSN-range
> approach, and that is memory consumption when trying to consume the
> information. Suppose you have a really high velocity system. I don't
> know exactly what the busiest systems around are doing in terms of
> data churn these days, but let's say just for kicks that we are
> dirtying 100GB/hour. That means, roughly 12.5 million block
> references per hour. If each block reference takes 12 bytes, that's
> maybe 150MB/hour in block reference files. If you run a daily
> incremental backup, you've got to load all the block references for
> the last 24 hours and deduplicate them, which means you're going to
> need about 3.6GB of memory. If you run a weekly incremental backup,
> you're going to need about 25GB of memory. That is not ideal. One
> can keep the memory consumption to a more reasonable level by using
> temporary files. For instance, say you realize you're going to need
> 25GB of memory to store all the block references you have, but you
> only have 1GB of memory that you're allowed to use. Well, just
> hash-partition the data 32 ways by dboid/tsoid/relfilenode/segno,
> writing each batch to a separate temporary file, and then process each
> of those 32 files separately. That does add some additional I/O, but
> it's not crazily complicated and doesn't seem too terrible, at least
> to me. Still, it's something not to like.
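
For concreteness, here is a rough standalone sketch of the kind of partitioning pass I was imagining there -- the struct layout, field widths, and names are purely illustrative, not a proposal for what we'd actually write. Block references get bucketed by dboid/tsoid/relfilenode/segno into a fixed number of temporary files, and each bucket can then be deduplicated on its own within the memory budget:

#include <stdint.h>
#include <stdio.h>

#define NUM_PARTITIONS 32

typedef struct BlockRef
{
    uint32_t    dboid;
    uint32_t    tsoid;
    uint32_t    relfilenode;
    uint32_t    segno;
    uint32_t    blkno;
} BlockRef;

/* Bucket a reference by its partitioning key (FNV-1a-style mix). */
static uint32_t
blockref_partition(const BlockRef *ref)
{
    uint32_t    h = 2166136261u;

    h = (h ^ ref->dboid) * 16777619u;
    h = (h ^ ref->tsoid) * 16777619u;
    h = (h ^ ref->relfilenode) * 16777619u;
    h = (h ^ ref->segno) * 16777619u;
    return h % NUM_PARTITIONS;
}

/*
 * Scatter a stream of block references into NUM_PARTITIONS temporary
 * files.  Each partition can then be loaded and deduplicated on its own,
 * so peak memory is roughly total_size / NUM_PARTITIONS rather than
 * total_size.
 */
static void
partition_blockrefs(FILE *in, FILE *parts[NUM_PARTITIONS])
{
    BlockRef    ref;

    while (fread(&ref, sizeof(ref), 1, in) == 1)
        fwrite(&ref, sizeof(ref), 1, parts[blockref_partition(&ref)]);
}

int
main(void)
{
    FILE       *parts[NUM_PARTITIONS];

    for (int i = 0; i < NUM_PARTITIONS; i++)
        if ((parts[i] = tmpfile()) == NULL)
            return 1;

    partition_blockrefs(stdin, parts);

    /* ... deduplicate each partition separately here, then clean up ... */
    for (int i = 0; i < NUM_PARTITIONS; i++)
        fclose(parts[i]);
    return 0;
}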

Oh, I'm being dumb. We should just have the process that writes out
these files sort the records first. Then when we read them back in to
use them, we can just do a merge pass like MergeAppend would do. Then
you never need very much memory at all.
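
For illustration, here is a rough standalone sketch of that kind of merge pass (again, the record layout and names are just made up for the example, not PostgreSQL internals). Because every input file is already sorted, the reader holds one buffered record per file, repeatedly emits the smallest one, and skips duplicates along the way:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_INPUTS 64            /* plenty for a sketch */

typedef struct BlockRef
{
    uint32_t    dboid;
    uint32_t    tsoid;
    uint32_t    relfilenode;
    uint32_t    segno;
    uint32_t    blkno;
} BlockRef;

/* Total order over block references; input files must be sorted by it. */
static int
blockref_cmp(const BlockRef *a, const BlockRef *b)
{
#define CMP_FIELD(f) \
    do { if (a->f != b->f) return (a->f < b->f) ? -1 : 1; } while (0)
    CMP_FIELD(dboid);
    CMP_FIELD(tsoid);
    CMP_FIELD(relfilenode);
    CMP_FIELD(segno);
    CMP_FIELD(blkno);
#undef CMP_FIELD
    return 0;
}

/*
 * Merge nfiles sorted inputs, printing each distinct reference once.
 * Peak memory is one BlockRef per input, no matter how much data there is.
 * (A real consumer would use a binary heap rather than this linear scan.)
 */
static void
merge_blockref_files(FILE **files, int nfiles)
{
    BlockRef    cur[MAX_INPUTS];
    bool        valid[MAX_INPUTS];
    BlockRef    last;
    bool        have_last = false;

    for (int i = 0; i < nfiles; i++)
        valid[i] = (fread(&cur[i], sizeof(BlockRef), 1, files[i]) == 1);

    for (;;)
    {
        int         best = -1;

        /* pick the smallest buffered record, as MergeAppend would */
        for (int i = 0; i < nfiles; i++)
            if (valid[i] && (best < 0 || blockref_cmp(&cur[i], &cur[best]) < 0))
                best = i;
        if (best < 0)
            break;               /* every input exhausted */

        /* emit only if it differs from the previously emitted record */
        if (!have_last || blockref_cmp(&cur[best], &last) != 0)
        {
            printf("db %u ts %u rel %u seg %u blk %u\n",
                   cur[best].dboid, cur[best].tsoid, cur[best].relfilenode,
                   cur[best].segno, cur[best].blkno);
            last = cur[best];
            have_last = true;
        }

        /* refill the slot we just consumed */
        valid[best] = (fread(&cur[best], sizeof(BlockRef), 1, files[best]) == 1);
    }
}

int
main(int argc, char **argv)
{
    FILE       *files[MAX_INPUTS];
    int         nfiles = (argc - 1 < MAX_INPUTS) ? argc - 1 : MAX_INPUTS;

    for (int i = 0; i < nfiles; i++)
    {
        files[i] = fopen(argv[i + 1], "rb");
        if (files[i] == NULL)
        {
            perror(argv[i + 1]);
            return 1;
        }
    }
    merge_blockref_files(files, nfiles);
    return 0;
}

With that, peak memory is proportional to the number of input files rather than the total number of block references, which is exactly the property we want.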

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
