Re: File based Incremental backup v8

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: File based Incremental backup v8
Date: 2015-03-07 15:20:17
Message-ID: 54FB1731.9030305@2ndquadrant.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Il 05/03/15 05:42, Bruce Momjian ha scritto:
> On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>>>> Yeah, it might make the situation better than today. But I'm afraid that
>>>> many users might get disappointed about that behavior of an incremental
>>>> backup after the release...
>>>
>>> I don't get what do you mean here. Can you elaborate this point?
>>
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database files need to be read even when there is no modification
>> in database since last backup, and which may make the backup time longer than
>> users expect) which may disappoint users. So I'm afraid that users who can
>> benefit from the feature might be very limited. IOW, I'm just sticking to
>> the idea of timestamp-based one :) But I should drop it if the majority in
>> the list prefers the LSN-based one even if it has such limitations.
>
> We need numbers on how effective each level of tracking will be. Until
> then, the patch can't move forward.
>

I've written a little test script to estimate how much space can be saved by file level incremental, and I've run it on some real backups I have access to.

The script takes two basebackup directory and simulate how much data can be saved in the 2nd backup using incremental backup (using file size/time and LSN)

It assumes that every file in base, global and pg_tblspc which matches both size and modification time will also match from the LSN point of view.

The result is that many databases can take advantage of incremental, even if not do big, and considering LSNs yield a result almost identical to the approach based on filesystem metadata.

== Very big geographic database (similar to openstreetmap main DB), it contains versioned data, interval two months

First backup size: 13286623850656 (12.1TiB)
Second backup size: 13323511925626 (12.1TiB)
Matching files count: 17094
Matching LSN count: 14580
Matching files size: 9129755116499 (8.3TiB, 68.5%)
Matching LSN size: 9128568799332 (8.3TiB, 68.5%)

== Big on-line store database, old data regularly moved to historic partitions, interval one day

First backup size: 1355285058842 (1.2TiB)
Second backup size: 1358389467239 (1.2TiB)
Matching files count: 3937
Matching LSN count: 2821
Matching files size: 762292960220 (709.9GiB, 56.1%)
Matching LSN size: 762122543668 (709.8GiB, 56.1%)

== Ticketing system database, interval one day

First backup size: 144988275 (138.3MiB)
Second backup size: 146135155 (139.4MiB)
Matching files count: 3124
Matching LSN count: 2641
Matching files size: 76908986 (73.3MiB, 52.6%)
Matching LSN size: 67747928 (64.6MiB, 46.4%)

== Online store, interval one day

First backup size: 20418561133 (19.0GiB)
Second backup size: 20475290733 (19.1GiB)
Matching files count: 5744
Matching LSN count: 4302
Matching files size: 4432709876 (4.1GiB, 21.6%)
Matching LSN size: 4388993884 (4.1GiB, 21.4%)

== Heavily updated database, interval one week

First backup size: 3203198962 (3.0GiB)
Second backup size: 3222409202 (3.0GiB)
Matching files count: 1801
Matching LSN count: 1273
Matching files size: 91206317 (87.0MiB, 2.8%)
Matching LSN size: 69083532 (65.9MiB, 2.1%)

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

Attachment Content-Type Size
estimate_incremental.py text/x-python-script 2.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-03-07 17:07:27 Re: TABLESAMPLE patch
Previous Message Stephen Frost 2015-03-07 14:24:41 Re: CATUPDATE confusion?