Re: [PATCH] Incremental backup: add backup profile to base backup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-19 03:16:47
Message-ID: CAA4eK1KLa72xsCPdFm1EwwoDbXkejMaXdVaUyuqdDNJfjAzjJg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 18, 2014 at 6:35 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>>
>> Marco Nenciarini wrote:
>>
>>> To calculate the md5 checksum I've used the md5 code present in pgcrypto
>>> contrib as the code in src/include/libpq/md5.h is not suitable for large
>>> files. Since a core feature cannot depend on a piece of contrib, I've
>>> moved the files
>>>
>>> contrib/pgcrypto/md5.c
>>> contrib/pgcrypto/md5.h
>>>
>>> to
>>>
>>> src/backend/utils/hash/md5.c
>>> src/include/utils/md5.h
>>>
>>> changing the pgcrypto extension to use them.
>>
>>
>> We already have the FNV checksum implementation in the backend -- can't
>> we use that one for this and avoid messing with MD5?
>>
>> (I don't think we're looking for a cryptographic hash here. Am I wrong?)
>
>
> Hmm. Any user that can update a table can craft such an update that its
> checksum matches an older backup. That may seem like an onerous task; to
> correctly calculate the checksum of a file in a previous backup, you need
> to know the LSNs and the exact data, including deleted data, on every
> block in the table, and then construct a suitable INSERT or UPDATE that
> modifies the table such that you get a collision. But for some tables it
> could be trivial; you might know that a table was bulk-loaded with a
> particular LSN and there are no dead tuples. Or you can simply create your
> own table and insert exactly the data you want. Messing with your own
> table might seem harmless, but it'll e.g. let you construct a case where
> an index points to a tuple that doesn't exist anymore, or there's a row
> that doesn't pass a CHECK constraint that was added later. Even if there's
> no direct security issue with that, you don't want that kind of
> uncertainty from a backup solution.
>
> But more to the point, I thought the consensus was to use the highest LSN
> of all the blocks in the file, no?

If we want to use the highest LSN, then it might be helpful for the author
to reuse some of the code I wrote for the Compute Max LSN utility, which
didn't get committed because there wasn't enough of a use case for it.
The latest patch for it can be found at:
http://www.postgresql.org/message-id/CA+TgmoY1Wr0KUDpgSkuSPTre0vkt1CoTc+w+t3ZOwxOejX0SpA@mail.gmail.com
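
For illustration only (this is a sketch I put together, not the Compute Max
LSN patch itself), scanning a relation segment for its highest page LSN
could look roughly like the program below. It assumes the standard 8kB
BLCKSZ and the usual PageHeaderData layout, where pd_lsn, stored as
{xlogid, xrecoff} in native byte order, occupies the first 8 bytes of every
page:

/*
 * max_page_lsn.c -- illustrative sketch only, not the Compute Max LSN patch.
 *
 * Scans one heap/index segment file and prints the highest page LSN in it.
 * Assumes the standard 8kB BLCKSZ and that pd_lsn ({xlogid, xrecoff},
 * native byte order) is the first 8 bytes of every page header.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
	unsigned char page[BLCKSZ];
	uint64_t	max_lsn = 0;
	FILE	   *fp;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s <relation segment file>\n", argv[0]);
		return 1;
	}

	fp = fopen(argv[1], "rb");
	if (fp == NULL)
	{
		perror(argv[1]);
		return 1;
	}

	/* Read the file one block at a time and track the highest pd_lsn. */
	while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
	{
		uint32_t	xlogid;
		uint32_t	xrecoff;
		uint64_t	lsn;

		memcpy(&xlogid, page, sizeof(uint32_t));
		memcpy(&xrecoff, page + sizeof(uint32_t), sizeof(uint32_t));
		lsn = ((uint64_t) xlogid << 32) | xrecoff;

		if (lsn > max_lsn)
			max_lsn = lsn;
	}

	fclose(fp);

	/* Print in the usual XXX/XXX LSN notation. */
	printf("max page LSN: %" PRIX32 "/%" PRIX32 "\n",
		   (uint32_t) (max_lsn >> 32), (uint32_t) max_lsn);
	return 0;
}

Run against a segment file under base/, the resulting value could then be
compared with the start LSN of the previous backup: if the file's highest
page LSN predates it, the file presumably hasn't changed and wouldn't need
to be re-sent.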

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
