Re: block-level incremental backup

From: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-08-16 14:36:50
Message-ID: CALtqXTfFrm8gmd4rcLMmS0u4cvJHUqY982_kibSWR9q0BWwoeA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 16, 2019 at 4:12 PM Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com> wrote:

>
> On Fri, Aug 16, 2019 at 3:24 PM Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>
>>
>>
>> On Fri, Aug 2, 2019 at 6:43 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>>
>>> Some comments:
>>> 1) There will be some link files created for tablespaces; we might
>>> require some special handling for them
>>>
>>
>> Yep. I have that in my ToDo.
>> Will start working on that soon.
>>
>>
>>> 2)
>>> Retry functionality is handled only for copying of full files; should
>>> we handle retry for copying of partial files too?
>>> 3)
>>> We can have some range for maxretries, similar to sleeptime
>>>
>>
>> I took help from the pg_standby code related to maxretries and sleeptime.
>>
>> However, as we don't want to use a system() call now, I have
>> removed all this kludge and just used fread/fwrite as discussed.
>>
>>
>>> 4)
>>> Should we check for malloc failure
>>>
>>
>> Used pg_malloc() instead. The same was also suggested by Ibrar.
>>
>>
>>>
>>> 5) Should we add a display of progress, as the backup may take some
>>> time? This can be added as an enhancement. We can get others' opinions on this.
>>>
>>
>> Can be done afterward once we have the functionality in place.
>>
>>
>>>
>>> 6)
>>> If the backup count increases, providing the input may be difficult.
>>> Shall the user provide all the incremental backups from a parent folder,
>>> and can we handle the ordering of incremental backups internally?
>>>
>>
>> I am not sure of this yet. We need to provide the tablespace mapping too.
>> But thanks for raising this point. Will keep it in mind when I
>> revisit this.
>>
>>
>>>
>>> 7)
>>> Add verbose for copying whole file
>>>
>> Done
>>
>>
>>>
>>> 8) We can also check whether approximately enough space is available
>>> on disk before starting the combine backup; this can be added as an
>>> enhancement. We can get others' opinions on this.
>>>
>>
>> Hmm... will leave it for now. The user will get an error anyway.
>>
>>
>>>
>>> 9)
>>> "Combine backup into directory" can be "combine backup directory"
>>>
>> Done
>>
>>
>>>
>>> 10)
>>> MAX_INCR_BK_COUNT can be increased a little; some applications use one
>>> full backup at the beginning of the month and 30 incremental
>>> backups for the rest of the days in the month
>>>
>>
>> Yeah, agree. But using any particular number here is debatable.
>> Let's see others' opinions too.
>>
> Why not use a list?
>
>
>>
>>
>>> Regards,
>>> Vignesh
>>> EnterpriseDB: http://www.enterprisedb.com
>>>
>>
>>
>> Attached a new set of patches with the refactoring done separately.
>> The incremental backup patch became small now and is hopefully more
>> readable than the first version.
>>
>> --
>> Jeevan Chalke
>> Technical Architect, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>>
>
> + buf = (char *) malloc(statbuf->st_size);
> + if (buf == NULL)
> +     ereport(ERROR,
> +             (errcode(ERRCODE_OUT_OF_MEMORY),
> +              errmsg("out of memory")));
>
> Why are you using malloc? You can use palloc here; see the sketch after
> the quoted text below.
>
>
>
> Hi, I gave another look at the patch and have some quick comments.
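
On the malloc point quoted above: a minimal sketch of the palloc-based
version (assuming this code runs in the backend, where palloc itself
ereports on out-of-memory, so the explicit check becomes unnecessary):

    /* palloc ereports ERROR with ERRCODE_OUT_OF_MEMORY on failure */
    buf = (char *) palloc(statbuf->st_size);
    ...
    pfree(buf);     /* or let the owning memory context clean it up */

In frontend code, pg_malloc() from common/fe_memutils.h gives the
equivalent exit-on-failure behavior.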

-
> char *extptr = strstr(fn, ".partial");

I think there should be a better and stricter way to check the file
extension.
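
For example, a check anchored at the end of the filename rather than the
first substring match; a sketch (has_partial_suffix is a hypothetical
helper name, not from the patch):

    /* Return true only if fn ends with ".partial" */
    static bool
    has_partial_suffix(const char *fn)
    {
        size_t  flen = strlen(fn);
        size_t  slen = strlen(".partial");

        return flen > slen && strcmp(fn + flen - slen, ".partial") == 0;
    }

strstr() would also match a name like "foo.partial.bak", which is
presumably not intended.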

-
> + extptr = strstr(outfn, ".partial");
> + Assert (extptr != NULL);

Why are you checking that again? You just appended it in the above
statement.
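
Since the suffix was just appended, the pointer can be computed directly
instead of searching again; a sketch (assuming outfn was just built to
end in ".partial"):

    /* outfn is known to end in ".partial"; no need to search for it */
    extptr = outfn + strlen(outfn) - strlen(".partial");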

-
> + if (verbose && statbuf.st_size > (RELSEG_SIZE * BLCKSZ))
> +     pg_log_info("found big file \"%s\" (size: %.2lfGB): %m", fromfn,
> +                 (double) statbuf.st_size / (RELSEG_SIZE * BLCKSZ));

This is not just a log; you have found a file that is bigger than it
should be, which surely indicates some problem.
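
If a file larger than one segment is unexpected at this point, a sketch
of treating it as an error rather than an info message (whether it
should be fatal is the patch author's call; fromfn as in the quoted
code):

    /* A file should never exceed RELSEG_SIZE * BLCKSZ (1GB by default) */
    if (statbuf.st_size > (RELSEG_SIZE * BLCKSZ))
    {
        pg_log_error("file \"%s\" is larger than the segment size", fromfn);
        exit(1);
    }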

-
> + * We do read the entire 1GB file in memory while taking the incremental
> + * backup; so I don't see any reason why we can't do that here. Also,
> + * copying data in chunks is expensive. However, for bigger files, we
> + * still slice at the 1GB border.

What do you mean by a bigger file, a file greater than 1GB? In which
case do you get a file > 1GB?
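
For reference, a sketch of the slicing idea the quoted comment describes
(hypothetical code, not from the patch; srcfp/dstfp are assumed FILE *
handles): copy in segment-sized pieces so that no single allocation
exceeds RELSEG_SIZE * BLCKSZ:

    /* Copy the file in slices of at most one segment (1GB by default) */
    size_t  slice = (size_t) RELSEG_SIZE * BLCKSZ;
    char   *buf = pg_malloc(slice);
    size_t  nread;

    while ((nread = fread(buf, 1, slice, srcfp)) > 0)
    {
        if (fwrite(buf, 1, nread, dstfp) != nread)
        {
            pg_log_error("could not write to file: %m");
            exit(1);
        }
    }
    pg_free(buf);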

--
Ibrar Ahmed
