Re: A proposal for shared memory based backup infrastructure

From: mahendrakar s <mahendrakarforpg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, arorasam(at)gmail(dot)com, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: A proposal for shared memory based backup infrastructure
Date: 2022-07-25 04:33:03
Message-ID: CABkiuWrPh0v39hqJHKi0MKC5Y7TY91+SYM-wfgPdKqeK8frcLw@mail.gmail.com
Lists: pgsql-hackers

Hi Bharath,

*"Typically, step (3) takes a good amount of time in production environments
with terabytes or petabytes scale of data and keeping the session alive from
step (1) to (4) has overhead and it wastes the resources. And the session
can get closed for various reasons - idle in session timeout, tcp/ip
keepalive timeout, network problems etc. All of these can render the backup
useless."*

>> This is likely a common scenario and needs to be addressed.

*"What if the backup started by a session can also be closed by
another session? This seems to be achievable, if we can place
the backup_label, tablespace_map and other required session/backend
level contents in shared memory with the key as backup_label name. It's
a long way to go."*

>> I think storing a session's backup metadata in shared memory may not
work, because shared memory is purged when the database restarts. We might
need a separate catalog table to track the backup session.
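
To make the concern concrete, here is a toy model in Python. It is only a sketch: plain Python objects stand in for server state, and the class names, the JSON file, and the "nightly" label are all hypothetical, not PostgreSQL internals or APIs. It shows that a registry keyed by backup label lets any caller finish a backup, but an in-memory registry forgets the entry across a simulated restart while a persisted one does not.

```python
# Toy model only: none of these names are PostgreSQL APIs.
import json
import os
import tempfile


class SharedMemoryRegistry:
    """In-memory map of backup label -> metadata; lost on restart."""

    def __init__(self):
        self._entries = {}

    def start(self, label, metadata):
        self._entries[label] = metadata

    def stop(self, label):
        # Any caller that knows the label can finish the backup.
        return self._entries.pop(label, None)

    def restart(self):
        # Simulates a server restart: in-memory state is discarded.
        self._entries = {}


class PersistentRegistry(SharedMemoryRegistry):
    """Same interface, but every change is also written to a file."""

    def __init__(self, path):
        super().__init__()
        self._path = path
        self._load()

    def _load(self):
        if os.path.exists(self._path):
            with open(self._path) as f:
                self._entries = json.load(f)

    def _save(self):
        with open(self._path, "w") as f:
            json.dump(self._entries, f)

    def start(self, label, metadata):
        super().start(label, metadata)
        self._save()

    def stop(self, label):
        metadata = super().stop(label)
        self._save()
        return metadata

    def restart(self):
        # State is reloaded from disk instead of being discarded.
        self._entries = {}
        self._load()


def demo():
    # Placeholder metadata; real contents would come from the server.
    meta = {"backup_label": "<label file contents>", "tablespace_map": ""}

    shm = SharedMemoryRegistry()
    shm.start("nightly", meta)
    shm.restart()
    lost = shm.stop("nightly")  # entry is gone after the restart

    with tempfile.TemporaryDirectory() as d:
        disk = PersistentRegistry(os.path.join(d, "backups.json"))
        disk.start("nightly", meta)
        disk.restart()
        kept = disk.stop("nightly")  # entry survived the restart

    return lost, kept
```

Calling demo() returns the pair (lost, kept). Whether a backup should remain valid across a restart at all is a separate question that this sketch does not try to answer.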

Thanks,
Mahendrakar.

On Sat, 23 Jul 2022 at 14:59, Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> Hi,
>
> Right now, the session that starts the backup with pg_backup_start()
> has to end it with pg_backup_stop() which returns the backup_label and
> tablespace_map contents (commit 39969e2a1). If the backups were to be
> taken using custom disk snapshot tools on production servers,
> following are the high-level steps involved:
> 1) open a session
> 2) run pg_backup_start() using the same session opened in (1)
> 3) run custom disk snapshot tools, which often copy
> the entire data directory over the network
> 4) run pg_backup_stop() using the same session opened in (1)
>
> Typically, step (3) takes a good amount of time in production
> environments with terabytes or petabytes scale of data and keeping the
> session alive from step (1) to (4) has overhead and it wastes the
> resources. And the session can get closed for various reasons - idle
> in session timeout, tcp/ip keepalive timeout, network problems etc.
> All of these can render the backup useless.
>
> What if the backup started by a session can also be closed by another
> session? This seems to be achievable, if we can place the
> backup_label, tablespace_map and other required session/backend level
> contents in shared memory with the key as backup_label name. It's a
> long way to go. The idea may be naive at this stage and there might be
> something important that doesn't let us do the proposed solution. I
> would like to hear more thoughts from the hackers.
>
> Thanks to Sameer, Satya (cc-ed) for the offlist discussion.
>
> Regards,
> Bharath Rupireddy.
>
>
>
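
For reference, steps (1) to (4) from the quoted message could be driven from a client roughly as follows. This is a minimal sketch under stated assumptions: conn is any already-open Python DB-API connection whose driver uses the %s parameter style (psycopg does), and take_snapshot_backup and run_snapshot are hypothetical names introduced only for this illustration. The point it shows is that the same connection must stay open for the whole duration of step (3).

```python
def take_snapshot_backup(conn, label, run_snapshot):
    """Run steps (2)-(4) on one already-open connection (step 1).

    conn         -- DB-API connection to the server (assumption)
    label        -- backup label passed to pg_backup_start()
    run_snapshot -- callable that performs the disk snapshot (hypothetical)
    """
    cur = conn.cursor()

    # Step (2): start the backup in this session.
    cur.execute("SELECT pg_backup_start(%s)", (label,))

    # Step (3): potentially very long; the session must stay alive
    # throughout, which is the problem described in the thread.
    run_snapshot()

    # Step (4): stop the backup in the same session and fetch the
    # backup_label and tablespace_map contents it returns.
    cur.execute("SELECT labelfile, spcmapfile FROM pg_backup_stop()")
    labelfile, spcmapfile = cur.fetchone()
    return labelfile, spcmapfile
```

If the connection drops while run_snapshot() is still running, the second query can no longer be issued on that session, so the files copied in step (3) cannot be completed into a usable backup.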
