Re: Backup command and functions can cause assertion failure and segmentation fault

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Backup command and functions can cause assertion failure and segmentation fault
Date: 2022-07-15 07:46:32
Message-ID: 02347f81-304f-3a14-1a33-5f40dfd7da48@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2022/07/14 17:00, Michael Paquier wrote:
> On Fri, Jul 08, 2022 at 08:56:14AM -0400, Robert Haas wrote:
>> On Thu, Jul 7, 2022 at 10:58 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>> But if many think that it's worth adding the test, I will give a
>>> try. But even in that case, I think it's better to commit the
>>> proposed patch at first to fix the bug, and then to write the patch
>>> adding the test.
>
> I have looked at that in details,

Thanks!

and it is possible to rely on
> pg_stat_activity.wait_event to be BaseBackupThrottle, which would make

ISTM that you can also use pg_stat_progress_basebackup.phase.

> sure that the checkpoint triggered at the beginning of the backup
> finishes and that we are in the middle of the base backup. The
> command for the test should be a psql command with two -c switches
> without ON_ERROR_STOP, so as the second pg_backup_stop() starts after
> BASE_BACKUP is cancelled using the same connection, for something like
> that:
> psql -c "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);" \
> -c "select pg_backup_stop()" "replication=database"
>
> The last part of the test should do a pump_until() and capture "backup
> is not in progress" from the stderr output of the command run.
>
> This is leading me to the attached, that crashes quickly without the
> fix and passes with the fix.

Thanks for the patch! But I'm still not sure if it's worth adding only this test for the corner case while we don't have basic tests for BASE_BACKUP, pg_backup_start and pg_backup_stop.

BTW, if we decide to add that test, are you planning to back-patch it?

>
>> It's true that we don't really have good test coverage of write-ahead
>> logging and recovery, but this doesn't seem like the most important
>> thing to be testing in that area, either, and developing stable tests
>> for stuff like this can be a lot of work.
>
> Well, stability does not seem like a problem to me here.
>
>> I do kind of feel like the patch is fixing two separate bugs. The
>> change to SendBaseBackup() is fixing the problem that, because there's
>> SQL access on replication connections, we could try to start a backup
>> in the middle of another backup by mixing and matching the two
>> different methods of doing backups. The change to do_pg_abort_backup()
>> is fixing the fact that, after aborting a base backup, we don't reset
>> the session state properly so that another backup can be tried
>> afterwards.
>>
>> I don't know if it's worth committing them separately - they are very
>> small fixes. But it would probably at least be good to highlight in
>> the commit message that there are two different issues.
>
> Grouping both fixes in the same commit sounds fine by me. No
> objections from here.

This sounds fine to me, too. On the other hand, it's also fine for me to push the changes separately so that we can easily identify each change later. So I separated the patch into two ones.

Since one of them failed to be applied to v14 or before cleanly, I also created the patch for those back branches. So I attached three patches.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
0001-Prevent-BASE_BACKUP-in-the-middle-of-another-backup-_master_v15.patch text/plain 1.9 KB
0001-Prevent-BASE_BACKUP-in-the-middle-of-another-backup-_v14_v10.patch text/plain 1.9 KB
0002-Fix-assertion-failure-and-segmentation-fault-in-back.patch text/plain 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-07-15 07:47:58 Re: Allowing REINDEX to have an optional name
Previous Message Kyotaro Horiguchi 2022-07-15 07:30:59 Re: standby recovery fails (tablespace related) (tentative patch and discussion)