Re: pg_replication_origin_drop API potential race condition

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_replication_origin_drop API potential race condition
Date: 2021-02-03 12:17:31
Message-ID: CAA4eK1+RafTDCh4nX7rFSWEN9C6KkKXv+GH+wjTqrj0XV-0i8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 27, 2021 at 4:58 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Hackers.
>
> As discovered elsewhere [ak0125] there is a potential race condition
> in the pg_replication_origin_drop API
>
> The current code of pg_replication_origin_drop looks like:
> ====
> roident = replorigin_by_name(name, false);
> Assert(OidIsValid(roident));
>
> replorigin_drop(roident, true);
> ====
>
> Users cannot deliberately drop a non-existent origin
> (replorigin_by_name passes missing_ok = false) but there is still a
> small window where concurrent processes may be able to call
> replorigin_drop for the same valid roident.
>
> Locking within replorigin_drop guards against concurrent drops so the
> 1st execution will succeed, but then the 2nd execution would give
> internal cache error: elog(ERROR, "cache lookup failed for replication
> origin with oid %u", roident);
>
> Some ideas to fix this include:
> 1. Do nothing except write a comment about this in the code. The
> internal ERROR is not ideal for a user API there is no great harm
> done.
> 2. Change the behavior of replorigin_drop to be like
> replorigin_drop_IF_EXISTS, so the 2nd execution of this race would
> silently do nothing when it finds the roident is already gone.
> 3. Same as 2, but make the NOP behavior more explicit by introducing a
> new "missing_ok" parameter for replorigin_drop.
>

How about if we call replorigin_by_name() inside replorigin_drop after
acquiring the lock? Wouldn't that close this race condition? We are
doing something similar for pg_replication_origin_advance().

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-03 12:38:41 Re: Single transaction in the tablesync worker?
Previous Message Peter Eisentraut 2021-02-03 12:10:51 pg_dump: Add const decorations