Re: pg_receivewal starting position

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: pg_receivewal starting position
Date: 2021-09-03 15:49:34
Message-ID: CALj2ACVCPPSL99oKG_F8k7g8NkWytw=N6HaYKz--p1U6bCxPyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> There is still the concern raised by Bharath about being able to select the
> way to fetch the replication slot information for the user, and what should or
> should not be included in the documentation. I am in favor of documenting the
> process of selecting the wal start, and maybe include a --start-lsn option for
> the user to override it, but that's maybe for another patch.

Let's hear from others.

Thanks for the patches. I have some quick comments on the v5 patch-set:

0001:
1) Do you also want to MemSet values too in ReadReplicationSlot?

2) When if clause has single statement we don't generally use "{" and "}"
+ if (slot == NULL || !slot->in_use)
+ {
+ LWLockRelease(ReplicationSlotControlLock);
+ }
you can just have:
+ if (slot == NULL || !slot->in_use)
+ LWLockRelease(ReplicationSlotControlLock);

3) This is still not copying the slot contents, as I said earlier you
can just take the required info into some local variables instead of
taking the slot pointer to slot_contents pointer.
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

4) As I said earlier, why can't we have separate tli variables for
more readability instead of just one slots_position_timeline and
timeline_history variable? And you are not even resetting those 2
variables after if
(!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end
up sending the restart_lsn timelineid for confirmed_flush. So, better
use two separate variables. In fact you can use block local variables:
+ if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
+ {
+ List *tl_history= NIL;
+ TimeLineID tli;
+ tl_history= readTimeLineHistory(ThisTimeLineID);
+ tli = tliOfPointInHistory(slot_contents.data.restart_lsn,
+ tl_history);
+ values[i] = Int32GetDatum(tli);
+ nulls[i] = false;
+ }
similarly for confirmed_flush.

5) I still don't see the need for below test case:
+ "READ_REPLICATION_SLOT does not produce an error with non existent slot");
when we have
+ "READ_REPLICATION_SLOT does not produce an error with dropped slot");
Because for a user, dropped or non existent slot is same, it's just
that for dropped slot we internally don't delete its entry from the
shared memory.

0002:
1) Imagine GetSlotInformation always returns
READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an
infinite loop there? Instead, why don't you just exit(1); instead of
return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I
think, you can just do exit(1), no need to retry.
+ case READ_REPLICATION_SLOT_ERROR:
+
+ /*
+ * Error has been logged by GetSlotInformation, return and
+ * maybe retry
+ */
+ return;

2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't
passed? And why are you having this check after you connect to the
server and fetch the data?
+ /* If no slotinformation has been passed, we can return immediately */
+ if (slot_info == NULL)
+ {
+ PQclear(res);
+ return READ_REPLICATION_SLOT_OK;
+ }
Instead you can just have a single assert:

+ Assert(slot_name && slot_info );

3) How about
pg_log_error("could not read replication slot:
instead of
pg_log_error("could not fetch replication slot:

4) Why are you having the READ_REPLICATION_SLOT_OK case in default?
+ default:
+ if (slot_info.is_logical)
+ {
+ /*
+ * If the slot is not physical we can't expect to
+ * recover from that
+ */
+ pg_log_error("cannot use the slot provided, physical slot expected.");
+ exit(1);
+ }
+ stream.startpos = slot_info.restart_lsn;
+ stream.timeline = slot_info.restart_tli;
+ }
You can just have another case statement for READ_REPLICATION_SLOT_OK
and in the default you can throw an error "unknown read replication
slot status" or some other better working and exit(1);

5) Do you want initialize slot_info to 0?
+ if (replication_slot)
+ {
+ SlotInformation slot_info;
+ MemSet(slot_info, 0, sizeof(SlotInformation));

6) This isn't readable:
+ slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0;
How about:
if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0)
slot_info->is_logical = true;
You don't need to set it false, because you would have
MemSet(slot_info) in the caller.

7) How about
uint32 hi;
uint32 lo;
instead of
+ uint32 hi,
+ lo;

8) Move SlotInformation * slot_info) to the next line as it crosses
the 80 char limit.
+GetSlotInformation(PGconn *conn, const char *slot_name,
SlotInformation * slot_info)

9) Instead of a boolean is_logical, I would rather suggest to use an
enum or #define macros the slot types correctly, because someday we
might introduce new type slots and somebody wants is_physical kind of
variable name?
+typedef struct SlotInformation {
+ bool is_logical;

Regards,
Bharath Rupireddy.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-09-03 16:24:04 Re: A reloption for partitioned tables - parallel_workers
Previous Message Bharath Rupireddy 2021-09-03 15:03:45 Re: Trap errors from streaming child in pg_basebackup to exit early