Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
To: Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2020-03-25 18:10:59
Message-ID: f4100c14985b1b7983aefd00cb6b491c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-03-21 14:16, Kartyshov Ivan wrote:
> As it was discussed earlier, I added wait for statement into
> begin/start statement.
Thanks! To address the discussion: I like the idea of having WAIT as a
part of BEGIN statement rather than a separate command, as Thomas Munro
suggested. That way, the syntax itself enforces that WAIT FOR LSN will
actually take effect, even for single-snapshot transactions. It seems
more convenient for the user, who won't have to remember the details
about how WAIT interacts with isolation levels.

> BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event
Not sure about this, but could we add "WAIT FOR .." as another
transaction_mode rather than a separate thing? That way, user won't have
to worry about the order. As of now, one should remember to always put
WAIT FOR as the Last parameter in the BEGIN statement.

> and event is:
> LSN value [options]
> TIMESTAMP value
I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed out,
it seems a lot like pg_sleep_until(). Besides, it doesn't necessarily
need to be connected to transaction start, which makes it different from
WAIT FOR LSN - so I wouldn't mix them together.

I had another look at the code:

===
In WaitShmemSize() you might want to use functions that check for
overflow - add_size()/mul_size(). They're used in similar situations,
for example in BTreeShmemSize().

===
This is how WaitUtility() is called - note that time_val will always be
> 0:
+    if (time_val <= 0)
+        time_val = 1;
+...
+    res = WaitUtility(lsn, (int)(time_val * 1000), dest);

(time_val * 1000) is passed to WaitUtility() as the delay argument. And
inside WaitUtility() we have this:

+if (delay > 0)
+    latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+else
+    latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;

Since we always pass a delay value greater than 0, we'll never get to
the "else" clause here and we'll never be ready to wait for LSN forever.
Perhaps due to that, the current test outputs this after a simple WAIT
FOR LSN command:
psql:<stdin>:1: NOTICE:  LSN is not reached.

===
Speaking of tests,

When I tried to test BEGIN TRANSACTION WAIT FOR LSN, I got a segfault:
LOG: statement: BEGIN TRANSACTION WAIT FOR LSN '0/3002808'
LOG: server process (PID 10385) was terminated by signal 11:
Segmentation fault
DETAIL: Failed process was running: COMMIT

Could you add some more tests to the patch when this is fixed? With WAIT
as part of BEGIN statement + with things such as WAIT FOR ALL ... / WAIT
FOR ANY ... / WAIT FOR LSN ... UNTIL TIMESTAMP ...

===
In WaitSetLatch() we should probably check backend for NULL before
calling SetLatch(&backend->procLatch)

We might also need to check wait statement for NULL in these two cases:
+  case T_TransactionStmt:
+  {...
+      result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);

case TRANS_STMT_START:
{...
+ WaitStmt *waitstmt = (WaitStmt *) stmt->wait;
+ res = WaitMain(waitstmt, dest);

===
After we added the "wait" attribute to TransactionStmt struct, do we
perhaps need to add something to _copyTransactionStmt() /
_equalTransactionStmt()?

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-03-25 18:46:31 Re: AllocSetEstimateChunkSpace()
Previous Message Fabien COELHO 2020-03-25 18:09:38 Allow continuations in "pg_hba.conf" files