Re: Causal reads take II

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Causal reads take II
Date: 2017-07-29 19:07:05
Message-ID: CA+q6zcVayN2j46-_Yb+V7+KGVhEBMYTUbkU-non9aYyHZ1q87g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 12 July 2017 at 23:45, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:
> I renamed it to "synchronous replay", because "causal reads" seemed a bit
too
> arcane.

Hi

I looked through the code of `synchronous-replay-v1.patch` a bit and ran a
few
tests. I didn't manage to break anything, except one mysterious error that
I've
got only once on one of my replicas, but I couldn't reproduce it yet.
Interesting thing is that this error did not affect another replica or
primary.
Just in case here is the log for this error (maybe you can see something
obvious, that I've not noticed):

```
LOG: could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT: WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG: could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT: WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG: could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT: WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG: could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT: WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG: directories for tablespace 47733 could not be removed
HINT: You can remove the directories manually if necessary.
CONTEXT: WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
FATAL: could not create directory "pg_tblspc/47734/PG_10_201707211/47732":
File exists
CONTEXT: WAL redo at 0/125F5768 for Storage/CREATE:
pg_tblspc/47734/PG_10_201707211/47732/47736
LOG: startup process (PID 8034) exited with exit code 1
LOG: terminating any other active server processes
LOG: database system is shut down
```

And speaking about the code, so far I have just a few notes (some of them
merely questions):

* In general the idea behind this patch sounds interesting for me, but it
relies heavily on time synchronization. As mentioned in the documentation:
"Current hardware clocks, NTP implementations and public time servers are
unlikely to allow the system clocks to differ more than tens or hundreds
of
milliseconds, and systems synchronized with dedicated local time servers
may
be considerably more accurate." But as far as I remember from my own
experience sometimes it maybe not that trivial on something like AWS
because
of virtualization. Maybe it's an unreasonable fear, but is it possible to
address this problem somehow?

* Also I noticed that some time-related values are hardcoded (e.g. 50%/25%
time shift when we're dealing with leases). Does it make sense to move
them
out and make them configurable?

* Judging from the `SyncReplayPotentialStandby` function, it's possible to
have
`synchronous_replay_standby_names = "*, server_name"`, which is basically
an
equivalent for just `*`, but it looks confusing. Is it worth it to prevent
this behaviour?

* In the same function `SyncReplayPotentialStandby` there is this code:

```
if (!SplitIdentifierString(rawstring, ',', &elemlist))
{
/* syntax error in list */
pfree(rawstring);
list_free(elemlist);
/* GUC machinery will have already complained - no need to do again */
return false;
}
```

Am I right that ideally this (a situation when at this point in the code
`synchronous_replay_standby_names` has incorrect value) should not happen,
because GUC will prevent us from that? If yes, then it looks for me that
it
still makes sense to put here a log message, just to give more
information in
a potentially weird situation.

* In the function `SyncRepReleaseWaiters` there is a commentary:

```
/*
* If the number of sync standbys is less than requested or we aren't
* managing a sync standby or a standby in synchronous replay state that
* blocks then just leave.
* /
if ((!got_recptr || !am_sync) && !walsender_sr_blocker)
```

Is this commentary correct? If I understand everything right
`!got_recptr` -
the number of sync standbys is less than requested (a), `!am_sync` - we
aren't
managing a sync standby (b), `walsender_sr_blocker` - a standby in
synchronous
replay state that blocks (c). Looks like condition is `(a or b) and not
c`.

* In the function `ProcessStandbyReplyMessage` there is a code that
implements
this:

```
* If the standby reports that it has fully replayed the WAL for at
least
* 10 seconds, then let's clear the lag times that were measured when it
* last wrote/flushed/applied a WAL record. This way we avoid displaying
* stale lag data until more WAL traffic arrives.
```
but I never found any mention of this 10 seconds in the documentation. Is
it
not that important? Also, question 2 is related to this one.

* In the function `WalSndGetSyncReplayStateString` all the states are in
lower
case except `UNKNOWN`, is there any particular reason for that?

There are also few more not that important notes (mostly about some typos
and
few confusing names), but I'm going to do another round of review and
testing
anyway so I'll just send them all next time.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Rofail 2017-07-29 19:26:51 Re: GSoC 2017: Foreign Key Arrays
Previous Message Andres Freund 2017-07-29 18:28:07 Re: segfault in HEAD when too many nested functions call