Re: Improving connection scalability: GetSnapshotData()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jonathan Katz <jkatz(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Improving connection scalability: GetSnapshotData()
Date: 2020-04-07 19:26:36
Message-ID: CA+TgmoZT0XMZHgpgyZUnVw7_dh-y_2b2dVhvJGaZ1deVvQxmEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

0008 -

Here again, I greatly dislike putting Copy in the name. It makes
little sense to pretend that either is the original and the other is
the copy. You just have the same data in two places. If one of them is
more authoritative, the place to explain that is in the comments, not
by elongating the structure member name and supposing anyone will be
able to make something of that.

+ *
+ * XXX: That's why this is using vacuumFlagsCopy.

I am not sure there's any problem with the code that needs fixing
here, so I might think about getting rid of this XXX. But this gets
back to my complaint about the locking regime being unclear. What I
think you need to do here is rephrase the previous paragraph so that
it explains the reason for using this copy a bit better. Like "We read
the copy of vacuumFlags from PGPROC rather than visiting the copy
attached to ProcGlobal because we can do that without taking a lock.
See fuller explanation in <place>." Or whatever.

0009, 0010 -

I think you've got this whole series of things divided up too finely.
Like, 0005 feels like the meat of it, and that has a bunch of things
in it that could plausible be separated out as separate commits. 0007
also seems to do more than one kind of thing (see my comment regarding
moving some of that into 0006). But whacking everything around like a
crazy man in 0005 and a little more in 0007 and then doing the
following cleanup in these little tiny steps seems pretty lame.
Separating 0009 from 0010 is maybe the clearest example of that, but
IMHO it's pretty unclear why both of these shouldn't be merged with
0008.

To be clear, I exaggerate for effect. 0005 is not whacking everything
around like a crazy man. But it is a non-minimal patch, whereas I
consider 0009 and 0010 to be sub-minimal.

My comments on the Copy naming apply here as well. I am also starting
to wonder why exactly we need two copies of all this stuff. Perhaps
I've just failed to absorb the idea for having read the patch too
briefly, but I think that we need to make sure that it's super-clear
why we're doing that. If we just needed it for one field because
$REASONS, that would be one thing, but if we need it for all of them
then there must be some underlying principle here that needs a good
explanation in an easy-to-find and centrally located place.

0011 -

+ * Number of top-level transactions that completed in some form since the
+ * start of the server. This currently is solely used to check whether
+ * GetSnapshotData() needs to recompute the contents of the snapshot, or
+ * not. There are likely other users of this. Always above 1.

Does it only count XID-bearing transactions? If so, best mention that.

+ * transactions completed since the last GetSnapshotData()..

Too many periods.

+ /* Same with CSN */
+ ShmemVariableCache->xactCompletionCount++;

If I didn't know that CSN stood for commit sequence number from
reading years of mailing list traffic, I'd be lost here. So I think
this comment shouldn't use that term.

+GetSnapshotDataFillTooOld(Snapshot snapshot)

Uh... no clue what's going on here. Granted the code had no comments
in the old place either, so I guess it's not worse, but even the name
of the new function is pretty incomprehensible.

+ * Helper function for GetSnapshotData() that check if the bulk of the

checks

+ * the fields that need to change and returns true. false is returned
+ * otherwise.

Otherwise, it returns false.

+ * It is safe to re-enter the snapshot's xmin. This can't cause xmin to go

I know what it means to re-enter a building, but I don't know what it
means to re-enter the snapshot's xmin.

This whole comment seems a bit murky.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-07 19:30:55 Re: Improving connection scalability: GetSnapshotData()
Previous Message Andres Freund 2020-04-07 19:24:53 Re: Improving connection scalability: GetSnapshotData()