Re: Rework the way multixact truncations work

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rework the way multixact truncations work
Date: 2015-07-02 18:28:25
Message-ID: 20150702182825.GF16267@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-07-02 13:58:45 -0400, Robert Haas wrote:
> On Thu, Jul 2, 2015 at 11:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Will look at 0003 next.
>
> + appendStringInfo(buf, "offsets [%u, %u), members [%u, %u)",
>
> I don't think we typically use this style for notating intervals.

I don't think we really have a very consistent style for xlog messages -
this seems to describe the meaning accurately?

> [several good points]

> + (errmsg("performing legacy multixact truncation,
> upgrade master")));
>
> This message needs work. I'm not sure exactly what it should say, but
> I'm pretty sure that's not clear enough.
>
> I seriously, seriously doubt that it is a good idea to perform the
> legacy truncation from MultiXactAdvanceOldest() rather than
> TruncateMultiXact().

But where should TruncateMultiXact() be called from? I mean, we could
move the logic from inside MultiXactAdvanceOldest() to some special case
in the replay routine, but what'd be the advantage?

> The checkpoint hasn't really happened at that point yet; you might
> truncate away stuff, then crash before the checkpoint is complete, and
> then we you restart recovery, you've got trouble.

We're only talking about restartpoints here, right? And I don't see the
problem - we don't read the slru anymore until the end of recovery, and
the end of recovery can't happen before reaching the minimum revovery
location?

> I think you should
> be very, very cautious about rejiggering the order of operations here.
> The current situation is not good, but casually rejiggering it can
> make things much worse.

The current placement - as part of the restartpoint - is utterly broken
and unpredictable. There'll frequently be no restartpoints performed at
all (due to different checkpoint segments, slow writeout, or pending
actions). Because there's no careful timing of when this happens it's
much harder to understand the exact state in which the truncation
happens - I think moving it to a specific location during replay makes
things considerably easier.

> Generally, I think you've weakened the logic in SetOffsetVacuumLimit()
> appreciably here. The existing code is careful never to set
> oldestOffsetKnown false when it was previously true. Your rewrite
> removes that property. Generally, I see no need for this function to
> be overhauled to the extent that you have, and would suggest reverting
> the changes that aren't absolutely required.

A lot of that has to do that the whole stuff about truncations happening
during checkpoints is gone and that thus the split with
DetermineSafeOldestOffset() doesn't make sense anymore.

That oldestOffsetKnown is unset is wrong - the if (prevOldestOffsetKnown
&& !oldestOffsetKnown) block should be a bit earlier.

> I don't particularly like the fact that find_multixact_start() calls
> SimpleLruFlush(). I think that's really a POLA violation: you don't
> expect that a function that looks like a simple inquiry is going to go
> do a bunch of unrelated I/O in the background. If somebody called
> find_multixact_start() with any frequency, you'd be sad. You're just
> doing it this way because of the context *in which you expect
> find_multixact_start* to be called, which does not seem very
> future-proof. I prefer Thomas's approach.

I don't strongly care, but I do think it has some value to be sure about
the on-disk state for the current callers. I think it'd be a pretty odd
thing to call find_multixact_start() frequently.

> If TruncateMultiXact() fails to acquire MultiXactTruncationLock right
> away, does it need to wait, or could it ConditionalAcquire and bail
> out if the lock isn't obtained?

That seems like premature optimization to me. And one that's not that
easy to do correctly - what if the current caller actually has a new,
lower, minimum mxid?

> + * XXX: It's also possible that the page that oldestMXact is on has
> + * already been truncated away, and we crashed before updating
> + * oldestMXact.
>
> But why is that an XXX? I think this is just a case of recovery
> needing tolerate redo of an action already redone.

Should rather have been NB.

> I'm not convinced that it's a good idea to remove
> lastCheckpointedOldest and replace it with nothing. It seems like a
> very good idea to have two separate pieces of state in shared memory:

> - The oldest offset that we think anyone might need to access to make
> a visibility check for a tuple.
> - The oldest offset that we still have on disk.
>
> The latter would now need to be called something else, not
> lastCheckpointedOldest, but I think it's still good to have it.

> Otherwise, I don't see how you protect against the on-disk state
> wrapping around before you finish truncating, and then maybe
> truncation eats something that was busy getting reused.

Unless I miss something the stop limits will prevent that from
happening? SetMultiXactIdLimit() is called only *after* the truncation
has finished?

> + * Update in-memory limits before performing the truncation, while inside
> + * the critical section: Have to do it before truncation, to prevent
> + * concurrent lookups of those values. Has to be inside the critical
> + * section asotherwise a future call to this function would error out,
> + * while looking up the oldest member in offsets, if our caller crashes
> + * before updating the limits.
>
> Missing space: asotherwise.
>
> Who else might be concurrently looking up those values? Nobody else
> can truncate while we're truncating, because we hold
> MultiXactTruncationLock. And nobody else should be getting here from
> looking up tuples, because if they are, we truncated too soon.

pg_get_multixact_members(), a concurrent call to SetMultiXactIdLimit()
(SetOffsetLimit()->find_multixact_start()) from vac_truncate_clog().

> Phew. That's all I see on a first read-through, but I've only spent a
> couple of hours on this, so I might easily have missed some things.
> But let me stop here, hit send, and see what you think of these
> comments.

Thanks for the look so far!

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-07-02 18:31:15 Re: Support for N synchronous standby servers - take 2
Previous Message Heikki Linnakangas 2015-07-02 18:22:52 Re: Time to fully remove heap_formtuple() and friends?