Re: autovacuum process handling

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum process handling
Date: 2007-01-26 14:36:37
Message-ID: 20070126143637.GF13036@alvh.no-ip.org
Lists: pgsql-hackers

Markus Schiltknecht wrote:
> Hi,
>
> Alvaro Herrera wrote:
> >Yeah. For what I need, the launcher just needs to know when a worker
> >has finished and how many workers there are.
>
> Oh, so it's not all that less communication. My replication manager also
> needs to know when a worker dies. You said you are using a signal from
> manager to postmaster to request a worker to be forked. How do you do
> the other part, where the postmaster needs to tell the launcher which
> worker terminated?

I haven't done that yet, since the current incarnation does not need it.
But I have considered using some signal like SIGUSR1 to mean "something
changed in your processes, look into your shared memory". The
autovacuum shared memory area would contain the PIDs (or maybe PGPROC
pointers?) of the workers; so when the launcher goes to check the area,
it notices that one worker is no longer there, meaning that worker must
have terminated its job.
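
Something along these lines, perhaps (just a rough sketch, not actual
code -- the struct and function names are invented):

#include <signal.h>
#include <sys/types.h>
#include <stdbool.h>

#define MAX_WORKERS 8                  /* assumed fixed-size slot array */

/* hypothetical autovacuum shared memory area: one PID slot per worker */
typedef struct AvShmemSketch
{
    pid_t   worker_pid[MAX_WORKERS];   /* 0 = slot free */
} AvShmemSketch;

static AvShmemSketch *AvShmem;         /* attached at launcher startup */
static pid_t known_pid[MAX_WORKERS];   /* launcher's private copy of the slots */
static volatile sig_atomic_t got_SIGUSR1 = false;

/* signal handler: just set a flag; the real work happens in the main loop */
static void
launcher_sigusr1_handler(int signo)
{
    (void) signo;
    got_SIGUSR1 = true;
}

/* called from the launcher's main loop whenever got_SIGUSR1 is set */
static void
rescan_workers(void)
{
    got_SIGUSR1 = false;
    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (known_pid[i] != 0 && AvShmem->worker_pid[i] == 0)
        {
            /* the worker in slot i has disappeared: it finished (or died),
             * so record the completion and maybe launch the next worker */
        }
        known_pid[i] = AvShmem->worker_pid[i];
    }
}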

> >>For Postgres-R, I'm currently questioning if I shouldn't merge the
> >>replication manager process with the postmaster. Of course, that would
> >>violate the "postmaster does not touch shared memory" constraint.
> >
> >I suggest you don't. Reliability from Postmaster is very important.
>
> Yes, so? As long as I can't restart the replication manager, but
> operation of the whole DBMS relies on it, I have to take the postmaster
> down as soon as it detects a crashed replication manager.

Sure. But you also need to take down all regular backends, and bgwriter
as well. If the postmaster just dies, this won't work cleanly.

> That's why I'm questioning, if that's the behavior we want. Isn't it
> better to force the administrators to look into the issue and probably
> replace a broken node instead of having one node going amok by
> requesting recovery over and over again, possibly forcing crashes of
> other nodes, too, because of the additional load for recovery?

Maybe what you want, then, is that when the replication manager dies,
the postmaster should close all processes and then shut itself down.
This can also be arranged easily.

But just crashing the postmaster because the manager sees something
wrong is certainly not a good idea.

> >Well, the point of the postmaster is that it can notice when one process
> >dies and take appropriate action. When a backend dies, the postmaster
> >closes all others. But if the postmaster crashes due to a bug in the
> >manager (due to both being integrated in a single process), how do you
> >close the backends? There's no one to do it.

> That's a point.
>
> But again, as long as the replication manager won't be able to restart,
> you gain nothing by closing backends on a crashed node.

Sure you do -- they won't corrupt anything :-) Plus, what use are
running backends in a multimaster environment, if they can't communicate
with the outside? Much better would be, AFAICS, to shut everyone down
so that the users can connect to a working node.

> >I guess your problem is that the manager's task is quite a lot more
> >involved than my launcher's. But in that case, it's even more important
> >to have them separate.
>
> More involved with what? It does not touch shared memory, it mainly
> keeps track of the backends states (by getting a notice from the
> postmaster) and does all the necessary forwarding of messages between
> the communication system and the backends. Its main loop is similar to
> the postmaster's, mainly consisting of a select().

I meant "more complicated". And if it has to listen on a socket and
forward messages to remote backends, it certainly is a lot more
complicated than the current autovac launcher.
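
Roughly, I'd imagine the core of such a manager loop to look something
like this (only a sketch; the descriptor names are invented and the
actual message handling is left out):

#include <sys/select.h>
#include <unistd.h>
#include <errno.h>

static void
manager_main_loop(int group_comm_fd, int backend_pipe_fd)
{
    for (;;)
    {
        fd_set  rfds;
        int     maxfd;

        FD_ZERO(&rfds);
        FD_SET(group_comm_fd, &rfds);   /* messages from the communication system */
        FD_SET(backend_pipe_fd, &rfds); /* messages from the local remote backends */
        maxfd = (group_comm_fd > backend_pipe_fd) ? group_comm_fd : backend_pipe_fd;

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; recheck state */
            break;              /* unexpected error: bail out */
        }

        if (FD_ISSET(group_comm_fd, &rfds))
        {
            /* forward an incoming remote message to the right backend */
        }
        if (FD_ISSET(backend_pipe_fd, &rfds))
        {
            /* forward a local change set out to the communication system */
        }
    }
}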

> >I don't understand why the manager talks to postmaster. If it doesn't,
> >well, then there's no concurrency issue gone, because the remote
> >backends will be talking to *somebody* anyway; be it postmaster, or
> >manager.
>
> As with your launcher, I only send one message: the worker request. But
> the other way around, from the postmaster to the replication manager,
> there are also some messages: a "database is ready" message and a
> "worker terminated" message. Thinking about handling the restarting
> cycle, I would need to add a "database is restarting" message, which
> has to be followed by another "database is ready" message.
>
> For sure, the replication manager needs to keep running during a
> restarting cycle. And it needs to know the database's state, so as to be
> able to decide if it can request workers or not.

I think this would be pretty easy to do if you made the remote backends
keep state in shared memory. The manager just needs to get a signal to
know that it should check the shared memory. This can be arranged
easily: just have the remote backends signal the postmaster, and have
the postmaster signal the manager. Alternatively, have the manager PID
stored in shared memory and have the remote backends signal (SIGUSR1 or
some such) the manager. (bgwriter does this: it announces its PID in
shared memory, and the backends signal it when they want a CHECKPOINT).
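
For illustration, that bgwriter-style arrangement would look roughly like
this (a sketch only; none of these names exist, and the state enum is
made up):

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_REMOTE_BACKENDS 64          /* assumed fixed number of slots */

typedef enum { RB_IDLE, RB_APPLYING, RB_DONE } RemoteBackendState;

/* hypothetical shared memory area used by the manager and remote backends */
typedef struct MgrShmemSketch
{
    pid_t               manager_pid;    /* filled in by the manager at startup */
    RemoteBackendState  backend_state[MAX_REMOTE_BACKENDS];
} MgrShmemSketch;

static MgrShmemSketch *MgrShmem;        /* attached by both sides */

/* manager side: advertise our PID so the backends can find us */
static void
manager_announce(void)
{
    MgrShmem->manager_pid = getpid();
}

/* backend side: publish the new state, then poke the manager */
static void
backend_report_state(int slot, RemoteBackendState new_state)
{
    MgrShmem->backend_state[slot] = new_state;
    if (MgrShmem->manager_pid != 0)
        kill(MgrShmem->manager_pid, SIGUSR1);   /* "look at shared memory" */
}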

> >I think you're underestimating the postmaster's task.
>
> Maybe, but it certainly loses importance within a cluster, since it
> controls only part of the whole database system.

Well, IMVHO a single node's reliability is important to the overall
cluster, because the node can reject further incoming requests from
clients when it drops out of the cluster. If you allow the regular
backends to continue working, some of the writes they do on that node
could be lost.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
