Re: autovacuum process handling

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum process handling
Date: 2007-01-26 10:02:20
Message-ID: 45B9D1AC.9080804@bluegap.ch
Lists: pgsql-hackers

Hi,

Alvaro Herrera wrote:
> Yeah. For what I need, the launcher just needs to know when a worker
> has finished and how many workers there are.

Oh, so it's not that much less communication. My replication manager also
needs to know when a worker dies. You said you use a signal from the
manager to the postmaster to request that a worker be forked. How do you
do it the other way around, where the postmaster needs to tell the
launcher which worker terminated?
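
I could imagine the postmaster just signalling the launcher, and the
launcher then checking which of the PIDs it knows about has gone away,
roughly like this (only a sketch, all names invented); is that about
what you do?

    #include <signal.h>
    #include <sys/types.h>

    /* Sketch only; struct and function names are invented. */
    typedef struct WorkerSlot
    {
        pid_t        pid;     /* 0 if the slot is free */
        unsigned int dboid;   /* database the worker is attached to */
    } WorkerSlot;

    #define NUM_WORKER_SLOTS 8

    /* launcher bookkeeping, defined elsewhere */
    extern void HandleWorkerExit(pid_t pid, unsigned int dboid);

    /*
     * Launcher side: called when the postmaster signals that some child
     * has exited.  Find out which one by probing the PIDs we track.
     */
    void
    ReapFinishedWorkers(WorkerSlot *slots)
    {
        int i;

        for (i = 0; i < NUM_WORKER_SLOTS; i++)
        {
            if (slots[i].pid != 0 && kill(slots[i].pid, 0) != 0)
            {
                /* worker is gone; record that and free the slot */
                HandleWorkerExit(slots[i].pid, slots[i].dboid);
                slots[i].pid = 0;
            }
        }
    }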

>> For Postgres-R, I'm currently questioning if I shouldn't merge the
>> replication manager process with the postmaster. Of course, that would
>> violate the "postmaster does not touch shared memory" constraint.
>
> I suggest you don't. Reliability from Postmaster is very important.

Yes, so? As long as I can't restart the replication manager, but
operation of the whole DBMS relies on it, I have to take the postmaster
down as soon as it detects a crashed replication manager.

So I still argue that reliability gets better than the status quo if I
merge these two processes (because there is less code for communication
between the two).

Of course, the other way to gain reliability would be to make the
replication manager restartable. But restarting the replication manager
means recovering data from other nodes in the cluster, thus a lot of
network traffic. Needless to say, this is quite an expensive operation.

That's why I'm questioning whether that's the behavior we want. Isn't it
better to force the administrators to look into the issue and possibly
replace a broken node, instead of having one node run amok by requesting
recovery over and over again, possibly crashing other nodes, too, because
of the additional recovery load?

>> But it would make some things a lot easier:
>>
>> * What if the launcher/manager dies (but you potentially still have
>> active workers)?
>>
>> Maybe, for autovacuum you can simply restart the launcher and that
>> one detects workers from shmem.
>>
>> With replication, I certainly have to take down the postmaster as
>> well, as we are certainly out of sync and can't simply restart the
>> replication manager. So in that case, no postmaster can run without a
>> replication manager and vice versa. Why not make it one single
>> process, then?
>
> Well, the point of the postmaster is that it can notice when one process
> dies and take appropriate action. When a backend dies, the postmaster
> closes all others. But if the postmaster crashes due to a bug in the
> manager (due to both being integrated in a single process), how do you
> close the backends? There's no one to do it.

That's a point.

But again, as long as the replication manager cannot be restarted, you
gain nothing by closing backends on a crashed node.

> In my case, the launcher is not critical. It can die and the postmaster
> should just start a new one without much noise. A worker is critical
> because it's connected to tables; it's as critical as a regular backend.
> So if a worker dies, the postmaster must take everyone down and cause a
> restart. This is pretty easy to do.

Yeah, that's the main difference, and I see why your approach makes
perfect sense for the autovacuum case.

In contrast, the replication manager is critical (to one node), and a
restart is expensive (for the whole cluster).

>> * Startup races: depending on how you start workers, the launcher/
>> manager may get a "database is starting up" error when requesting
>> the postmaster to fork backends.
>> That probably also applies to autovacuum, as those workers shouldn't
>> work concurrently to a startup process. But maybe there are other
>> means of ensuring that no autovacuum gets triggered during startup?
>
> Oh, this is very easy as well. In my case the launcher just sets a
> database OID to be processed in shared memory, and then calls
> SendPostmasterSignal with a particular value. The postmaster must only
> check this signal within ServerLoop, which means it won't act on it
> (i.e., won't start a worker) until the startup process has finished.
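
If I understand that right, it boils down to something like this (my own
sketch of it; the shared memory field and the signal reason name are
guesses, not your actual patch):

    /* launcher: ask the postmaster to fork a worker for database dboid */
    AutoVacuumShmem->av_dboid = dboid;    /* field name is a guess */
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);

    /* postmaster, but only inside ServerLoop(), i.e. never while the
     * startup process is still running */
    if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
        StartAutovacuumWorker();          /* forks the worker */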

It seems like your launcher is perfectly fine with requesting workers
and not getting them. The replication manager currently isn't. Maybe I
should make it more fault tolerant in that regard...

> I guess your problem is that the manager's task is quite a lot more
> involved than my launcher's. But in that case, it's even more important
> to have them separate.

More involved with what? It does not touch shared memory; it mainly
keeps track of the backends' states (by getting notices from the
postmaster) and does all the necessary forwarding of messages between
the communication system and the backends. Its main loop is similar to
the postmaster's, mainly consisting of a select().
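
Abstracting away the details, the loop is essentially the following
(the function and variable names here are just placeholders for what my
code actually does):

    #include <sys/select.h>

    extern int gcs_socket;         /* connection to the group comm. system */
    extern int postmaster_pipe;    /* notices from the postmaster */

    extern void watch_fd(fd_set *set, int *maxfd, int fd);
    extern void watch_backend_fds(fd_set *set, int *maxfd);
    extern void forward_gcs_message_to_backend(void);
    extern void update_backend_states(void);

    void
    replication_manager_main_loop(void)
    {
        for (;;)
        {
            fd_set  rfds;
            int     maxfd = 0;

            FD_ZERO(&rfds);
            watch_fd(&rfds, &maxfd, gcs_socket);
            watch_fd(&rfds, &maxfd, postmaster_pipe);
            watch_backend_fds(&rfds, &maxfd);   /* one fd per backend */

            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                continue;                       /* EINTR and friends */

            if (FD_ISSET(gcs_socket, &rfds))
                forward_gcs_message_to_backend();
            if (FD_ISSET(postmaster_pipe, &rfds))
                update_backend_states();
            /* ... plus forwarding backend messages out to the GCS ... */
        }
    }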

> I don't understand why the manager talks to postmaster. If it doesn't,
> well, then there's no concurrency issue gone, because the remote
> backends will be talking to *somebody* anyway; be it postmaster, or
> manager.

As with your launcher, I only send one message: the worker request. But
the other way around, from the postmaster to the replication manager,
there are some messages as well: a "database is ready" message and
"worker terminated" messages. Thinking about handling the restarting
cycle, I would need to add a "database is restarting" message, which has
to be followed by another "database is ready" message.

For sure, the replication manager needs to keep running during a
restarting cycle. And it needs to know the database's state, so as to be
able to decide if it can request workers or not.
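
So the complete set of messages between the two boils down to something
like this (names purely illustrative, they don't exist in any patch):

    typedef enum RMgrMessageType
    {
        RMGR_MSG_REQUEST_WORKER,      /* manager -> postmaster: fork a worker */
        RMGR_MSG_DATABASE_READY,      /* postmaster -> manager: startup done */
        RMGR_MSG_WORKER_TERMINATED,   /* postmaster -> manager: a worker died */
        RMGR_MSG_DATABASE_RESTARTING  /* postmaster -> manager: restart cycle,
                                       * followed by another DATABASE_READY */
    } RMgrMessageType;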

> (Maybe your problem is that the manager is not correctly designed. We
> can talk about checking that code. I happen to know the Postmaster
> process handling code because of my previous work with Autovacuum and
> because of Mammoth Replicator.)

Thanks for the offer, I'll get back to that.

> I think you're underestimating the postmaster's task.

Maybe, but it certainly loses importance within a cluster, since it
controls only part of the whole database system.

> Ok. I have one ready, and it works very well. It only ever starts one
> worker -- I have constrained that way just to keep the current behavior
> of a single autovacuum process running at any time. My plan is to get
> it submitted for review, and then start working on having it consider
> multiple workers and introduce more scheduling smarts.

Sounds like a good plan.

Thank you for your input. You made me rethink some issues and pointed
me to some open questions.

Regards

Markus
