Re: Postgres-R: internal messaging

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Markus Wanner <markus(at)bluegap(dot)ch>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres-R: internal messaging
Date: 2008-07-23 10:31:31
Message-ID: 48870883.9040807@bluegap.ch
Lists: pgsql-hackers

Hi Alexey,

Thanks for your feedback; these are interesting points.

Alexey Klyukin wrote:
> In Replicator we avoided the need for postmaster to read/write backend's
> shmem data by using it as a signal forwarder. When a backend wants to
> inform a special process (i.e. queue monitor) about replication-related
> event (such as commit) it sends SIGUSR1 to Postmaster with a related
> "reason" flag and the postmaster upon receiving this signal forwards
> it to the destination process. Termination of backends and special
> processes are handled by the postmaster itself.

Hm.. how about larger data chunks, like change sets? In Postgres-R,
those need to travel between the backends and the replication manager,
which then sends them to the GCS.

> Hm...what would happen with the new data under heavy load when the queue
> would eventually be filled with messages, the relevant transactions
> would be aborted or they would wait for the manager to release the queue
> space occupied by already processed messages? ISTM that having a fixed
> size buffer limits the maximum transaction rate.

That's why the replication manager is a very simple forwarder, which
does not block messages, but consumes them immediately from shared
memory. It already features a message cache, which holds messages it
cannot currently forward to a backend, because all backends are busy.

And it takes care to send change sets only to helper backends which are
not busy and can process the remote transaction immediately. That way,
I don't think the limit on shared memory is the bottleneck. However, I
haven't measured it.
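
To illustrate the forwarding scheme described above, here is a minimal
sketch in Python (for brevity; it is not the Postgres-R code, which is
C). The class and method names (ReplicationManager, on_change_set,
on_helper_idle) are hypothetical, and the "dispatched" list merely
stands in for handing a change set to a helper backend. It assumes the
manager takes each change set out of shared memory right away and
either hands it to an idle helper or keeps it in its local cache in
arrival order.

```python
from collections import deque


class ReplicationManager:
    """Toy model of a forwarder that never leaves messages in shared memory."""

    def __init__(self, helpers):
        self.idle = deque(helpers)  # helper backends ready to take work
        self.cache = deque()        # change sets waiting for a helper (FIFO)
        self.dispatched = []        # (helper, change_set) pairs, illustration only

    def on_change_set(self, change_set):
        """Called as soon as a change set has been read from shared memory."""
        if self.idle:
            # An idle helper exists, so nothing can be waiting in the cache.
            self._dispatch(self.idle.popleft(), change_set)
        else:
            # All helpers are busy: keep the change set locally.
            self.cache.append(change_set)

    def on_helper_idle(self, helper):
        """Called when a helper backend reports that it has finished."""
        if self.cache:
            # Oldest cached change set goes first, preserving arrival order.
            self._dispatch(helper, self.cache.popleft())
        else:
            self.idle.append(helper)

    def _dispatch(self, helper, change_set):
        self.dispatched.append((helper, change_set))
```

With two helpers and three incoming change sets, the third one sits in
the cache until a helper reports idle; the shared memory queue itself
is emptied in every case.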

WRT waiting vs aborting: I think at the moment I don't handle this
situation gracefully. I've never encountered it. ;-) But I think the
simpler option is letting the sender wait until there is enough room in
the queue for its message. To avoid deadlocks, each process should
consume its pending messages before trying to send one. (ATM this is
done correctly only for the replication manager, not for the backends,
IIRC.)
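
The consume-before-send rule can be sketched as follows. This is a
single-threaded toy model in Python, not the actual implementation: the
Endpoint class, its fixed-capacity inbox and the method names are all
hypothetical, and "waiting" is modelled by send() returning False so
that the caller retries later.

```python
from collections import deque


class Endpoint:
    """A process with a fixed-size inbox, standing in for a shared memory queue."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.inbox = deque()
        self.handled = []  # messages this process has consumed so far

    def try_enqueue(self, msg):
        """Non-blocking enqueue; returns False when the inbox is full."""
        if len(self.inbox) >= self.capacity:
            return False
        self.inbox.append(msg)
        return True

    def consume_all(self):
        """Drain our own inbox, freeing room for whoever wants to send to us."""
        while self.inbox:
            self.handled.append(self.inbox.popleft())

    def send(self, dest, msg):
        """One send attempt: consume our own messages first, then enqueue.

        Returns False if dest is still full; the caller waits and retries.
        """
        self.consume_all()
        return dest.try_enqueue(msg)
```

If two processes each have a full inbox and both want to send to the
other, blocking without consuming would leave them stuck forever. With
the rule above, each attempt frees the sender's own inbox first, so the
peer's next attempt finds room.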

> What about keeping the per-process message queue in the local memory of
> the process, and exporting only the queue head to the shmem, thus having
> only one message per-process there.

The replication manager already does that with its cache. No other
process needs to send (large enough) messages which cannot be consumed
immediately. So such a local cache does not make much sense for any
other process.

Even for the replication manager, I find it dubious to require such a
cache, because it introduces unnecessary copying of data within memory.

> When the queue manager gets a
> message from the process it may signal that process to copy the next
> message from the process local memory into the shmem. To keep a
> correct ordering of queue messages an additional shared memory queue of
> pid_t can be maintained, containing one pid per each message.

The replication manager takes care of the ordering for cached messages.

Regards

Markus Wanner
