Re: bg worker: general purpose requirements

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bg worker: general purpose requirements
Date: 2010-09-20 15:30:06
Message-ID: 4C977DFE.3050703@bluegap.ch
Lists: pgsql-hackers

Hi,

On 09/18/2010 05:21 AM, Robert Haas wrote:
> Wow, 100 processes??! Really? I guess I don't actually know how large
> modern proctables are, but on my MacOS X machine, for example, there
> are only 75 processes showing up right now in "ps auxww". My Fedora
> 12 machine has 97. That's including a PostgreSQL instance in the
> first case and an Apache instance in the second case. So 100 workers
> seems like a ton to me.

Well, Apache pre-forks 5 processes in total (by default, that is; for
high-volume webservers a higher MinSpareServers setting is certainly not
out of the question), while bgworkers currently needs to fork
min_spare_background_workers processes per database.
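
So with, say, 20 databases and min_spare_background_workers = 5 (numbers
picked purely for illustration), you already end up with 100 idle
backends, whereas Apache's MinSpareServers = 5 stays at 5 no matter how
many virtual hosts it serves.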

AIUI, that's the main problem with the current architecture.

>> I haven't measured the actual time it takes, but given the use case of a
>> connection pool, I so far thought it's obvious that this process takes too
>> long.
>
> Maybe that would be a worthwhile exercise...

On my laptop I'm measuring around 18 bgworker starts per second, i.e.
roughly 50 ms per bgworker start. That's certainly just a ball-park figure.
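
Just to put that figure into perspective: a bare fork()/exit() loop like
the made-up micro-benchmark below (not what I actually measured above; it
skips all backend initialization and shared memory attachment) typically
finishes in well under a millisecond per iteration, so most of those 50 ms
are presumably spent in backend startup and the round trip through the
postmaster, not in the fork itself.

  /* hypothetical micro-benchmark: bare fork()/exit() round trips only */
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/time.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int
  main(void)
  {
      const int       n = 1000;
      struct timeval  start, end;
      int             i;

      gettimeofday(&start, NULL);
      for (i = 0; i < n; i++)
      {
          pid_t pid = fork();

          if (pid == 0)
              _exit(0);             /* child exits immediately */
          waitpid(pid, NULL, 0);    /* parent reaps it */
      }
      gettimeofday(&end, NULL);

      printf("%.3f ms per fork\n",
             ((end.tv_sec - start.tv_sec) * 1000.0 +
              (end.tv_usec - start.tv_usec) / 1000.0) / n);
      return 0;
  }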

One could parallelize the communication channel between the coordinator
and postmaster, so as to be able to start multiple bgworkers in
parallel, but the initial latency remains.

It's certainly quick enough for autovacuum, but equally certainly not
acceptable for Postgres-R, where latency is the worst enemy in the first
place.

For autonomous transactions and parallel querying, I'd also rather not
have that kind of latency.

> I think the kicker here is the idea of having a certain number of
> extra workers per database.

Agreed, but I don't see any better way, short of a re-connecting feature.

> So
> if you knew you only had 1 database, keeping around 2 or 3 or 5 or
> even 10 workers might seem reasonable, but since you might have 1
> database or 1000 databases, it doesn't. Keeping 2 or 3 or 5 or 10
> workers TOTAL around could be reasonable, but not per-database. As
> Tom said upthread, we don't want to assume that we're the only thing
> running on the box and are therefore entitled to take up all the
> available memory/disk/process slots/whatever. And even if we DID feel
> so entitled, there could be hundreds of databases, and it certainly
> doesn't seem practical to keep 1000 workers around "just in case".

Agreed. Looks like Postgres-R has a slightly different focus, because if
you need multi-master replication, you probably don't have 1000s of
databases and/or lots of other services on the same machine.

> I don't know whether an idle Apache worker consumes more or less
> memory than an idle PostgreSQL worker, but another difference between
> the Apache case and the PostgreSQL case is that presumably all those
> backend processes have attached shared memory and have ProcArray
> slots. We know that code doesn't scale terribly well, especially in
> terms of taking snapshots, and that's one reason why high-volume
> PostgreSQL installations pretty much require a connection pooler. I
> think the sizes of the connection pools I've seen recommended are
> considerably smaller than 100, more like 2 * CPUs + spindles, or
> something like that. It seems like if you actually used all 100
> workers at the same time performance might be pretty awful.

Sounds reasonable, yes.
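
To put numbers on that rule of thumb: a box with 8 cores and 4 spindles
would get a pool of about 2 * 8 + 4 = 20 connections (numbers picked just
to illustrate the formula), i.e. far below the 100 workers discussed
above.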

> I was taking a look at the Mammoth Replicator code this week
> (parenthetical note: I couldn't figure out where mcp_server was or how
> to set it up) and it apparently has a limitation that only one
> database in the cluster can be replicated. I'm a little fuzzy on how
> Mammoth works, but apparently this problem of scaling to large numbers
> of databases is not unique to Postgres-R.

Postgres-R is able to replicate multiple databases. Maybe not thousands,
but still designed for it.

> What is the granularity of replication? Per-database? Per-table?

Currently per-cluster (i.e. all your databases at once).

> How do you accumulate the change sets?

Logical changes get collected at the heapam level. They get serialized
and streamed (via imessages and a group communication system) to all
nodes. Application of change sets is highly parallelized and should be
pretty efficient. Commit ordering is decided by the GCS to guarantee
consistency across all nodes; conflicts get resolved by aborting the
later transaction.
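
In case a sketch helps, here is a toy, self-contained illustration of the
shape of that pipeline (plain C, all names invented for this mail, nothing
to do with the actual Postgres-R sources): collect a logical change,
serialize it, deliver it, apply it on the receiving side.

  #include <stdio.h>

  /* a single logical change: "set value for key" -- purely illustrative */
  typedef struct ChangeSet
  {
      int  key;
      char value[32];
  } ChangeSet;

  /* origin node: serialize the logical change into a wire buffer */
  static void
  serialize_changeset(const ChangeSet *cs, char *buf, size_t buflen)
  {
      snprintf(buf, buflen, "%d:%s", cs->key, cs->value);
  }

  /* receiving node: reconstruct and apply in delivery order */
  static void
  apply_change_set(const char *buf)
  {
      ChangeSet cs;

      if (sscanf(buf, "%d:%31[^\n]", &cs.key, cs.value) != 2)
          return;

      /* the real system applies these in parallel bgworkers; commit
         order follows the total order decided by the GCS, and on a
         conflict the later transaction gets aborted */
      printf("apply: key=%d value=%s\n", cs.key, cs.value);
  }

  int
  main(void)
  {
      ChangeSet cs = { 42, "hello" };
      char      wire[64];

      serialize_changeset(&cs, wire, sizeof(wire));
      /* stand-in for the group communication system's multicast */
      apply_change_set(wire);
      return 0;
  }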

> Some kind of bespoke hook, WAL scanning, ...?

No hooks, please! ;-)

Regards

Markus Wanner
