Re: bg worker: patch 1 of 6 - permanent process

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bg worker: patch 1 of 6 - permanent process
Date: 2010-09-16 08:47:30
Message-ID: 4C91D9A2.9060304@bluegap.ch
Lists: pgsql-hackers

Hi,

On 09/15/2010 08:54 PM, Robert Haas wrote:
> I think that the bar for committing to another in-core replication
> solution right now is probably fairly high.

I'm not trying to convince you to accept the Postgres-R patch, at least
not now.

<showing-off>
BTW, that'd be what I call a huge patch:

bgworkers, excluding dynshmem and imessages:
34 files changed, 2910 insertions(+), 1421 deletions(-)

from there to Postgres-R:
98 files changed, 14856 insertions(+), 230 deletions(-)
</showing-off>

> I am pretty doubtful that
> our current architecture is going to get us to the full feature set
> we'd eventually like to have - multi-master, partial replication, etc.

Would be hard to do, due to the (physical) format of WAL, yes. That's
why Postgres-R uses its own (logical) wire format.

> But we're not ever going to have ten replication solutions in core,
> so we need to think pretty carefully about what we accept.

That's very understandable.

> That
> conversation probably needs to start from the other end - is the
> overall architecture correct for us? - before we get down to specific
> patches. On the other hand, I'm very interested in laying the
> groundwork for parallel query

Cool. Maybe we should take another look at bgworkers, as soon as a
parallel querying feature gets planned?

> and I think there are probably a number
> of bits of architecture both from this project and Postgres-XC, that
> could be valuable contributions to PostgreSQL;

(...note that Postgres-R is license compatible, as opposed to the GPL'ed
Postgres-XC project...)

> however, in neither
> case do I expect them to be accepted without significant modification.

Sure, that's understandable as well. I've published this part of the
infrastructure to get some feedback as early as possible on that part of
Postgres-R.

As you can certainly imagine, it's important for me that any
modification to such a patch from Postgres-R remains compatible with
what I use it for in Postgres-R and does not cripple any functionality
there, because that would probably create more work for me than not
getting the patch accepted upstream at all.

> I'm saying it's hard to think about committing any of them because
> they aren't really independent of each other or of other parts of
> Postgres-R.

As long as you don't consider imessages and dynshmem a part of
Postgres-R, they are independent of the rest of Postgres-R in the
technical sense.

And for any kind of parallel querying feature, imessages and dynshmem
might be of help as well. So I currently don't see where I could
de-couple these patches any further.

If you have a specific requirement, please don't hesitate to ask.

> I feel like there is an antagonistic thread to this conversation, and
> some others that we've had. I hope I'm misreading that, because it's
> not my intent to piss you off. I'm just offering my honest feedback.
> Your mileage may vary; others may feel differently; none of it is
> personal.

That's absolutely fine. I'm thankful for your feedback.

Also note that I initially didn't even want to add the bgworker patches
to the commit fest. I've de-coupled and published these separately from
Postgres-R a) in the hope of getting feedback (more than for the overall
Postgres-R patch) and b) to show others that such a facility exists and
is ready to be reused.

I didn't really expect them to get accepted to Postgres core at the
moment. But the Postgres team normally asks for sharing concepts and
ideas as early as possible...

> OK, I think I understand what you're trying to say now. I guess I
> feel like the ideal architecture for any sort of solution that needs a
> pool of workers would be to keep around the workers that most recently
> proved to be useful. Upon needing a new worker, you look for one
> that's available and already bound to the correct database. If you
> find one, you assign him to the new task.

That's mostly how bgworkers are designed, yes. The min/max idle
background worker GUCs allow a loose control over how many spare
processes you want to allow hanging around doing nothing.

> If not, you find the one
> that's been idle longest and either (a) kill him off and start a new
> one that is bound to the correct database or, even better, (b) tell
> him to flush his caches and rebind to the correct database.

Hm.. sorry if I didn't express this more clearly. What I'm trying to say
is that (b) isn't worth implementing, because it doesn't offer enough of
an improvement over (a). The only saving would be the fork() and some
basic process initialization.

Being able to re-use a bgworker connected to the correct database
already gives you most of the benefit, namely not having to fork() *and*
re-connect to the database for every job.
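
To make the policy concrete, here is a minimal sketch in Python of
"re-use a matching idle worker, else go with option (a)". It is a toy
model, not code from the bgworker patch: the names Worker, Pool, acquire
and release, as well as the max_workers limit, are assumptions made up
for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    pid: int
    dbname: str
    idle_since: Optional[float] = None  # None while the worker is busy


class Pool:
    """Toy model of a coordinator's worker pool (hypothetical names)."""

    def __init__(self) -> None:
        self.workers: List[Worker] = []
        self.forks = 0  # counts simulated fork() + connect operations

    def _fork(self, dbname: str) -> Worker:
        # Stands in for fork(), process init and connecting to dbname.
        self.forks += 1
        worker = Worker(pid=self.forks, dbname=dbname)
        self.workers.append(worker)
        return worker

    def acquire(self, dbname: str, max_workers: int) -> Optional[Worker]:
        idle = [w for w in self.workers if w.idle_since is not None]

        # Cheap path: an idle worker already connected to the right
        # database needs neither fork() nor a new connection.
        matching = [w for w in idle if w.dbname == dbname]
        if matching:
            worker = max(matching, key=lambda w: w.idle_since)
            worker.idle_since = None
            return worker

        # Option (a): at the limit, drop the longest-idle worker and
        # start a fresh one bound to the requested database.
        if len(self.workers) >= max_workers:
            if not idle:
                return None  # everybody is busy; caller has to wait
            victim = min(idle, key=lambda w: w.idle_since)
            self.workers.remove(victim)
        return self._fork(dbname)

    def release(self, worker: Worker, now: float) -> None:
        worker.idle_since = now
```

In this model only the cheap path avoids the fork counter going up.
Option (b) would save the fork() on the second path but would still
have to pay for connecting to the other database.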

Back to the technical issues, let me try to summarize the feedback and
what I'm doing with it.

In general, there's not much use for bgworkers for just autovacuum as
the only background job. I agree.

Tom raised the 'lots of databases' issue. I agree that the bgworker
infrastructure isn't optimized for such a workload, but argue that it
can be configured so that it doesn't hurt. If bgworkers ever get
accepted upstream, we'd certainly need to discuss reasonable defaults
for the relevant GUCs. Additionally, more cleverness in the coordinator
about when to start or stop (spare) workers couldn't hurt.

I had a lengthy discussion with Dimitri about whether or not bgworkers
could help him with some kind of PgQ daemon. I think we now agree that
bgworkers isn't the right tool for that job.

You are questioning whether the min_idle_bgworkers GUC is really
necessary. I'm arguing that it is necessary in Postgres-R to cover load
spikes, because starting bgworkers is slow.
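
As a rough illustration of how a minimum and maximum for idle workers
can interact, here is a small Python sketch. The function name and the
max_idle parameter are assumptions for this example (only
min_idle_bgworkers is named above), and the sketch leaves open whether
idle workers are counted per database or globally.

```python
def spare_worker_adjustment(idle: int, min_idle: int, max_idle: int) -> int:
    """Return how many workers to start (> 0) or stop (< 0).

    Hypothetical helper: keeps the number of idle workers within
    [min_idle, max_idle], so a load spike finds workers already started.
    """
    if min_idle > max_idle:
        raise ValueError("min_idle must not exceed max_idle")
    if idle < min_idle:
        return min_idle - idle   # pre-start spares ahead of demand
    if idle > max_idle:
        return max_idle - idle   # negative: stop surplus idle workers
    return 0                     # within bounds, nothing to do
```

With min_idle = 0 the pool would never pre-start anything, so every
spike pays the full startup cost of a new worker.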

So, overall, I've now received quite a bit of feedback. There doesn't
seem to be any stumbling block in the general design of bgworkers. So
I'll happily continue to use (and refine) bgworkers for Postgres-R. And
I'm looking forward to more discussions once parallel querying gets more
serious attention.

Regards

Markus Wanner
