Re: autovacuum next steps, take 2

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>
Subject: Re: autovacuum next steps, take 2
Date: 2007-02-27 02:36:10
Message-ID: 45E3991A.3030605@zeut.net
Lists: pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Matthew T. O'Connor wrote:
>>> I'm not sure it's a good idea to tie this to the vacuum cost delay
>>> settings either, so let me ask you this: how is this better than just
>>> allowing the admin to set a new GUC variable like
>>> autovacuum_hot_table_size_threshold (or something shorter), to which we
>>> can assign a decent default of, say, 8MB?
>
>> Yeah, maybe that's better -- it's certainly simpler.
>
> I'm not liking any of these very much, as they seem critically dependent
> on impossible-to-tune parameters. I think it'd be better to design this
> around having the first worker explicitly expose its state (list of
> tables to process, in order) and having subsequent workers key off that
> info. The shared memory state could include the OID of the table each
> worker is currently working on, and we could keep the to-do list in some
> simple flat file for instance (since we don't care about crash safety).

So far we are only talking about one parameter, hot_table_size_threshold,
which I agree would be a guess on the admin's part. But if we went in this
direction, I would also advocate adding a column to the pg_autovacuum
table that lets an admin explicitly mark a table as hot or not.
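To make the threshold-plus-override idea concrete, here is a minimal Python sketch of how the two could combine. It assumes, hypothetically, that "hot" means small enough to fall at or below the size threshold, and that the proposed pg_autovacuum column is a nullable boolean where NULL (None) means "decide by size". The names `is_hot` and `HOT_TABLE_SIZE_THRESHOLD` are illustrative only; the 8MB figure is the default suggested above, and both the GUC and the column are only proposals here.

```python
from typing import Optional

# Hypothetical GUC value: the 8MB default suggested in this thread.
HOT_TABLE_SIZE_THRESHOLD = 8 * 1024 * 1024  # bytes


def is_hot(table_size_bytes: int,
           admin_override: Optional[bool] = None,
           threshold: int = HOT_TABLE_SIZE_THRESHOLD) -> bool:
    """Decide whether a table counts as 'hot'.

    admin_override models the proposed (hypothetical) pg_autovacuum column:
    True/False forces the answer, None falls back to the size threshold.
    """
    if admin_override is not None:
        return admin_override
    # Assumption: small tables (at or below the threshold) are the hot ones.
    return table_size_bytes <= threshold
```

The override is checked first so that an admin's explicit choice always wins over the size guess, which is the point of adding the column.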

Also, I think each worker should be mostly independent. The only caveat
is that (assuming each worker processes tables in size order) if we catch
up to an older worker, i.e. reach the table it is currently working on, we
exit. Personally I think this is all we need, but others felt the
additional threshold was needed. What do you think? Or what do you
think might be better?
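The independent-worker rule could be sketched as below, assuming workers sort tables in ascending size order and that a plain dict stands in for the shared-memory record of which table each worker is currently on. `worker_pass` and its parameters are hypothetical, other workers are treated as frozen for the duration of the pass, and the vacuum itself is elided.

```python
from typing import Dict, List, Tuple


def worker_pass(my_id: int,
                tables: List[Tuple[int, int]],
                in_progress: Dict[int, int]) -> List[int]:
    """Process tables smallest-first, exiting on reaching a table that
    another (older) worker is currently vacuuming.

    tables: (oid, size_bytes) pairs that need vacuuming.
    in_progress: worker_id -> OID, a stand-in for shared memory.
    Returns the OIDs this worker would have vacuumed, in order.
    """
    vacuumed: List[int] = []
    for oid, _size in sorted(tables, key=lambda t: t[1]):
        others = {o for w, o in in_progress.items() if w != my_id}
        if oid in others:
            break  # caught up with an older worker: stop this pass
        in_progress[my_id] = oid  # advertise the table we are working on
        # ... vacuum the table here ...
        vacuumed.append(oid)
    in_progress.pop(my_id, None)  # done; clear our entry
    return vacuumed
```

For example, if worker 0 is busy on a 200-byte table with OID 4, a new worker given tables of sizes 50, 100, 200 and 1GB would handle the two smaller ones and then exit rather than touch table 4 or anything larger.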

> I'm not certain exactly what "key off" needs to mean; perhaps each
> worker should make its own to-do list and then discard items that are
> either in-progress or recently done by another worker when it gets to
> them.

My initial design didn't have any threshold at all, but others felt this
could result in too many workers running concurrently in the same DB.

> I think an absolute minimum requirement for a sane design is that no two
> workers ever try to vacuum the same table concurrently, and I don't see
> where that behavior will emerge from your proposal; whereas it's fairly
> easy to make it happen if non-first workers pay attention to what other
> workers are doing.

Maybe we never made that clear, but I was always working on the assumption
that two workers would never try to work on the same table at the same time.
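One way to get the "never the same table concurrently" guarantee from Tom's per-worker to-do list idea is to have each worker atomically claim a table before starting, discarding anything in progress or recently finished elsewhere. This is a minimal, single-process Python sketch: a lock-protected object stands in for the shared-memory state, and `SharedVacuumState`, `try_claim`, `release`, `run_worker`, and the 60-second "recently done" window are all hypothetical.

```python
import threading
import time
from typing import Dict, List


class SharedVacuumState:
    """Stand-in for shared memory: the table each worker is on, plus
    when each table was last finished."""

    def __init__(self, recent_window_s: float = 60.0) -> None:
        self._lock = threading.Lock()
        self.current: Dict[int, int] = {}      # worker_id -> table OID
        self.last_done: Dict[int, float] = {}  # table OID -> finish time
        self.recent_window_s = recent_window_s

    def try_claim(self, worker_id: int, oid: int) -> bool:
        """Claim a table atomically; fail if another worker holds it
        or it was finished within the 'recently done' window."""
        with self._lock:
            if oid in self.current.values():
                return False
            finished = self.last_done.get(oid)
            if finished is not None and time.monotonic() - finished < self.recent_window_s:
                return False
            self.current[worker_id] = oid
            return True

    def release(self, worker_id: int) -> None:
        """Clear the worker's entry and record the finish time."""
        with self._lock:
            oid = self.current.pop(worker_id, None)
            if oid is not None:
                self.last_done[oid] = time.monotonic()


def run_worker(state: SharedVacuumState, worker_id: int, todo: List[int]) -> List[int]:
    """Walk this worker's own to-do list, skipping tables it cannot claim."""
    vacuumed: List[int] = []
    for oid in todo:
        if not state.try_claim(worker_id, oid):
            continue  # in progress or recently done by another worker
        try:
            vacuumed.append(oid)  # ... vacuum the table here ...
        finally:
            state.release(worker_id)
    return vacuumed
```

Because the check and the claim happen under one lock, two workers that reach the same OID at the same moment cannot both succeed; the loser simply discards that item and moves on.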

> BTW, it's probably necessary to treat shared catalogs specially ...

Certainly.
