From: | Christopher Browne <cbbrowne(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: 9.3 feature proposal: vacuumdb -j # |
Date: | 2012-01-13 22:15:48 |
Message-ID: | CAFNqd5U71fpJ28Xy7EvUXqAcV1ghom+XpPcW3HcTDTwSO9Xkeg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jan 13, 2012 at 4:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> It occurs to me that I would find it quite personally useful if the
> vacuumdb utility was multiprocess capable.
>
> For example, just today I needed to manually analyze a database with
> over 500 tables, on a server with 24 cores. And I needed to know when
> the analyze was done, because it was part of a downtime. I had to
> resort to a python script.
>
> I'm picturing doing this in the simplest way possible: get the list of
> tables and indexes, divide them by the number of processes, and give
> each child process its own list.
I think "simplest" isn't *quite* best...
There's the risk that all the big tables get tied to one child, and so
the one child is doing them serially.
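To make that risk concrete, here is a minimal sketch of the "divide the list by the number of processes" approach. The function name `static_split` and the table costs are made up for illustration; nothing here is vacuumdb code.

```python
def static_split(sizes, workers):
    """Cut the table list into contiguous chunks, one chunk per worker."""
    chunk = -(-len(sizes) // workers)  # ceiling division
    return [sizes[i:i + chunk] for i in range(0, len(sizes), chunk)]

# Hypothetical workload: 4 big tables (cost 100) listed first,
# followed by 8 small ones (cost 1).
sizes = [100] * 4 + [1] * 8
per_worker = [sum(chunk) for chunk in static_split(sizes, 4)]

print(per_worker)       # [300, 102, 3, 3]
print(max(per_worker))  # 300: the whole run waits on the first child
```

Three of the four big tables land on the first child, so the run takes about 300 units while two children finish almost immediately.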
Better:
Have two logical tasks:
a) A process that manages the list, and
b) Child processes doing vacuums.
Each time a child completes a table, it asks the parent for another one.
So if there are 8 big tables and 12 child processes, the big tables
will tend to be spread across the children rather than queued up
behind one another on a single child.
It guarantees that the child processes will all be busy until there
are fewer tables left than there are child processes.
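A rough way to see the effect is to simulate it. The sketch below is only a model, not a proposed implementation: `dynamic_makespan` and the table costs are hypothetical, and a heap of "time at which each child becomes idle" stands in for children asking the parent for the next table.

```python
import heapq

def dynamic_makespan(sizes, workers):
    """Simulate children that each take the next table as soon as they are idle."""
    free_at = [0] * workers              # time at which each child becomes idle
    heapq.heapify(free_at)
    for cost in sizes:
        start = heapq.heappop(free_at)   # earliest idle child asks for a table
        heapq.heappush(free_at, start + cost)
    return max(free_at)                  # time at which the last child finishes

# Hypothetical workload matching the example above:
# 8 big tables (cost 100), 40 small ones (cost 1), 12 children.
sizes = [100] * 8 + [1] * 40
print(dynamic_makespan(sizes, 12))  # 100
```

Each big table ends up on its own child, and the remaining four children work through all 40 small tables in the meantime, so the run is bounded by a single big table.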
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"