Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2014-07-15 13:31:02
Message-ID: CABUevEzXwDLtR_totdjbKrHgi55GXfgpVFk_N8CqT+zJ+DW-3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com> wrote:
> On 01 July 2014 03:48, Alvaro Wrote,
>
>> > In particular, pgpipe is almost an exact duplicate between them,
>> > except the copy in vac_parallel.c has fallen behind changes made to
>> > parallel.c. (Those changes would have fixed the Windows warnings).
>> I
>> > think that this function (and perhaps other parts as
>> > well--"exit_horribly" for example) need to refactored into a common
>> > file that both files can include. I don't know where the best place
>> > for that would be, though. (I haven't done this type of refactoring
>> > myself.)
>>
>> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos.
>> Maybe we should move pgpipe back to src/port and have pg_dump and this
>> new thing use that. I'm not sure about the rest of duplication in
>> vac_parallel.c; there might be a lot in common with what
>> pg_dump/parallel.c does too. Having two copies of code is frowned upon
>> for good reasons. This patch introduces 1200 lines of new code in
>> vac_parallel.c, ugh.
>
>>
>> If we really require 1200 lines to get parallel vacuum working for
>> vacuumdb, I would question the wisdom of this effort. To me, it seems
>> better spent improving autovacuum to cover whatever it is that this
>> patch is supposed to be good for --- or maybe just enable having a
>> shell script that launches multiple vacuumdb instances in parallel ...
>
> Thanks for looking into the patch,
>
> I think if we use shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task,
> If we directly divide table b/w multiple process, it may happen some process get very big tables then it will be as good as one process is doing operation.
>
> In this patch at a time we assign only one table to each process and whichever process finishes fast, we assign new table, this way all process get equal sharing of the task.

I am late to this game, but the first thing to my mind was - do we
really need the whole forking/threading thing on the client at all? We
need it for things like pg_dump/pg_restore because they can themselvse
benefit from parallelism at the client level, but for something like
this, might the code become a lot simpler if we just use multiple
database connections and async queries? That would also bring the
benefit of less platform dependent code, less cleanup needs etc.

(Oh, and for some reason at my quick review i also noticed - you added
quoting of the table name, but forgot to do it for the schema name.
You should probably also look at using something like
quote_identifier(), that'll make things easier).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-07-15 13:41:08 pgsql: Reset master xmin when hot_standby_feedback disabled.
Previous Message Magnus Hagander 2014-07-15 13:12:54 Re: SSL compression info in psql header