Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2015-01-22 17:46:01
Message-ID: 20150122174601.GB1663@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's v23.

I reworked a number of things. First, I changed trivial stuff like
grouping all the vacuuming options in a struct, to avoid passing an
excessive number of arguments to functions. full, freeze, analyze_only,
and_analyze and verbose are all in a single struct now. Also, the
stage_commands and stage_messages was duplicated by your patch; I moved
them to a file-level static struct.

I made prepare_command reset the string buffer and receive an optional
table name, so that it can append it to the generated command, and
append the semicolon as well. Forcing the callers to reset the string
before calling, and having them add the table name and semicolon
afterwards was awkward and unnecessarily verbose.

You had a new in_abort() function in common.c which seems an unnecessary
layer; in its place I just exported the inAbort boolean flag it was
returning, and renamed to CancelRequested.

I was then troubled by the fact that vacuum_one_database() was being
called in a loop by main() when multiple tables are vacuumed, but
vacuum_parallel() was doing the loop internally. I found this
discrepancy confusing, so I renamed that new function to
vacuum_one_database_parallel and modified the original
vacuum_one_database to do the loop internally as well. Now they are, in
essence, a mirror of each other, one doing the parallel stuff and one
doing it serially. This seems to make more sense to me -- but see
below.

I also modified some underlying stuff like GetIdleSlot returning a
ParallelSlot pointer instead of an array index. Since its caller always
has to dereference the array with the given index, it makes more sense
to return the right element pointer instead, so I made it do that.
Also, that way, instead of returning NO_SLOT in case of error it can
just return NULL; no need for extra cognitive burden.

I also changed select_loop. In your patch it had two implementations,
one WIN32 and another one for the rest. It looks nicer to me to have
only one with small exceptions in the places that need it. (I haven't
tested the WIN32 path.) Also, instead of returning ERROR_IN_ABORT I
made it set a boolean flag in case of error, which seems cleaner.

I changed GetQueryResult as I described in a previous message.

There are two things that continue to bother me and I would like you,
dear patch author, to change them before committing this patch:

1. I don't like having vacuum_one_database() and a separate
vacuum_one_database_parallel(). I think we should merge them into one
function, which does either thing according to parameters. There's
plenty in there that's duplicated.

2. in particular, the above means that run_parallel_vacuum can no longer
exist as it is. Right now vacuum_one_database_parallel relies on
run_parallel_vacuum to do the actual job parallellization. I would like
to have that looping in the improved vacuum_one_database() function
instead.

Looking forward to v24,

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
vacuumdb_parallel_v23.patch text/x-diff 32.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-01-22 17:55:37 Re: Windows buildfarm animals are still not happy with abbreviated keys patch
Previous Message Robert Haas 2015-01-22 17:27:22 Re: Merging postgresql.conf and postgresql.auto.conf