Re: pg_reorg in core?

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_reorg in core?
Date: 2012-09-21 03:07:26
Message-ID: CAK3UJRH+rGrDFsXoFwKD_iXpTgSXBcuHgJJHHxq-cpuhf3GjTg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> During the last PGCon, I heard that some community members would be
> interested in having pg_reorg directly in core.

I'm actually not crazy about this idea, at least not given the current
state of pg_reorg. Right now, there are quite a few fixes and
features which remain to be merged into CVS head, but at least we can
develop pg_reorg on a schedule independent of Postgres itself, i.e. we
can release new features more often than once a year. Perhaps when
pg_reorg is more stable, and the known bugs and missing features have
been ironed out, we could think about integrating into core.

Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far.

> Just to recall, pg_reorg is functionality developed by NTT that allows a
> table to be redistributed without taking locks on it.
> The technique it uses to reorganize the table is to create a temporary copy
> of the table to be redistributed with a CREATE TABLE AS
> whose definition depends on whether the table is redistributed as with
> VACUUM FULL or as with CLUSTER.
> Then it follows this mechanism:
> - triggers are created to redirect all the DMLs that occur on the table to
> an intermediate log table.

N.B. CREATE TRIGGER takes an AccessExclusiveLock on the table, see below.

> - creation of indexes on the temporary table based on what the user wishes
> - Apply the logs registered during the index creation
> - Swap the names of freshly created table and old table
> - Drop the useless objects
>
> The code is hosted on pgFoundry here: http://pgfoundry.org/projects/reorg/.
> I am also maintaining a fork on GitHub, in sync with pgFoundry, here:
> https://github.com/michaelpq/pg_reorg.
>
> Just, do you guys think it is worth adding a functionality like pg_reorg in
> core or not?
>
> If yes, well I think the code of pg_reorg is going to need some
> modifications to make it more compatible with contrib modules using only
> EXTENSION.
> For the time being pg_reorg is divided into 2 parts, binary and library.
> The library part is the SQL portion of pg_reorg, containing a set of C
> functions that are called by the binary part. This has been extended to
> support CREATE EXTENSION recently.
> The binary part creates a command pg_reorg in charge of calling the set of
> functions created by the lib part, being just a wrapper of the library part
> to control the creation and deletion of the objects.
> It is also in charge of deleting the temporary objects by callback if an
> error occurs.
>
> By using the binary command, it is possible to reorganize a single table or
> a database, in this case reorganizing a database launches only a loop on
> each table of this database.
>
> My idea is to remove the binary part and to rely only on the library part to
> make pg_reorg a single extension with only system functions like other
> contrib modules.

> In order to do that what is missing is a function that could be used as an
> entry point for table reorganization, a function of the type
> pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
> All the functionalities of pg_reorg could be reproducible:
> - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
> - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
> CLUSTER key
> - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
> a wanted column.
>
> Is it worth the shot?

I haven't seen this documented as such, but AFAICT the reason that
pg_reorg is split into a binary and a set of backend functions which are
called by the binary is that pg_reorg needs to be able to control its
steps in several transactions so as to avoid holding locks
excessively. The reorg_one_table() function uses four or five
transactions per table, in fact. If all the logic currently in the
pg_reorg binary were moved into backend functions, calling
pg_reorg_table() would have to be a single transaction, and there
would be no advantage to using such a function vs. CLUSTER or VACUUM
FULL.
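
To make the multi-transaction point concrete, here is a rough sketch of a
client driving the steps as separate short transactions. It is not pg_reorg's
code: SQLite (via Python's sqlite3 module) stands in for Postgres so the
example runs anywhere, its locking behaves differently, and the names items,
items_log, items_new and items_trg are made up for illustration. Only INSERTs
are logged, and the replay assumes id is unique.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Run the enclosed statements as one short, explicit transaction."""
    conn.execute("BEGIN")
    try:
        yield
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def start_logging(conn):
    # Transaction 1: create the log table and the trigger that feeds it.
    # Only INSERT is captured here; a real tool must also log UPDATE/DELETE.
    with transaction(conn):
        conn.execute("CREATE TABLE items_log (id INTEGER, val TEXT)")
        conn.execute(
            "CREATE TRIGGER items_trg AFTER INSERT ON items "
            "BEGIN INSERT INTO items_log VALUES (NEW.id, NEW.val); END"
        )


def copy_table(conn):
    # Transaction 2: build the reorganized copy (sorted, as CLUSTER would).
    with transaction(conn):
        conn.execute(
            "CREATE TABLE items_new AS SELECT id, val FROM items ORDER BY id"
        )


def apply_log(conn):
    # Replay logged rows into the copy. Deleting by id first makes the replay
    # idempotent; this assumes id is unique (hypothetical schema).
    conn.execute("DELETE FROM items_new WHERE id IN (SELECT id FROM items_log)")
    conn.execute("INSERT INTO items_new SELECT id, val FROM items_log")
    conn.execute("DELETE FROM items_log")


def swap_and_cleanup(conn):
    # Final transaction: replay what is left, swap the names, drop leftovers.
    with transaction(conn):
        apply_log(conn)
        conn.execute("DROP TRIGGER items_trg")
        conn.execute("ALTER TABLE items RENAME TO items_old")
        conn.execute("ALTER TABLE items_new RENAME TO items")
        conn.execute("DROP TABLE items_old")
        conn.execute("DROP TABLE items_log")


def demo():
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER, val TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")]
    )

    start_logging(conn)
    copy_table(conn)
    conn.execute("INSERT INTO items VALUES (4, 'd')")  # a write arriving mid-rebuild
    with transaction(conn):  # Transaction 3: catch up on logged rows
        apply_log(conn)
    swap_and_cleanup(conn)

    rows = conn.execute("SELECT id, val FROM items ORDER BY id").fetchall()
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return rows, tables
```

Calling demo() runs the whole sequence against an in-memory database. The
thing to notice is that each step commits on its own, so whatever lock a step
needs is only held for that step. A single backend function call could not
commit between steps like this.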

Also, with a separate binary we should be able to perform some neat
tricks such as parallel index builds using multiple connections (I'm
messing around with this idea now). AFAIK this would also not be
possible if pg_reorg were contained solely in the library functions.
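
Roughly what I have in mind for the parallel builds, as a sketch only: the
client hands each index statement to a worker, and each worker uses its own
connection. Nothing like this exists in pg_reorg today. The function names
and the run_on_new_connection callable below are hypothetical; the callable
is assumed to open a fresh connection, run one statement, and close it.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(index_ddls, run_on_new_connection, max_workers=4):
    """Run each index statement through its own call to run_on_new_connection.

    run_on_new_connection is a caller-supplied (hypothetical) callable that
    opens a fresh connection, executes one statement, and closes it.
    Results come back in the same order as index_ddls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order; an exception raised in a worker is
        # re-raised here when its result is consumed by list().
        return list(pool.map(run_on_new_connection, index_ddls))


def demo_parallel():
    statements = [
        "CREATE INDEX items_new_id_idx ON items_new (id)",
        "CREATE INDEX items_new_val_idx ON items_new (val)",
    ]
    # Stand-in for a real connection: just report which statement was "run".
    return build_indexes_in_parallel(statements, lambda ddl: "done: " + ddl)
```

The connection handling is left to the caller on purpose, since that is the
part that depends on the client library. The dispatch itself is the same
whichever driver ends up underneath.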

Josh
