CONCURRENT INDEXing again (was: Must be owner to truncate?)

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Mike Mascari <mascarm(at)mascari(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: CONCURRENT INDEXing again (was: Must be owner to truncate?)
Date: 2005-07-12 06:53:11
Message-ID: 1121151191.4873.29.camel@fuji.krosing.net
Lists: pgsql-hackers

On Sat, 2005-07-09 at 16:50 -0400, Alvaro Herrera wrote:
> On Sat, Jul 09, 2005 at 11:43:52PM +0300, Hannu Krosing wrote:
>
> > Could the new file not be made to cover the next available 1GB of file
> > space, that is a new physical file ?
> >
> > This could made using of same kind of machinery my proposal for
> > concurrent index does (i.e. locks that forbid putting new tuples in
> > certain tuple ranges)
>
> I think your proposals are too "handwavy", but there is a similar
> mechanism outlined for on-line index reorganization, whereby new tuples
> can be inserted concurrently with the reorganization, being stored on a
> "spill area". See

Let me try to state my reworked idea of concurrent indexing in a
clearer way:

The index build is done in 2 transactions; it needs one new type of
lock and a new system column in pg_class to tell the planner not to use
an incomplete index. This is similar to vacuum in that it needs its own
transactions and is not rollbackable. Perhaps the decision to use
either this or the current INDEX behaviour should be based on whether
the (RE)INDEX command is run in its own transaction.

1st transaction:
----------------

The index for the part of the datafile that exists at the start of the
INDEX command is created within the 1st transaction, in a similar way
to what we do now. "The part" is defined as all tuples with ctid at or
below (<=) the max(ctid) stored at the start as MAX_CTID_1.

To be sure that we cover all the tuples in the range <= MAX_CTID_1, and
that no new tuples are stored there as the result of INSERT or UPDATE,
we need a new type of lock (let's call it "TupleStoreRangeLock"), which
prevents new tuples from being placed at or below MAX_CTID_1 and which
is acquired before starting the initial build.
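
The rule above can be sketched as follows. This is an illustrative
simulation, not PostgreSQL source; the function name and the modelling
of a ctid as a (block number, line pointer) tuple are assumptions for
the sketch:

```python
# Sketch of the check a TupleStoreRangeLock would impose on writers:
# while the lock is held, no new tuple may be placed at or below
# MAX_CTID_1.  A ctid is modelled as a (block, line pointer) tuple.

MAX_CTID_1 = (1000, 42)        # boundary recorded when the lock is taken

def page_allowed_for_insert(block, lp, range_lock_held=True):
    """May a new tuple be placed at ctid (block, lp)?"""
    if not range_lock_held:
        return True                       # no lock: any free slot is fine
    return (block, lp) > MAX_CTID_1       # locked: only slots past boundary

# writers are pushed past the boundary, so the initial build that
# scans everything <= MAX_CTID_1 cannot miss a tuple
assert not page_allowed_for_insert(999, 7)
assert not page_allowed_for_insert(1000, 42)
assert page_allowed_for_insert(1001, 1)
```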

After the initial build of the index for tuples below MAX_CTID_1 is
finished, it is made visible to the rest of the system by committing
the transaction, but the index is marked as "incomplete" (probably a
new column in pg_class is needed for that), so that it will not be used
by the planner, but all new inserts/updates will see and use it.
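
A toy model of what that "incomplete" flag would mean (the column and
function names here are hypothetical, purely to illustrate the
asymmetry between writers and the planner):

```python
# Hypothetical model of the proposed "incomplete" flag in pg_class:
# an incomplete index is maintained by all inserts/updates, but the
# planner only considers complete indexes.

class IndexMeta:
    def __init__(self, name, complete):
        self.name = name
        self.complete = complete      # the proposed new pg_class flag

def indexes_to_maintain(indexes):
    # every visible index receives entries for new tuples
    return list(indexes)

def indexes_for_planning(indexes):
    # the planner skips indexes still marked incomplete
    return [i for i in indexes if i.complete]

idx = [IndexMeta("old_idx", True), IndexMeta("new_idx", False)]
assert [i.name for i in indexes_to_maintain(idx)] == ["old_idx", "new_idx"]
assert [i.name for i in indexes_for_planning(idx)] == ["old_idx"]
```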

2nd transaction
---------------

After that we need to wait for all other running transactions to
complete, so we can be sure that all other backends know about the new
index.

Once we are sure they do, we record the new max(ctid) as MAX_CTID_2. At
this point we can release the TupleStoreRangeLock, making it possible
to place new tuples below MAX_CTID_1 again.

As the final step we need to scan all tuples in the range (MAX_CTID_1
to MAX_CTID_2) and insert their corresponding index entries into the
new index. If an entry already exists for the exact same ctid, that is
ok.
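
A simplified simulation of that catch-up scan (not PostgreSQL code;
ctids are (block, line) tuples and the index is a plain dict, so the
data and names are invented for illustration):

```python
# Catch-up scan of the 2nd transaction: every tuple in the range
# (MAX_CTID_1, MAX_CTID_2] gets an index entry; re-inserting an entry
# a concurrent writer already made for the same ctid is harmless.

def catch_up(heap, index, max_ctid_1, max_ctid_2):
    for ctid, key in sorted(heap.items()):
        if max_ctid_1 < ctid <= max_ctid_2:
            index.setdefault(key, set()).add(ctid)   # duplicate ctid: no-op

# phase 1 indexed everything <= MAX_CTID_1; a concurrent writer has
# already indexed 'c' itself after the index became visible
heap = {(0, 1): 'a', (0, 2): 'b', (1, 1): 'c', (1, 2): 'e'}
index = {'a': {(0, 1)}, 'b': {(0, 2)}, 'c': {(1, 1)}}

catch_up(heap, index, max_ctid_1=(0, 2), max_ctid_2=(1, 9))
# 'e' is now indexed, and the pre-existing entry for 'c' is unchanged
```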

After reaching MAX_CTID_2, the index is ready for use by the planner as
well and can be marked as "complete" in pg_class. In the case of
REINDEX, the new index can be made to replace the old one at this
point.

TODO: work out which locks TupleStoreRangeLock conflicts with and which
it can coexist with.

--
Hannu Krosing <hannu(at)skype(dot)net>
