Skip site navigation (1) Skip section navigation (2)

Batch update of indexes on data loading

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Batch update of indexes on data loading
Date: 2008-02-21 04:26:20
Message-ID: 20080221130554.B2BA.52131E4D@oss.ntt.co.jp (view raw or flat)
Thread:
Lists: pgsql-hackers
This is a proposal of fast data loading using batch update of indexes for 8.4.
It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
I'd like to integrate it in order to cooperate with other parts of postgres.

The basic concept is spooling new coming data, and merge the spool and
the existing indexes into a new index at the end of data loading. It is 
5-10 times faster than index insertion per-row, that is the way in 8.3.


One of the problem is locking; Index building in bulkload is similar to
REINDEX rather than INSERT, so we need ACCESS EXCLUSIVE LOCK during it.
Bulkloading is not a upper compatible method, so I'm thinking about
adding a new "WITH LOCK" option for COPY command.

  COPY tbl FROM 'datafile' WITH LOCK;

If the LOCK option is specified, the behavior of COPY will be changed
as follows:

1. Lock the target table in ACCESS EXCLUSIVE mode instead of ROW EXCLUSIVE.
2. Prepare spooler (BTSpool) for each indexes.
3. For each new row, put index entries into the spools (_bt_spool)
   instead of index_insert.
4. At the end of COPY, merge the spool and the existing indexes into a new
   index file. The relfilenode of the index is changed like REINDEX.

However, there might be better interfaces for bulk index creation.
For example, if we want to use it with pgloader, we might need
"bulkload mode" for indexes. pgloader commits every 10000 rows,
so the index spooler must keep alive until end of the session
over transactions. (or end of the transaction over sub-transactions)

I'm working toward the simple "COPY WITH LOCK" approach for now,
but if there are other better ideas, I want to use them.
Advices and suggestions welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Responses

pgsql-hackers by date

Next:From: ITAGAKI TakahiroDate: 2008-02-21 04:35:23
Subject: Re: ANALYZE to be ignored by VACUUM
Previous:From: Joshua D. DrakeDate: 2008-02-21 04:07:37
Subject: Re: Permanent settings

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group