Re: Batch update of indexes on data loading

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Batch update of indexes on data loading
Date: 2008-02-24 08:14:24
Message-ID: 1203840864.4247.2.camel@ebony.site
Lists: pgsql-hackers

On Thu, 2008-02-21 at 13:26 +0900, ITAGAKI Takahiro wrote:
> This is a proposal for fast data loading using batch updates of indexes for 8.4.
> It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
> I'd like to integrate it so that it can cooperate with other parts of postgres.
>
> The basic concept is to spool incoming data and, at the end of data
> loading, merge the spool and the existing indexes into a new index. It is
> 5-10 times faster than per-row index insertion, which is the approach in 8.3.
>
>
> One of the problems is locking: index building in bulkload is similar to
> REINDEX rather than INSERT, so we need an ACCESS EXCLUSIVE lock during it.
> Bulkloading is not an upward-compatible method, so I'm thinking about
> adding a new "WITH LOCK" option for the COPY command.
>
> COPY tbl FROM 'datafile' WITH LOCK;
>

I'm very excited to see these concepts going into COPY.

One of the reasons why I hadn't wanted to pursue earlier ideas to use
LOCK was that applying a lock will prevent running in parallel, which
ultimately may prevent further performance gains.

Is there a way of doing this that will allow multiple concurrent COPYs?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
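
The spool-and-merge approach described in the quoted proposal can be
sketched as follows. This is a minimal illustration in Python, not
pg_bulkload's implementation: the index is modeled as a sorted list of
(key, row_id) pairs, and the function name merge_spool_into_index is
hypothetical.

```python
import heapq


def merge_spool_into_index(existing_index, spool):
    """Build a new index from an existing one plus spooled entries.

    existing_index: list of (key, row_id) pairs, already sorted by key
                    (a stand-in for the existing index; an assumption).
    spool:          unsorted (key, row_id) pairs collected during the load.
    """
    # Sort the spool once, at the end of the load, instead of paying
    # for one index insertion per incoming row.
    sorted_spool = sorted(spool)
    # heapq.merge walks both sorted inputs in a single pass and yields
    # the combined entries in key order; the result is the "new index".
    return list(heapq.merge(existing_index, sorted_spool))


existing = [(10, "a"), (30, "b"), (50, "c")]
spooled = [(40, "x"), (20, "y"), (60, "z")]
new_index = merge_spool_into_index(existing, spooled)
```

Because the new index replaces the old one wholesale, nothing else can
safely use the old index while the merge runs, which is why the proposal
compares the step to REINDEX and why concurrent COPYs are the open
question.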
