Re: Batch update of indexes on data loading

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Batch update of indexes on data loading
Date: 2008-02-22 00:57:48
Message-ID: 20080222094033.8B6B.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> > The basic concept is spooling new coming data, and merge the spool and
> > the existing indexes into a new index at the end of data loading. It is
> > 5-10 times faster than index insertion per-row, that is the way in 8.3.
>
> Please see
> http://thread.gmane.org/gmane.comp.db.postgresql.general/102370/focus=102901

Yes, a BEFORE INSERT FOR EACH ROW trigger is one of the problems.
I think it is enough to disallow bulk loading when there are any
BEFORE INSERT triggers on the table. That is not a serious limitation,
because DBAs often disable triggers during bulk loading for performance
anyway.
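
To make the concept concrete: the spool-and-merge step is essentially a
merge of two sorted streams, which is why it can be written out in bulk
like CREATE INDEX instead of descending the btree once per row. The
following is only a rough sketch; the struct and function names are made
up and are not taken from the actual patch:

/*
 * Illustrative sketch only (made-up names, not the actual patch):
 * during the load, new index entries are accumulated in a spool and
 * sorted; at the end, the sorted spool is merged with the entries
 * read in order from the existing index, and the merged stream is
 * written out in bulk as a new index, much like CREATE INDEX does.
 */
#include <stddef.h>

typedef struct IndexEntry
{
    long        key;        /* index key value */
    long        heap_tid;   /* pointer to the heap tuple */
} IndexEntry;

/*
 * Merge the existing index entries (sorted) with the spooled new
 * entries (also sorted) into 'out', which must hold n_idx + n_spool
 * entries.  Returns the number of entries written.
 */
static size_t
merge_spool_with_index(const IndexEntry *idx, size_t n_idx,
                       const IndexEntry *spool, size_t n_spool,
                       IndexEntry *out)
{
    size_t      i = 0, j = 0, k = 0;

    while (i < n_idx && j < n_spool)
        out[k++] = (idx[i].key <= spool[j].key) ? idx[i++] : spool[j++];
    while (i < n_idx)
        out[k++] = idx[i++];
    while (j < n_spool)
        out[k++] = spool[j++];

    return k;
}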

>> You could work around this if the indexscan code knew to go search in the
>> list of pending insertions, but that's pretty ugly and possibly slow too.

I have heard that this is done in the Falcon storage engine in MySQL,
so it does not seem to be such an unrealistic approach.
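
Just to sketch what searching the list of pending insertions could look
like (made-up names, not code from either system): an equality probe on
the index would also have to binary-search the sorted spool, roughly
like this:

/*
 * Rough sketch (made-up names) of the workaround quoted above: an
 * equality probe that also searches the sorted list of pending
 * (spooled but not yet merged) index keys.
 */
#include <stdbool.h>
#include <stddef.h>

/* Binary search in the sorted array of spooled keys. */
static bool
pending_spool_contains(const long *pending, size_t n, long key)
{
    size_t      lo = 0, hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (pending[mid] < key)
            lo = mid + 1;
        else if (pending[mid] > key)
            hi = mid;
        else
            return true;
    }
    return false;
}

/*
 * A probe has to look in both the btree and the spool, and every
 * scan pays that extra cost until the merge happens -- which is
 * the "possibly slow" part.
 */
static bool
index_probe(bool found_in_btree,
            const long *pending, size_t n_pending, long key)
{
    return found_in_btree || pending_spool_contains(pending, n_pending, key);
}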

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
