
Re: Batch update of indexes on data loading

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Batch update of indexes on data loading
Date:	2008-02-28 07:14:23
Message-ID:	20080228145814.5F49.52131E4D@oss.ntt.co.jp
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> The LOCK is only required because we defer the inserts into unique
> indexes, yes?

No, not in the present pg_bulkload. It creates a new relfilenode, as REINDEX
does, so an access exclusive lock is needed. If there are any unique
constraint violations, the entire load is rolled back at the end of loading.
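To make the all-or-nothing behaviour concrete, here is a toy sketch on an
in-memory sorted list. It is not pg_bulkload's code and models neither locks
nor on-disk files. The names `rebuild_unique_index`, `load` and
`UniqueViolation` are hypothetical. A brand-new key list stands in for the
new relfilenode. Uniqueness is checked once, after all rows are in, and a
single duplicate discards the whole build, leaving the old index untouched.

```python
from typing import List, Sequence


class UniqueViolation(Exception):
    """Raised when the rebuilt (toy) index would contain a duplicate key."""


def rebuild_unique_index(existing: Sequence[int], loaded: Sequence[int]) -> List[int]:
    """Build a brand-new sorted key list from the existing and loaded keys.

    The old index (``existing``) is never modified, loosely mirroring how a
    new relfilenode is written instead of updating the old one in place.
    """
    new_index = sorted(list(existing) + list(loaded))
    # Deferred check: duplicates are only looked for after loading finishes.
    for prev, cur in zip(new_index, new_index[1:]):
        if prev == cur:
            raise UniqueViolation(f"duplicate key {cur}")
    return new_index


def load(existing: List[int], loaded: Sequence[int]) -> List[int]:
    """Return the new index on success, or the untouched old one on failure."""
    try:
        return rebuild_unique_index(existing, loaded)
    except UniqueViolation:
        # "Rolled back": nothing from this load is kept.
        return existing
```

For example, `load([1, 3, 5], [2, 4])` yields `[1, 2, 3, 4, 5]`, while
`load([1, 3, 5], [2, 3])` returns the original `[1, 3, 5]` because key 3
collides.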

BTW, why does REINDEX require an access exclusive lock? Read-only queries
are currently forbidden during the operation, but I feel they would be OK
because REINDEX only reads existing tuples. Can we do REINDEX while
holding only a shared lock on the index?

> I very much like the idea of index merging, or put another way: batch
> index inserts. How big do the batch of index inserts have to be for us
> to gain benefit from this technique?

Hmm, we might need to know *why* COPY with indexes is slow. If the major
cause is searching for the position to insert each key, batch inserts will
work well. However, if the cause is index page splits and the random I/O
that follows, batch insertion cannot solve the problem; a "rebuild" is
still required.
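As a rough illustration of the "searching for the position" case only, here
is a toy sketch on an in-memory sorted list rather than a real B-tree. It is
not how PostgreSQL or pg_bulkload implement index insertion. The function
names are hypothetical. Per-row insertion does a separate binary search for
every key. The batched version sorts the pending keys once and merges them
into the existing keys in a single sequential pass.

```python
import bisect
import heapq
from typing import List, Sequence


def insert_one_by_one(index: Sequence[int], keys: Sequence[int]) -> List[int]:
    """Per-row insertion: each key searches for its own position."""
    result = list(index)
    for k in keys:
        # Binary search for the slot, then shift elements to make room.
        bisect.insort(result, k)
    return result


def insert_batched(index: Sequence[int], keys: Sequence[int]) -> List[int]:
    """Batch insertion: sort the pending keys once, then merge in one sweep."""
    batch = sorted(keys)
    # heapq.merge walks both sorted inputs sequentially, with no per-key search.
    return list(heapq.merge(index, batch))
```

Both produce the same final ordering; only the access pattern differs. This
toy says nothing about page splits and the random I/O they cause, which is
exactly the case where a rebuild would still be needed.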

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

In response to

Re: Batch update of indexes on data loading at 2008-02-26 09:08:37 from Simon Riggs

Responses

Re: Batch update of indexes on data loading at 2008-02-29 03:26:05 from Tom Lane
