Re: Batch update of indexes on data loading

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Batch update of indexes on data loading
Date: 2008-02-26 06:19:47
Message-ID: 20080226135216.60CF.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
>
> Is there a way of doing this that will allow multiple concurrent COPYs?

I think there is same difficulty as parallel queries. It requires tighter
communication among COPY threads whether we will use multi-process model
or multi-thread model.

We have independent concurrent COPYs now; COPYs are not aware of each
other because no intermediate status during COPY. However, COPY will
have "phases" if we use bulkbuild. Therefore, we will need joining
COPY threads and passing each working memories between threads.

Here is a possible multi-threaded workload:

A. For each row:
1. Parsing new coming data
2. Add the row into the heap.
3. Spool index entries to each index spooler.
B. Wait for all threads.
C. Merge spools and corresponding existing indexes into new ones.

Phase A could be concurrently as same as now. A1 and A2 are independent
jobs. We could have shared spooler or per-thread spooler.
Phase B is needed to build indexes at once, or it will be double work.
Phase C could be concurrently for each indexes. A thread is responsible
to build one index. It merges the existing index and one shared spool
or multiple spools if we use per-thread spooler.

One of the issues is how to pass or share spoolers between COPY threads.
Another is how to make it transaction safe. If one of the thread fails to
build its index, all thread should be rollback.
I'm not sure how to do them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2008-02-26 06:25:09 Re: 8.3 / 8.2.6 restore comparison
Previous Message Tom Lane 2008-02-26 05:39:29 Re: pg_dump additional options for performance