Re: Benchmark Data requested

From:	"Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-performance(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>
Subject:	Re: Benchmark Data requested
Date:	2008-02-05 20:50:35
Message-ID:	47A8CC1B.1050000@sun.com
Lists:	pgsql-performance

Hi Heikki,

Is there a way such an operation can be spawned as a worker process?
Generally, such loading is done during "off-peak" hours, when I expect
additional CPU resources to be available. By delegating the additional
work to worker processes, we should be able to capitalize on the
additional cores in the system.

Even on a single core, the loading process will eventually block on
reads from the input file, which cannot be made non-blocking, so the OS
can timeslice well enough for a second process to use those wait times
for the index population work.
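The worker-process idea can be sketched as a producer/consumer pair. This is a minimal Python sketch under stated assumptions: threads stand in for PostgreSQL worker processes, a list stands in for the heap, and `load_with_worker` / `index_worker` are hypothetical names, not PostgreSQL code. The loader appends each row and hands its key to a bounded queue; the worker spools the keys and sorts them once the loader signals it is done.

```python
import io
import queue
import threading

_DONE = object()  # sentinel telling the worker the load has finished


def index_worker(key_queue, index_out):
    """Hypothetical worker: pulls (key, row_id) pairs off the queue,
    spools them, and sorts once the loader signals completion."""
    spool = []
    while True:
        item = key_queue.get()
        if item is _DONE:
            break
        spool.append(item)
    spool.sort()  # stand-in for the final sorted index build
    index_out.extend(spool)


def load_with_worker(input_file, key_column=0):
    """Hypothetical loader: reads comma-separated lines, appends rows to
    a list 'heap', and hands index keys to a concurrent worker thread."""
    heap = []
    index = []
    # Bounded so the loader cannot run arbitrarily far ahead of the worker.
    key_queue = queue.Queue(maxsize=1024)
    worker = threading.Thread(target=index_worker, args=(key_queue, index))
    worker.start()
    for line in input_file:  # the worker can make progress while reads wait
        row = line.rstrip("\n").split(",")
        row_id = len(heap)
        heap.append(row)
        key_queue.put((row[key_column], row_id))
    key_queue.put(_DONE)
    worker.join()
    return heap, index


# io.StringIO stands in for the input file being loaded.
heap, index = load_with_worker(io.StringIO("b,2\na,1\nc,3\n"))
```

The bounded queue is a design choice that keeps memory in check if the worker falls behind. In CPython, threads overlap mainly while one of them is blocked on I/O, which corresponds to the single-core case described above.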

What do you think?

Regards,
Jignesh

Heikki Linnakangas wrote:
> Dimitri Fontaine wrote:
>> On Tuesday 05 February 2008, Simon Riggs wrote:
>>> I'll look at COPY FROM internals to make this faster. I'm looking at
>>> this now to refresh my memory; I already had some plans on the shelf.
>>
>> Maybe stealing some ideas from pg_bulkload could somewhat help here?
>>
>> http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf
>>
>>
>> IIRC it's mainly about how to optimize index updating while loading
>> data, and I've heard complaints along the lines of "this external tool
>> has to know too much about PostgreSQL internals to be trustworthy as
>> non-core code"... so...
>
> I've been thinking of looking into that as well. The basic trick
> pg_bulkload is using is to populate the index as the data is being
> loaded. There's no fundamental reason why we couldn't do that
> internally in COPY. Triggers or constraints that access the table
> being loaded would make it impossible, but we should be able to detect
> that and fall back to what we have now.
>
> What I'm basically thinking about is to modify the indexam API for
> building a new index, so that COPY would feed the tuples to the
> indexam, instead of the indexam opening and scanning the heap. The
> b-tree indexam would spool the tuples into a tuplesort as the COPY
> progresses, and build the index from that at the end as usual.
>
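Heikki's proposal (COPY feeds tuples to the index AM, which spools them and builds the index at the end, with a fallback when triggers or constraints access the table being loaded) can be modelled roughly as follows. This is a hypothetical Python sketch, not the indexam API: `SpoolingIndexBuild`, `copy_from` and the `table_accessed_by_triggers` flag are invented for illustration, a list stands in for the heap, and a Python sort stands in for the tuplesort. The fallback path is modelled as per-tuple insertion so the index stays current while rows are loaded.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SpoolingIndexBuild:
    """Hypothetical build state: COPY pushes tuples in, so the index AM
    never has to open and scan the heap itself."""
    spool: list = field(default_factory=list)

    def feed(self, key, tid):
        # Called once per tuple as COPY writes it to the heap.
        self.spool.append((key, tid))

    def finish(self):
        # Stand-in for the tuplesort at the end of a normal b-tree build:
        # sort once, then emit the entries in key order.
        self.spool.sort()
        return list(self.spool)


def copy_from(rows, key_of, table_accessed_by_triggers=False):
    """Load rows into a list 'heap' and return (heap, index_entries)."""
    heap = []
    if table_accessed_by_triggers:
        # Fallback path: triggers/constraints may read the table mid-load,
        # so keep the index current after every row (per-tuple insert).
        index = []
        for row in rows:
            tid = len(heap)
            heap.append(row)
            bisect.insort(index, (key_of(row), tid))
        return heap, index
    # Fast path: spool keys during the load, build the index at the end.
    build = SpoolingIndexBuild()
    for row in rows:
        tid = len(heap)
        heap.append(row)
        build.feed(key_of(row), tid)
    return heap, build.finish()


rows = [("carol", 3), ("alice", 1), ("bob", 2)]
fast = copy_from(rows, key_of=lambda r: r[0])
slow = copy_from(rows, key_of=lambda r: r[0], table_accessed_by_triggers=True)
```

Both paths produce the same index entries; the spooled path does one sort at the end instead of one ordered insertion per row, which is the point of building from a tuplesort.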

In response to

Re: Benchmark Data requested at 2008-02-05 20:06:17 from Heikki Linnakangas

Responses

Re: Benchmark Data requested at 2008-02-05 21:45:52 from Heikki Linnakangas
Re: Benchmark Data requested at 2008-02-05 22:00:03 from Simon Riggs
