Re: Benchmark Data requested

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Cc: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, <pgsql-performance(at)postgresql(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>
Subject: Re: Benchmark Data requested
Date: 2008-02-05 21:45:52
Message-ID: 47A8D910.1030607@enterprisedb.com
Lists: pgsql-performance

Jignesh K. Shah wrote:
> Is there a way such an operation can be spawned as a worker process?
> Generally during such loading, which most people will do during
> "offpeak" hours, I expect additional CPU resources to be available. By
> delegating such additional work to worker processes, we should be able
> to capitalize on additional cores in the system.

Hmm. A worker process would still need access to shared memory, locks,
and catalogs, and would have to run functions etc., so I don't think
it's significantly easier than using multiple cores for COPY itself.

> Even on a single core, the loading process will eventually block on a
> read from the input file, which cannot be non-blocking, so the OS can
> timeslice in the second process to use those wait times for the index
> population work.

That's an interesting point.
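
To make the overlap concrete, here is a minimal sketch in Python rather than backend C. It assumes a plain text input and uses a thread plus a bounded queue as a stand-in for a separate worker process; the names load_with_index_worker and key_of are made up for illustration and nothing here corresponds to actual PostgreSQL code. The loader blocks on reads, and the worker collects index entries whenever it gets scheduled.

```python
import io
import queue
import threading

_DONE = object()  # sentinel telling the worker that the load has finished


def load_with_index_worker(source, key_of):
    """Load rows from `source` while a worker thread collects index entries.

    `source` is any iterable of text lines; `key_of` extracts the index key
    from a row. Both are hypothetical parameters for this sketch.
    """
    pending = queue.Queue(maxsize=1024)  # bounded, so the loader cannot run far ahead
    spool = []                           # (key, row position) pairs gathered by the worker

    def index_worker():
        while True:
            item = pending.get()
            if item is _DONE:
                break
            spool.append(item)

    worker = threading.Thread(target=index_worker)
    worker.start()

    heap = []  # stand-in for the table being loaded
    try:
        for line in source:              # may block waiting for input
            row = line.rstrip("\n")
            position = len(heap)
            heap.append(row)
            pending.put((key_of(row), position))
    finally:
        pending.put(_DONE)
        worker.join()

    # The "index" is simply the sorted spool in this sketch.
    return heap, sorted(spool)
```

For example, calling load_with_index_worker(io.StringIO("b,2\na,1\n"), lambda r: r.split(",")[0]) returns the two rows in load order together with index entries sorted by key. A real worker process would additionally need the shared memory, locks, and catalog access mentioned above.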

> What do you think?
>
>
> Regards,
> Jignesh
>
>
> Heikki Linnakangas wrote:
>> Dimitri Fontaine wrote:
>>> On Tuesday 05 February 2008, Simon Riggs wrote:
>>>> I'll look at COPY FROM internals to make this faster. I'm looking at
>>>> this now to refresh my memory; I already had some plans on the shelf.
>>>
>>> Maybe stealing some ideas from pg_bulkload could somewhat help here?
>>>
>>> http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf
>>>
>>>
>>> IIRC it's mainly about how to optimize index updating while loading
>>> data, and I've heard complaints along the lines of "this external tool has
>>> to know too much about PostgreSQL internals to be trustworthy as
>>> non-core code"... so...
>>
>> I've been thinking of looking into that as well. The basic trick
>> pg_bulkload is using is to populate the index as the data is being
>> loaded. There's no fundamental reason why we couldn't do that
>> internally in COPY. Triggers or constraints that access the table
>> being loaded would make it impossible, but we should be able to detect
>> that and fall back to what we have now.
>>
>> What I'm basically thinking about is to modify the indexam API for
>> building a new index, so that COPY would feed the tuples to the
>> indexam, instead of the indexam opening and scanning the heap. The
>> b-tree indexam would spool the tuples into a tuplesort as the COPY
>> progresses, and build the index from that at the end as usual.
>>
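
As a rough illustration of the idea quoted above (the load feeds tuples to the index builder, which spools them and builds the index at the end), here is a toy sketch in Python. It is not the indexam API and not how the backend is written; SpoolingIndexBuilder, copy_from, and index_lookup are hypothetical names, a list stands in for the heap, and a sorted list stands in for the b-tree.

```python
import bisect


class SpoolingIndexBuilder:
    """Hypothetical index builder that is fed tuples instead of scanning the heap."""

    def __init__(self):
        self._spool = []  # (key, heap position) pairs collected during the load

    def feed(self, key, position):
        # Called once per loaded row; no index structure is touched yet.
        self._spool.append((key, position))

    def finish(self):
        # Sort once at the end and return the finished "index".
        return sorted(self._spool)


def copy_from(rows, key_col, builder):
    """Load rows into a list-based heap, feeding each one to the builder."""
    heap = []
    for row in rows:
        position = len(heap)
        heap.append(row)
        builder.feed(row[key_col], position)
    return heap


def index_lookup(index, key):
    """Return heap positions of all rows whose indexed column equals key."""
    # Positions are never negative, so (key, -1) sorts before every entry for key.
    i = bisect.bisect_left(index, (key, -1))
    positions = []
    while i < len(index) and index[i][0] == key:
        positions.append(index[i][1])
        i += 1
    return positions
```

Usage would be along the lines of: create a builder, call copy_from(rows, 0, builder), then builder.finish() to get the index. The fallback for triggers or constraints that read the table during the load is not modelled here.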

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
