Re: COPY with hints, rebirth

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-01 16:40:59
Message-ID: CA+U5nMLM5MVXmt-gW=kM_evyckxxFKNFSLaLLMvZ-TTs3MrkNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 24.02.2012 22:55, Simon Riggs wrote:
>>
>> A long time ago, in a galaxy far away, we discussed ways to speed up
>> data loads/COPY.
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>>
>> In particular, the idea that we could mark tuples as committed while
>> we are still loading them, to avoid negative behaviour for the first
>> reader.
>>
>> Simple patch to implement this is attached, together with test case.
>>
>>  ...
>>
>>
>> What exactly does it do? Previously, we optimised COPY when it was
>> loading data into a newly created table or a freshly truncated table.
>> This patch extends that and actually sets the tuple header flag as
>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>> code. The patch also adds some tests for corner cases that would make
>> that action break MVCC - though those cases are minor and typical data
>> loads will benefit fully from this.
>
>
> This doesn't work with subtransactions:
...
> The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.

>> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
>> and adding current xid to snapshots. That is an invasive change that I
>> would wish to avoid at any time and explains the long delay in
>> tackling this. The way I've implemented it, is just as a short test
>> during XidInMVCCSnapshot() so that we trap the case when the xid ==
>> xmax and so would appear to be running. This is much less invasive and
>> just as performant as Tom's original suggestion.
>
>
> TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
> lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like the best plan rather than the explicit FREEZE option.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_nohints.v358.patch text/x-diff 8.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2012-03-01 17:00:36 Re: pgsql_fdw, FDW for PostgreSQL server
Previous Message Robert Haas 2012-03-01 16:38:42 Re: performance results on IBM POWER7