Re: COPY with hints, rebirth

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-26 19:16:46
Message-ID: 4F4A851E.3080501@enterprisedb.com
Lists: pgsql-hackers

On 24.02.2012 22:55, Simon Riggs wrote:
> A long time ago, in a galaxy far away, we discussed ways to speed up
> data loads/COPY.
> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>
> In particular, the idea that we could mark tuples as committed while
> we are still loading them, to avoid negative behaviour for the first
> reader.
>
> Simple patch to implement this is attached, together with test case.
>
> ...
>
> What exactly does it do? Previously, we optimised COPY when it was
> loading data into a newly created table or a freshly truncated table.
> This patch extends that and actually sets the tuple header flag as
> HEAP_XMIN_COMMITTED during the load. Doing so is a simple two-line
> change. The patch also adds some tests for corner cases that would make
> that action break MVCC - though those cases are minor and typical data
> loads will benefit fully from this.

This doesn't work with subtransactions:

postgres=# create table a as select 1 as id;
SELECT 1
postgres=# copy a to '/tmp/a';
COPY 1
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
id
----
(0 rows)

The query should return the row copied in the same subtransaction.

> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
> and adding current xid to snapshots. That is an invasive change that I
> would wish to avoid at any time and explains the long delay in
> tackling this. The way I've implemented it is just as a short test
> during XidInMVCCSnapshot(), so that we trap the case when the xid ==
> xmax and so would appear to be running. This is much less invasive and
> just as performant as Tom's original suggestion.
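
A toy model of the short test described above, under stated assumptions: xids are plain integers with no wraparound, the snapshot is reduced to xmin/xmax/xip, and the "trap" only recognizes our own xid by comparing it with the top-level xid. ToySnapshot and toy_xid_in_snapshot are hypothetical and are not the patch's code. With hypothetical numbers (top-level xid 100 assigned by TRUNCATE, subtransaction xid 101 assigned by COPY after SAVEPOINT, snapshot xmin = xmax = 100), the top-level xid is trapped, but the subtransaction's xid lies above xmax and never reaches the trap, so its rows stay invisible, which matches the symptom in the session above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToySnapshot {
    uint32_t xmin;          /* xids below xmin are finished */
    uint32_t xmax;          /* xids at or above xmax count as still running */
    const uint32_t *xip;    /* xids running when the snapshot was taken */
    size_t xcnt;
} ToySnapshot;

/* Returns true if xid must be treated as still running (rows invisible).
 * The xid == xmax branch is a hypothetical rendering of the trap. */
bool toy_xid_in_snapshot(const ToySnapshot *snap, uint32_t xid, uint32_t top_xid)
{
    if (xid < snap->xmin)
        return false;
    if (xid == snap->xmax && xid == top_xid)
        return false;                     /* trap: our own xid sitting at xmax */
    if (xid >= snap->xmax)
        return true;                      /* subtransaction xids above xmax land here */
    for (size_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;
    return false;
}
```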

TransactionIdIsCurrentTransactionId() can be fairly expensive if you
have a lot of subtransactions open...
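
To show where that cost comes from, here is a simplified sketch of the kind of walk TransactionIdIsCurrentTransactionId() performs: visit each level of the transaction-state stack from the innermost outward, compare the level's own xid, then binary-search that level's sorted array of subcommitted child xids. ToyXactLevel and toy_is_current_xid are hypothetical; the sketch ignores aborted levels and xid wraparound. Each call costs O(depth + sum of log(nchildren)), so it adds up if it sits on a hot visibility path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One level of a toy transaction-state stack. */
typedef struct ToyXactLevel {
    uint32_t xid;                       /* 0 = no xid assigned at this level */
    const uint32_t *children;           /* subcommitted child xids, sorted ascending */
    size_t nchildren;
    const struct ToyXactLevel *parent;  /* enclosing level, NULL at top */
} ToyXactLevel;

/* True if xid belongs to the current transaction tree as seen from level. */
bool toy_is_current_xid(const ToyXactLevel *level, uint32_t xid)
{
    for (const ToyXactLevel *s = level; s != NULL; s = s->parent) {
        if (s->xid == 0)
            continue;
        if (s->xid == xid)
            return true;
        size_t lo = 0, hi = s->nchildren;   /* half-open range [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (s->children[mid] == xid)
                return true;
            if (s->children[mid] < xid)
                lo = mid + 1;
            else
                hi = mid;
        }
    }
    return false;
}
```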

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
