COPY with hints, rebirth

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: COPY with hints, rebirth
Date: 2012-02-24 20:55:33
Message-ID: CA+U5nMJ8CdApEcm0+Ln0WXVx194k3wJHihb=-0xDHen1v0hhDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A long time ago, in a galaxy far away, we discussed ways to speed up
data loads/COPY.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

In particular, the idea that we could mark tuples as committed while
we are still loading them, to avoid negative behaviour for the first
reader.

Simple patch to implement this is attached, together with test case.

Current behaviour is shown here
Run COPY and then... SELECT count(*) FROM table with no indexes
1st SELECT Time: 1518.571 ms <--- slowed dramatically by setting hint bits
2nd SELECT Time: 914.141 ms
3rd SELECT Time: 914.921 ms

With this patch I observed the following results
1st SELECT Time: 890.820 ms
2nd SELECT Time: 884.799 ms
3rd SELECT Time: 882.405 ms

What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.

In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
and adding current xid to snapshots. That is an invasive change that I
would wish to avoid at any time and explains the long delay in
tackling this. The way I've implemented it, is just as a short test
during XidInMVCCSnapshot() so that we trap the case when the xid ==
xmax and so would appear to be running. This is much less invasive and
just as performant as Tom's original suggestion.

Why do we need this now? Setting checksums on page requires us to
write WAL for hints, so the situation of the 1st SELECT after a load
would get somewhat worse when page_checksums are enabled, but we
already know there is a price. However, this is a situation we can
solve, and add value for all cases, not just when checksums enabled.
So I'm posting this as a separate patch rather than including that as
a tuning feature of the checksums patch.

Your input will be generously received,

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_nohints.v357.patch text/x-diff 4.8 KB
copytest.sql text/x-sql 358 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-02-24 21:20:12 Cleanup of make_ctags
Previous Message amit sehas 2012-02-24 20:45:41 Behavior of subselects in target lists and order by