Re: A little COPY speedup

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: A little COPY speedup
Date: 2007-03-01 20:13:54
Message-ID: 45E73402.2060101@enterprisedb.com
Lists: pgsql-patches

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> On every row, PageAddItem will scan all the line pointers on the target
>> page, just to see that they're all in use, and create a new line
>> pointer. That adds up, especially with narrow tuples like what I used in
>> the test.
>> Attached is a fix for that.
>
> This has been proposed before, and rejected before. IIRC the previous
> patch was quite a lot less invasive than this one (it didn't require
> making special space on heap pages). I don't recall why it wasn't
> accepted.

Ahh, found that thread:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00609.php

The main differences between that patch and mine are:
- the previous patch used an offset to the first free line pointer, and
I used just a flag.
- the previous patch stored the offset in the page header, and I used
the special space.
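
To make the flag idea concrete, here is a minimal sketch of it on a toy
page. It is not the patch itself: ToyPage, toy_add_item, toy_free_item
and MAX_SLOTS are hypothetical names made up for illustration, and the
real page layout is far more involved. The point is only that the scan
over existing line pointers is skipped unless the flag says a free one
might exist.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 64          /* hypothetical capacity of the toy page */

/* Toy stand-in for a heap page: only the line pointer bookkeeping. */
typedef struct ToyPage {
    bool   has_free_slots;    /* hint: an unused slot may exist below nslots */
    size_t nslots;            /* number of line pointers allocated so far */
    bool   used[MAX_SLOTS];   /* used[i] is true if line pointer i is in use */
    size_t scanned;           /* instrumentation: slots examined so far */
} ToyPage;

/* Returns the slot index chosen for a new item, or MAX_SLOTS if full. */
size_t toy_add_item(ToyPage *page)
{
    if (page->has_free_slots) {
        /* Only pay for the scan when the hint says it might succeed. */
        for (size_t i = 0; i < page->nslots; i++) {
            page->scanned++;
            if (!page->used[i]) {
                page->used[i] = true;
                return i;
            }
        }
        /* Hint was stale: nothing to recycle, so clear it. */
        page->has_free_slots = false;
    }
    if (page->nslots >= MAX_SLOTS)
        return MAX_SLOTS;
    /* Common bulk-load case: append a new line pointer without scanning. */
    page->used[page->nslots] = true;
    return page->nslots++;
}

/* Marks a slot unused and sets the hint so it can be recycled later. */
void toy_free_item(ToyPage *page, size_t slot)
{
    if (slot < page->nslots && page->used[slot]) {
        page->used[slot] = false;
        page->has_free_slots = true;
    }
}
```

With an offset instead of a flag, the scan could start at the first free
slot rather than at zero, at the cost of a wider field to store.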

I think using the special space is a cleaner approach; the field is only
meaningful in heap pages. However, now that I think of it, if we could
squeeze the flag into one of the existing fields in the page header, we
could put it there without decreasing the amount of space available for
tuples. We could use the unused pd_tli field, as you suggested later in
that thread.

At the end of the thread, Bruce added the patch to his hold-queue, but I
couldn't find a trace of it after that so I'm not clear why it was
rejected in the end. This comment (by you) seems most relevant:

> I tried making a million-row table with just two int4 columns and then
> duplicating it with CREATE TABLE AS SELECT. In this context gprof
> shows PageAddItem as taking 7% of the runtime, which your patch knocks
> down to 1.5%. This seems to be about the best possible real-world case,
> though (the wider the rows, the fewer times PageAddItem can loop), and
> so I'm still unconvinced that there's a generic gain here. Adding an
> additional word to page headers has a very definite cost --- we can
> assume about a .05% increase in net I/O demands across *every*
> application, whether they do a lot of inserts or not --- and so a
> patch that provides a noticeable improvement in only a very small set
> of circumstances is going to have to be rejected.
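
As a rough check on the .05% figure in the quote, here is the arithmetic
under two assumptions that the quote does not spell out: a "word" is 4
bytes and the page size is the default 8192 bytes. The function name is
made up for illustration.

```c
#include <stddef.h>

/*
 * Share of a page taken by extra header bytes, as a percentage.
 * Assumption: the extra bytes would otherwise hold tuple data, so the
 * same data needs proportionally more pages (and hence more I/O).
 */
double header_overhead_percent(size_t extra_bytes, size_t page_size)
{
    return 100.0 * (double) extra_bytes / (double) page_size;
}
```

header_overhead_percent(4, 8192) comes out just under 0.05, which
matches the figure quoted.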

I believe the PageAddItem overhead has become more noticeable since then
because of other improvements to COPY. In 8.3, we're also going to
reduce the tuple length (combocids and the varvarlen thing), so we can
fit more tuples per page, again making it slightly more significant.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
