Re: Improving the Performance of Full Table Updates

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Gokulakannan Somsundaram" <gokul007(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving the Performance of Full Table Updates
Date: 2007-09-21 10:24:43
Message-ID: 46F39BEB.1040205@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gokulakannan Somsundaram wrote:
> Again if this full table updates are thought with the OLTP applications in
> mind, then this is not at all a suitable option. This will only benefit the
> people with Data Warehouses.
>
> Expecting some more replies....

Start with profiling.

I just ran a quick oprofile run of a full-table UPDATE on a simple table
with one index, and it looks like RelationGetBufferForTuple uses 4.53%
of the CPU time. Out of that, 2.86 percentage points are spent in
ReadBuffer_common. That means that write-locking the heap pages takes at
most 4.53 - 2.86 = 1.67 % of the total CPU time.

That's the upper limit of the benefit from the scheme you're proposing.
Surely the effort would be better spent on something else. For example,
if you kept the insertion target page just pinned over the calls, which
wouldn't have the problems with triggers etc, you could save that 2.86%.
Which still isn't much. Or take a look at WAL logging. XLogInsert took
16.06% of the CPU time. Earlier tests have suggested that a big chunk of
that time is spent in CRC calculation. Alternative CRC methods have been
suggested in the past, or perhaps that could time could be offloaded to
the WAL writer process, speeding up the UPDATE on a multi-CPU server.

Also, if we're talking about data warehousing, we're talking about big
tables that don't fit in memory. That means that you're likely
bottlenecked by I/O speed, not CPU. If that's the case, saving some CPU
time makes no difference whatsoever. What would help with I/O bottleneck
is to try to make the disk footprint smaller, or make better use of the
I/O bandwidth available.

Three steps to improve throughput:

1. Identify the hardware component that's the bottleneck.
2. Profile the workload to see what's using the bottlenecked resource
the most.
3. Figure out how to make that piece of code cheaper.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2007-09-21 10:45:18 Re: ecpg PREPARE is not thread safe
Previous Message Gregory Stark 2007-09-21 10:00:31 Re: like/ilike improvements