Bulk loading performance improvements

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bulk loading performance improvements
Date: 2008-02-26 14:46:38
Message-ID: 1204037198.4252.305.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Looking at the profile for COPY and then a profile for CREATE TABLE AS
SELECT (CTAS) there is clearly too much overhead caused by inserting the
rows one at a time.

Flat profile of CTAS: (2 cols of this output removed for clarity)

Each sample counts as 0.01 seconds.
% cumulative self
time seconds seconds calls
5.63 0.52 0.52 21302604 LWLockRelease
5.63 1.04 0.52 21193900 LWLockAcquire
4.77 1.48 0.44 10386370 hash_any
4.77 1.92 0.44 10139843 ReadBuffer_common
4.66 2.35 0.43 10000842 PinBuffer
4.55 2.77 0.42 10000001 ExecProject
4.44 3.18 0.41 10000014 heap_insert
4.23 3.57 0.39 10143770 UnpinBuffer
3.79 3.92 0.35 10000057 MarkBufferDirty
3.36 4.23 0.31 10000001 ExecMakeFunctionResult
3.25 4.53 0.30 10000014 RelationGetBufferForTuple
2.71 4.78 0.25 10000028 heap_fill_tuple
2.60 5.02 0.24 10528273 hash_search_with_hash_value
2.38 5.24 0.22 10000042 PageAddItem
2.38 5.46 0.22 10000000 heap_form_tuple
1.73 5.82 0.16 10143788 ResourceOwnerForgetBuffer
1.73 5.98 0.16 10000014 pgstat_count_heap_insert
1.63 6.13 0.15 20280380 LockBuffer
1.63 6.28 0.15 10139682 ReleaseBuffer
1.63 6.43 0.15 10000001 ExecProcNode
1.52 6.57 0.14 10004864 AllocSetAlloc
1.52 6.71 0.14 10003013 AllocSetFree
1.52 6.85 0.14 10000003 ExecEvalConst
1.52 6.99 0.14 10000000 intorel_receive
1.41 7.12 0.13 20139096 BufferGetBlockNumber
1.41 7.25 0.13 10000029 CacheInvalidateH

The above profile is probably fairly worst-case and differs between CTAS
and COPY. It also reduces as row length increases.

Thin tables are an important use case since many Associative tables
(Many-Many) are larger than the tables that they link together. Those
tables are often just two integer columns. So the above results are
relevant to many large database tables.

The system is beginning to be I/O bound during these operations, but CPU
savings are still helpful.

If we loaded data 1 block at a time, it seems like we make the following
calls 1 per block rather than 1 per row. That would reduce these calls
to perhaps 1% of their costs, depending upon row length.

5.63 0.52 0.52 21302604 LWLockRelease
5.63 1.04 0.52 21193900 LWLockAcquire
4.77 1.48 0.44 10386370 hash_any
4.77 1.92 0.44 10139843 ReadBuffer_common
4.66 2.35 0.43 10000842 PinBuffer
4.44 3.18 0.41 10000014 heap_insert
4.23 3.57 0.39 10143770 UnpinBuffer
3.79 3.92 0.35 10000057 MarkBufferDirty
3.25 4.53 0.30 10000014 RelationGetBufferForTuple
1.73 5.82 0.16 10143788 ResourceOwnerForgetBuffer
1.73 5.98 0.16 10000014 pgstat_count_heap_insert
1.63 6.13 0.15 20280380 LockBuffer
1.63 6.28 0.15 10139682 ReleaseBuffer
1.41 7.12 0.13 20139096 BufferGetBlockNumber
1.41 7.25 0.13 10000029 CacheInvalidateHeapTuple

which together account for more than 50% of CPU.

So that means we would keep the buffer pinned and locked while we do
everything else during loading. That is difficult to achieve if we have
indexes on the table and also of questionable value in concurrent
situations.

Discussing with Heikki we could just kept the buffer pinned, but lock
and unlock it for each insert then we would save on most of the above.
The following possible savings would still exist...

5.63 0.52 0.52 21302604 LWLockRelease
5.63 1.04 0.52 21193900 LWLockAcquire
3.79 3.92 0.35 10000057 MarkBufferDirty
1.73 5.98 0.16 10000014 pgstat_count_heap_insert
1.63 6.13 0.15 20280380 LockBuffer
1.63 6.28 0.15 10139682 ReleaseBuffer

which account for about 18% of CPU.

So the proposals for improving COPY are:

1) When running a COPY operation into a table with indexes and/or
triggers defined we will do this for each block
* identify the block to insert into
* pin it and keep it pinned
* perform all inserts until full
* unpin it

I'm posting a patch now to pgsql-patches that does this: "Bulk Insert
tuning". I measure gains of around 20% for CTAS and for 15-17% for COPY,
though the variability of results is fairly high.

2) When running CREATE TABLE AS SELECT or COPY into a table with zero
indexes and zero triggers (yet) we will *also* do these actions for each
block
* load data into a private buffer, then when full
* identify block to insert into
* lock buffer, memcpy() block into place, set tuple pointers, unlock
* mark buffer dirty once at end
* copy whole buffer to WAL

That will remove all of the block lock/unlock overhead, reduce WAL
locking and reduce overall WAL volume from large loads.

Objective: reduce CPU overhead for initial loads by around 25-40%, or
15-20% more benefit than achieved by (1) above.

3) Maintain a full block list within the buffer manager. Like the
opposite of a freelist. Any bulk operation that finishes writing a block
adds it to the fulllist. bgwriter cleans blocks on the fulllist first
before it starts normal clock sweep, up to a limit of half of the blocks
it is allowed to clean. This ensures that backends which are the source
of many dirty blocks are targeted more accurately by the bgwriter (but
not exclusively). Potentially improve chances of OS/disk controller
being able to combine writes and perform sequential write I/O.

Objective: improve hit rate of bgwriter during bulk insert ops

4) Allow bulk insert operations to utilise a BufferAccessStrategy so
that they reuse the same buffers in the bufferpool and avoid spoiling
the cache for everybody else. This on its own will slow down bulk
operations since they will be forced to write their own dirty blocks.
Together with (3) this should actually speed up bulk ops.

Objective: avoid cache spoiling and attempt to improve L2 cache usage

These would be done as separate patches, in that order.

I'm assuming that we'll be able to tune index maintenance as well,
though I'll leave that to Heikki and/or Itagaki.

Comments?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-02-26 15:03:57 Re: pg_dump additional options for performance
Previous Message Ron Mayer 2008-02-26 14:27:46 Re: pg_dump additional options for performance