Bulk Inserts

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Bulk Inserts
Date: 2009-09-14 14:26:53
Message-ID: op.uz83q3tecke6l8@soyouz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I've done a little experiment with bulk inserts.

=> heap_bulk_insert()

Behaves like heap_insert except it takes an array of tuples (HeapTuple
*tups, int ntups).

- Grabs a page (same as heap_insert)

- While holding exclusive lock, inserts as many tuples as it can on the
page.
- Either the page gets full
- Or we run out of tuples.

- Generate xlog : choice between
- Full Xlog mode :
- if we inserted more than 10 tuples (totaly bogus heuristic), log the
entire page
- Else, log individual tuples as heap_insert does
- Light log mode :
- if page was empty, only xlog a "new empty page" record, not page
contents
- else, log fully
- heap_sync() at the end

- Release the page
- If we still have tuples to insert, repeat.

Am I right in assuming that :

1)
- If the page was empty,
- and log archiving isn't used,
- and the table is heap_sync()'d at the end,
=> only a "new empty page" record needs to be created, then the page can
be completely filled ?

2)
- If the page isn't empty
- or log archiving is used,
=> logging either the inserted tuples or the entire page is OK to
guarantee persistence ?

(I used kill -9 to test it, recovery seems to work).

Test on a concurrent COPY, 4 threads, on a table with 8 INT columns.

* 8.5 HEAD : Total Time 44 s
* Bulk inserts, Full XLog : Total Time 24 s
* Bulk inserts, Light XLog : Total Time 10 s

Quite a bit faster... I presume with more CPUs it would scale.

I'm not posting the patch because it's quite ugly (especially the part to
store tuples in copy.c and bulk-insert them, I should probably have used a
tuplestore...)
I think the tuples need to be stored and then bulk-inserted because the
exclusive lock on the buffer can't be held for a long time.

Lock stats (from the patch I just posted) :

* 8.5 HEAD : Total Time 44 s

-------- Lock stats for PID 28043
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
28043 7 0 0 0.00 0.00
2500002 804378 23.59 ( 53.11 %) 7.38 ( 16.61 %) WALInsert
28043 8 0 0 0.00 0.00
25775 32 2.91 ( 6.54 %) 0.90 ( 2.02 %) WALWrite

-------- Lock stats for PID 28044
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
28044 7 0 0 0.00 0.00
2500002 802515 22.26 ( 50.11 %) 8.70 ( 19.59 %) WALInsert
28044 8 0 0 0.00 0.00
25620 42 4.00 ( 9.01 %) 1.12 ( 2.52 %) WALWrite

-------- Lock stats for PID 28045
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
28045 7 0 0 0.00 0.00
2500002 799145 22.47 ( 50.32 %) 8.72 ( 19.52 %) WALInsert
28045 8 0 0 0.00 0.00
25725 38 4.08 ( 9.14 %) 1.05 ( 2.35 %) WALWrite

-------- Lock stats for PID 28042
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
28042 7 0 0 0.00 0.00
2500002 809477 23.49 ( 52.44 %) 7.89 ( 17.62 %) WALInsert
28042 8 0 0 0.00 0.00
25601 37 3.27 ( 7.31 %) 1.05 ( 2.34 %) WALWrite

* Bulk inserts, Full XLog : Total Time 24 s

-------- Lock stats for PID 32486
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
32486 7 0 0 0.00 0.00
23765 1128 9.22 ( 38.98 %) 4.05 ( 17.14 %) WALInsert
32486 8 0 0 0.00 0.00
21120 19 2.64 ( 11.17 %) 1.32 ( 5.59 %) WALWrite

-------- Lock stats for PID 32484
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
32484 7 0 0 0.00 0.00
23865 1083 9.87 ( 41.68 %) 2.87 ( 12.11 %) WALInsert
32484 8 0 0 0.00 0.00
21105 11 1.68 ( 7.11 %) 1.09 ( 4.62 %) WALWrite
32484 8508 0 0 0.00 0.00
1 1 0.19 ( 0.81 %) 0.00 ( 0.00 %)
32484 18846 0 0 0.00 0.00
1 1 0.25 ( 1.05 %) 0.00 ( 0.00 %)

-------- Lock stats for PID 32485
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
32485 7 0 0 0.00 0.00
23816 1107 8.94 ( 37.75 %) 4.05 ( 17.09 %) WALInsert
32485 8 0 0 0.00 0.00
21109 21 2.59 ( 10.93 %) 1.36 ( 5.77 %) WALWrite
32485 16618 0 0 0.00 0.00
1 2 0.23 ( 0.98 %) 0.00 ( 0.00 %)

-------- Lock stats for PID 32482
PID Lock ShAcq ShWait ShWaitT ShHeldT
ExAcq ExWait ExWaitT ExHeldT Name
32482 7 0 0 0.00 0.00
23813 1053 9.70 ( 40.75 %) 3.41 ( 14.32 %) WALInsert
32482 8 0 0 0.00 0.00
21119 15 2.24 ( 9.43 %) 1.06 ( 4.44 %) WALWrite
32482 6770 0 0 0.00 0.00
3 1 0.17 ( 0.70 %) 0.00 ( 0.00 %)

* Bulk inserts, Light XLog : Total Time 10 s

No Lock stats to show (wait tims is < 0.01 s)...

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2009-09-14 14:30:18 Re: Disable and enable of table and column constraints
Previous Message Tom Lane 2009-09-14 14:22:34 Re: BUG #5053: domain constraints still leak