Re: very very slow inserts into very large table

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: very very slow inserts into very large table
Date: 2012-07-17 16:24:37
Message-ID: CAMkU=1xrk=DpLJOXsNMe4QGgD2OPUTKxNNaTK-eGgkriuLVd2w@mail.gmail.com
Lists: pgsql-performance

On Tue, Jul 17, 2012 at 8:59 AM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> On Tue, Jul 17, 2012 at 6:30 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
>> On 07/17/2012 01:56 AM, Jon Nelson wrote:
>> To perform reasonably well, Pg would need to be able to defer index updates
>> when bulk-loading data in a single statement (or even transaction), then
>> apply them when the statement finished or transaction committed. Doing this
>> at a transaction level would mean you'd need a way to mark indexes as
>> 'lazily updated' and have Pg avoid using them once they'd been dirtied
>> within a transaction. No such support currently exists, and it'd be
>> non-trivial to implement, especially since people loading huge amounts of
>> data often want to do it with multiple concurrent sessions. You'd need some
>> kind of 'DISABLE INDEX' and 'ENABLE INDEX' commands plus a transactional
>> backing table of pending index updates.
>
> It seems to me that if the insertion is done as a single statement it
> wouldn't be a problem to collect up all btree insertions and apply
> them before completing the statement. I'm not sure how much that would
> help though. If the new rows have uniform distribution you end up
> reading in the whole index anyway. Because indexes are not stored in
> logical order you don't get to benefit from sequential I/O.
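
The closest thing to that available today is to drop the index before
the bulk load and rebuild it afterwards, so the index is built once in
bulk rather than maintained row by row.  Roughly (the table, index,
and file names here are made up):

    -- stop per-row index maintenance during the load
    DROP INDEX big_table_key_idx;
    -- bulk load; hypothetical file and table names
    COPY big_table FROM '/tmp/new_rows.csv' WITH (FORMAT csv);
    -- rebuild the index once, in bulk
    CREATE INDEX big_table_key_idx ON big_table (key_col);

That obviously only works if you can tolerate the index being gone
while the load runs.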

In this case, he is loading new data that is 5% of the current data
size. A leaf page probably has much more than 20 entries, so by
sorting the new rows you could turn many scattered accesses to the
same page into one access (or into many accesses that immediately
follow each other, and so are satisfied by the cache).

Also, while indexes are not formally kept in logical order, they do
tend to be biased in that direction in most cases. I've found that
even if you are only inserting one row for every 4 or 5 leaf pages,
you still get a substantial improvement by inserting in sorted order.
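
To make that concrete, a minimal sketch of what I mean by loading in
sorted order, assuming the data sits in a staging table and the large
table has an index on a column called key_col (all names here are
hypothetical):

    -- presort the incoming rows on the indexed key so the btree
    -- insertions hit each leaf page in one burst instead of at random
    INSERT INTO big_table
    SELECT * FROM staging_table
    ORDER BY key_col;

The ORDER BY isn't a formal guarantee of insertion order, but in
practice the rows reach the index in key order, which is what lets the
scattered leaf-page accesses collapse into sequential ones.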

Cheers,

Jeff
