Re: Performance die when COPYing to table with bigint PK

From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Ayrapetyan <robert(dot)ayrapetyan(at)comodo(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance die when COPYing to table with bigint PK
Date: 2011-08-05 10:36:38
Message-ID: 4E3BC7B6.8050700@gmail.com
Lists: pgsql-performance

05.08.11 11:44, Robert Ayrapetyan wrote:
> Yes, you are right. Performance becomes even more awful.
> Can some techniques from pg_bulkload be implemented in postgres core?
> Current performance is not suitable for any enterprise-wide production system.
BTW: I was thinking about indexes this morning.
How about the following feature:
Implement a new index type that has two "zones", old and new. The new
zone has a fixed, configurable size, say 100 pages (800 KB).
Every search goes into both zones, so as soon as the index is larger than
800 KB, each search must be done twice.
As soon as the new zone hits its size limit, some of its pages (maybe only
one?) are merged into the old zone. The merge is "rolling": if the last
merge stopped at entry "X", the next merge starts at the entry right after X.
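
To make the idea concrete, here is a minimal Python sketch of the two-zone scheme. It is only a toy model, not PostgreSQL code: both zones are plain sorted lists, the limit is counted in entries rather than pages, and the names TwoZoneIndex, new_zone_limit, merge_batch and cursor are made up for illustration.

```python
import bisect


class TwoZoneIndex:
    """Toy model of the proposal: a small 'new' zone in front of a large 'old' zone."""

    def __init__(self, new_zone_limit=4, merge_batch=2):
        self.new_zone_limit = new_zone_limit  # assumed: max entries in the new zone
        self.merge_batch = merge_batch        # assumed: entries moved per merge step
        self.new_zone = []                    # sorted keys, small and "hot"
        self.old_zone = []                    # sorted keys, large
        self.cursor = None                    # last key moved by the previous merge

    def insert(self, key):
        # All inserts go to the new zone only.
        bisect.insort(self.new_zone, key)
        if len(self.new_zone) >= self.new_zone_limit:
            self._rolling_merge()

    def _rolling_merge(self):
        # Start at the entry right after the one where the last merge stopped,
        # wrapping to the beginning when the cursor is past the last entry.
        start = 0
        if self.cursor is not None:
            start = bisect.bisect_right(self.new_zone, self.cursor)
            if start >= len(self.new_zone):
                start = 0
        batch = self.new_zone[start:start + self.merge_batch]
        del self.new_zone[start:start + len(batch)]
        for key in batch:
            # Adjacent keys land next to each other in the old zone.
            bisect.insort(self.old_zone, key)
        self.cursor = batch[-1]

    def contains(self, key):
        # Every search has to look in both zones.
        return self._in_zone(self.new_zone, key) or self._in_zone(self.old_zone, key)

    @staticmethod
    def _in_zone(zone, key):
        i = bisect.bisect_left(zone, key)
        return i < len(zone) and zone[i] == key
```

With new_zone_limit=4 and merge_batch=2, inserting 5, 1, 9, 3 moves 1 and 3 into the old zone. After 7 and 2 arrive, the next merge picks up at 5 and 7 rather than going back to 2, which is the "rolling" behaviour described above.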

As I see it, this should largely solve the large-index insert problem:
1) Inserts into the new zone must be quick because it is small and hot in cache.
2) During a merge, writes will be grouped, because items with nearby keys (for
B-tree) or hashes (for hash index) will go to a small subset of "old" zone
pages. In the future, the merge could also be done by autovacuum in the background.
Yes, we get a dual index search, but the new zone will be hot, so this won't
make it twice as costly.
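
A rough way to check the write-grouping argument in point 2 is to count how many distinct old-zone pages a batch of entries would touch, once in arrival order (direct inserts) and once in key order (one merge step). The sketch below is a back-of-envelope model only: it maps each key to a page by simple division and ignores the real B-tree structure, and all the sizes are arbitrary assumptions picked so that the effect is visible.

```python
import random

KEYS_PER_PAGE = 100      # assumed: index entries per old-zone page
OLD_ZONE_PAGES = 1000    # assumed: size of the old zone, in pages
NEW_ZONE_ENTRIES = 5000  # assumed: entries waiting in the new zone
BATCH = 500              # assumed: entries moved by one merge step


def pages_touched(keys):
    """Number of distinct old-zone pages the given keys would land on."""
    return len({key // KEYS_PER_PAGE for key in keys})


def compare(seed=0):
    """Return (scattered, grouped) page counts for one batch of entries."""
    rng = random.Random(seed)
    key_space = KEYS_PER_PAGE * OLD_ZONE_PAGES
    arrivals = [rng.randrange(key_space) for _ in range(NEW_ZONE_ENTRIES)]
    # Direct inserts: the first BATCH keys in the order they arrived.
    scattered = pages_touched(arrivals[:BATCH])
    # One merge step: BATCH adjacent keys taken from the sorted new zone.
    grouped = pages_touched(sorted(arrivals)[:BATCH])
    return scattered, grouped


if __name__ == "__main__":
    scattered, grouped = compare()
    print("pages touched, arrival order:", scattered)
    print("pages touched, merge in key order:", grouped)
```

How large the gap is depends on how many entries the new zone holds relative to the number of old-zone pages, so the numbers printed here say nothing about a real index.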

Best regards, Vitalii Tymchyshyn
