BUG #3875: Building GIN index on full table could be slower than filling table with existing index.

From: "Sokolov Yura" <funny(dot)falcon(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3875: Building GIN index on full table could be slower than filling table with existing index.
Date: 2008-01-15 13:20:32
Message-ID: 200801151320.m0FDKW7g035228@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3875
Logged by: Sokolov Yura
Email address: funny(dot)falcon(at)gmail(dot)com
PostgreSQL version: 8.3RC1 8.2.x
Operating system: Debian 4.0
Description: Building GIN index on full table could be slower than
filling table with existing index.
Details:

Here is pathological case:

\timing
drop table if exists test_gin;

create table test_gin (
id serial primary key,
ar int4[]
);

create index test_gin_ix_ar on test_gin
using gin
( ar );
-- And here is pathological pattern
insert into test_gin (ar)
select ARRAY[i, i-1, i+1, i-2, i+2]
from generate_series(1, 20000) as i;

drop index test_gin_ix_ar;

create index test_gin_ix_ar on test_gin
using gin
( ar );

Timing for this case:

Timing is on.
DROP TABLE
Time: 19,421 ms
CREATE TABLE
Time: 13,531 ms
CREATE INDEX
Time: 3,352 ms
INSERT 0 20000 !
Time: 1064,444 ms !!
DROP INDEX
Time: 30,925 ms
CREATE INDEX !
Time: 28180,484 ms !!!!

So that, creating index on full table is 26.5x slower than filling table
with existing index.

After applying a patch below timings are:

.....
INSERT 0 20000
Time: 1036,746 ms
.....
CREATE INDEX
Time: 220,073 ms

For other tested pattern patched timing usually
slightly better (5-10%) and in rare case slightly
worse(5-10%). Other tested patterns are:

insert into test_gin (ar)
select ARRAY[i, i-1, i+1, i-2, i+2]
from generate_series(1, 100) as i,
generate_series(1, 200) as j;

insert into test_gin (ar)
select ARRAY[i, i-1, i+1, i-2, i+2]
from generate_series(1, 200) as j,
generate_series(1, 100) as i;

insert into test_gin (ar)
select ARRAY[i, i-1, i+1, i-2, i+2]
from generate_series(1, 200) as j,
generate_series(1, 1000) as i;

insert into test_gin (ar)
select (select int_array_aggregate(k) from generate_series(i, i+200) as
k)
from generate_series(1, 200) as j,
generate_series(1, 100) as i;

Patch:

diff -pr postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c
postgresql-8.3RC1/src/backend/access/gin/ginbulk.c
*** postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c 2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/ginbulk.c 2008-01-14
16:00:48.000000000 +0300
*************** ginInitBA(BuildAccumulator *accum)
*** 28,33 ****
--- 28,34 ----
accum->maxdepth = 1;
accum->stackpos = 0;
accum->entries = NULL;
+ accum->nentries = 0;
accum->stack = NULL;
accum->allocatedMemory = 0;
accum->entryallocator = NULL;
*************** EAAllocate(BuildAccumulator *accum)
*** 44,49 ****
--- 45,51 ----
}

accum->length++;
+ accum->nentries++;
return accum->entryallocator + accum->length - 1;
}

diff -pr postgresql-8.3RC1old/src/backend/access/gin/gininsert.c
postgresql-8.3RC1/src/backend/access/gin/gininsert.c
*** postgresql-8.3RC1old/src/backend/access/gin/gininsert.c 2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/gininsert.c 2008-01-14
16:07:21.000000000 +0300
*************** ginBuildCallback(Relation index, HeapTup
*** 238,244 ****
buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);

/* If we've maxed out our available memory, dump everything to the index
*/
! if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
{
ItemPointerData *list;
Datum entry;
--- 238,246 ----
buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);

/* If we've maxed out our available memory, dump everything to the index
*/
! if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L
! || (buildstate->accum.maxdepth > 16 &&
! buildstate->accum.nentries < (1 << (buildstate->accum.maxdepth /
2))))
{
ItemPointerData *list;
Datum entry;
diff -pr postgresql-8.3RC1old/src/include/access/gin.h
postgresql-8.3RC1/src/include/access/gin.h
*** postgresql-8.3RC1old/src/include/access/gin.h 2008-01-01
22:45:56.000000000 +0300
--- postgresql-8.3RC1/src/include/access/gin.h 2008-01-14 15:59:51.000000000
+0300
*************** typedef struct
*** 450,455 ****
--- 450,456 ----
{
GinState *ginstate;
EntryAccumulator *entries;
+ uint32 nentries;
uint32 maxdepth;
EntryAccumulator **stack;
uint32 stackpos;

Browse pgsql-bugs by date

  From Date Subject
Next Message Ruben Camargo Gomez 2008-01-15 16:52:13 BUG #3876: Problems migrating databases
Previous Message Sokolov Yura 2008-01-15 11:28:37 Re: BUG #3860: xpath crashes backend when is querying xmlagg result