
Re: Why creating GIN table index is so slow than inserting data into empty table with the same index?

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Burladyan <eshkinkot(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why creating GIN table index is so slow than inserting data into empty table with the same index?
Date: 2009-03-24 11:07:34
Message-ID: 49C8BEF6.3080909@enterprisedb.com
Lists: pgsql-performance
Tom Lane wrote:
> Sergey Burladyan <eshkinkot(at)gmail(dot)com> writes:
>> show maintenance_work_mem ;
>>  maintenance_work_mem
>> ----------------------
>>  128MB
> 
>> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
>> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
>> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
> 
> [ takes forever ]
> 
> Seems the reason this is so awful is that the incoming data is exactly
> presorted, meaning that the tree structure that ginInsertEntry() is
> trying to build degenerates to a linear list (ie, every new element
> becomes the right child of the prior one).  So the processing becomes
> O(N^2) up till you reach maintenance_work_mem and flush the tree.  With
> a large setting for maintenance_work_mem it gets spectacularly slow.

Yes, this is probably the same issue I bumped into a while ago:

http://archives.postgresql.org/message-id/49350A13.3020105@enterprisedb.com

> I think a reasonable solution for this might be to keep an eye on
> maxdepth and force a flush if that gets too large (more than a few
> hundred, perhaps?).  Something like this:
> 
>     /* If we've maxed out our available memory, dump everything to the index */
> +   /* Also dump if the tree seems to be getting too unbalanced */
> -   if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
> +   if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
> +       buildstate->accum.maxdepth > DEPTH_LIMIT)
>     {
> 
> The new fast-insert code likely will need a similar defense.

I fooled around with a balanced tree, which solved the problem but 
unfortunately made the unsorted case slower. Limiting the depth like 
that should avoid that regression, so it's worth trying.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

