Re: still gin index creation takes forever

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: still gin index creation takes forever
Date: 2008-11-13 09:33:32
Message-ID: 20081113103332.361f315e@dawn.webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 12 Nov 2008 15:18:05 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > So, in that case process can insert about 1000 ItemPointers per
> > one data tree lookup, in opposite case it does 1000 lookups in
> > data tree.

> I see. So this could explain Ivan's issue if his table contains
> large numbers of repeated GIN keys. Ivan, is that what your data
> looks like?

Well if by GIN keys you mean lexemes it could be. But I wouldn't say
this circumstance is uncommon among users of tsearch. I'd expect
other people had used tsearch2 to search through titles, authors and
publishers of books, so if that was the problem I'd expect the
problem to come up earlier.
Actually tsearch2 is not completely tuned up, since I still have to
"mix" Italian and English configuration to get rid of some more
stop words etc... that may increase the number of repetitions, but I
doubt this only put me in a corner case.

Anyway trying to answer in a more objective way to your question I
ran:

SELECT * FROM ts_stat('SELECT ft1idx FROM catalog_items') order by
nentry desc, ndoc desc limit 20;

It ran over 9h and I still wasn't able to get the answer.
I killed psql client that was running it and postgres continued to
eat 100% CPU for a while till I stopped it.

Considering that running:
SELECT * FROM ts_stat('SELECT ft1idx FROM catalog_items limit 50000')
order by nentry desc, ndoc desc limit 20;
returned in less than 2 minutes and catalog_items has a bit less
than 1M record... there is still something weird.

"springer";10824;10833
"e";7703;8754
"di";6815;7771
"il";5622;6168
"la";4989;5407
"hall";4357;4416
"prentic";4321;4369
"l";3920;4166
"del";3092;3281
"edizioni";2465;2465
"della";2292;2410
"m";2283;2398
"dell";2150;2281
"j";1967;2099
"d";1789;1864
"per";1685;1770
"longman";1671;1746
"le";1656;1736
"press";1687;1687
"de";1472;1564

examining 90K records took a bit more than 6min.

I'll try to move everything on another box and see what happens.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andrus 2008-11-13 09:41:35 Re: Upgrading side by side in Gentoo
Previous Message Markus Wollny 2008-11-13 09:32:42 Re: Suboptimal execution plan for simple query