Re: Key/Value reference table generation: INSERT/UPDATE performance

From: cedric <cedric(dot)villemain(at)dalibo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Richard Huxton <dev(at)archonet(dot)com>, valgog <valgog(at)gmail(dot)com>
Subject: Re: Key/Value reference table generation: INSERT/UPDATE performance
Date: 2007-05-22 12:28:05
Message-ID: 200705221428.09546.cedric.villemain@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Le mardi 22 mai 2007, Richard Huxton a écrit :
> valgog wrote:
> > I found several post about INSERT/UPDATE performance in this group,
> > but actually it was not really what I am searching an answer for...
> >
> > I have a simple reference table WORD_COUNTS that contains the count of
> > words that appear in a word array storage in another table.
>
> I think this is the root of your problem, I'm afraid. You're trying to
> count individual words when you're storing an array of words. I don't
> think any of the Gist/GIN indexes will help you with this either.
>
> However, since "you don't want to start from here" isn't very useful
> here and now:
>
> 1. See what the performance (explain analyse) of the "select
> distinct...generate_series()" statement is. I think you're right and
> it's going to be slow.
> 2. You're looping through each row of word_storage and counting
> separately. Write it as one query if possible.
> 3. As Peter says, don't insert then update, start with an empty table
> and just insert totals for the lot (see #2).
>
> I'd probably write the query in plperl/python or something else that
> supports hash/dictionary structures. Then just process the whole
> word_storage into the hash - assuming you only have a few thousand
> distinct words that shouldn't take up too much memory.
+1
I made something very similar, and using PL/pgsql is very slow, when using
perl is very quick.

I have also use partioning because of cost of update (copy last partition to
the new, adding the new count, so there is only insert, and drop old table if
you want)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alexander Staubo 2007-05-22 12:39:33 Re: Performace comparison of indexes over timestamp fields
Previous Message Arnau 2007-05-22 10:39:02 Performace comparison of indexes over timestamp fields