
Re: Key/Value reference table generation: INSERT/UPDATE performance

From: "Peter Childs" <peterachilds(at)gmail(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Key/Value reference table generation: INSERT/UPDATE performance
Date: 2007-05-22 09:05:27
Message-ID: a2de01dd0705220205l261ac149n6f168fbb6141480c@mail.gmail.com
Lists: pgsql-performance
On 22 May 2007 01:23:03 -0700, valgog <valgog(at)gmail(dot)com> wrote:
>
> I found several post about INSERT/UPDATE performance in this group,
> but actually it was not really what I am searching an answer for...
>
> I have a simple reference table WORD_COUNTS that contains the count of
> words that appear in a word array storage in another table.
>
> CREATE TABLE WORD_COUNTS
> (
>   word text NOT NULL,
>   count integer,
>   CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
> )
> WITHOUT OIDS;



Is there any reason why count is not NOT NULL? (Making it NOT NULL with a
default should simplify your code by removing the COALESCE.)
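
For example (just a sketch; the DEFAULT 0 is my guess at what new rows
should start from):

CREATE TABLE WORD_COUNTS
(
  word  text    NOT NULL,
  count integer NOT NULL DEFAULT 0,
  CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
)
WITHOUT OIDS;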

Insert is more efficient than update because, under MVCC, an update always
writes a new row version and leaves the old one behind for VACUUM: in
effect a delete followed by an insert.
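
If you can, drop the per-row loop entirely: aggregate all the counts in one
set-based pass, then do a single UPDATE and a single INSERT. A rough sketch
(untested; unnest() needs PostgreSQL 8.4 or later, on older versions your
generate_series()/array_upper() trick does the same unnesting job, and
new_counts is a name I made up):

BEGIN;

-- Count every word occurrence once, instead of once per word_storage row.
CREATE TEMP TABLE new_counts AS
  SELECT word, count(*) AS cnt
    FROM ( SELECT unnest(array_of_words) AS word
             FROM word_storage ) AS all_words
   GROUP BY word;

-- One UPDATE for the words we already know about...
UPDATE WORD_COUNTS wc
   SET count = COALESCE( wc.count, 0 ) + nc.cnt
  FROM new_counts nc
 WHERE wc.word = nc.word;

-- ...and one INSERT for the rest.
INSERT INTO WORD_COUNTS ( word, count )
SELECT nc.word, nc.cnt
  FROM new_counts nc
 WHERE NOT EXISTS ( SELECT 1 FROM WORD_COUNTS wc
                     WHERE wc.word = nc.word );

COMMIT;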

Oh, and GROUP BY is nearly always quicker than DISTINCT, and can (always?)
be rewritten as such. I'm not 100% sure why it's different, but it is.
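(One likely reason: the planner can use a hash aggregate for GROUP BY,
whereas DISTINCT always sorts.) For instance, the DISTINCT subselect in
your insert could be written as (same rows out, untested):

select (r.array_of_words)[s.index] as d_word
  from generate_series(1, array_upper(r.array_of_words, 1)) as s(index)
 group by d_word;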

Peter.



> I have some PL/pgSQL code in a stored procedure like
>
>   FOR r
>    IN select id, array_of_words
>         from word_storage
>   LOOP
>     begin
>       -- insert the missing words
>       insert into WORD_COUNTS
>                   ( word, count )
>                   ( select d_word, 0
>                       from ( select distinct (r.array_of_words)[s.index] as d_word
>                                from generate_series(1, array_upper( r.array_of_words, 1 ) ) as s(index)
>                            ) as distinct_words
>                      where d_word not in ( select word from WORD_COUNTS ) );
>       -- update the counts
>       update WORD_COUNTS
>          set count = COALESCE( count, 0 ) + 1
>        where word in ( select distinct (r.array_of_words)[s.index] as word
>                          from generate_series(1, array_upper( r.array_of_words, 1 ) ) as s(index) );
>     exception when others then
>       error_count := error_count + 1;
>     end;
>     record_count := record_count + 1;
>   END LOOP;
>
> This code runs extremely slowly. It takes about 10 minutes to process
> 10000 records, and the word storage has more than 2 million records to
> be processed.
>
> Does anybody have any know-how about populating such reference
> tables? What can be optimized in this situation?
>
> Maybe the generate_series() procedure used to unnest the array is where
> I lose the performance?
>
> Are set-based updates/inserts more efficient than single
> inserts/updates run in smaller loops?
>
> Thanks for your help,
>
> Valentine Gogichashvili
>
>

