Re: Please comment on pgsql speed at handling 550,000 records

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Please comment on pgsql speed at handling 550,000 records
Date: 2006-02-07 19:10:59
Message-ID: C00E5AF3.5562%sdavis2@mail.nih.gov
Lists: pgsql-novice

I would do something like this:

Create table seq_go (
    seq_name varchar,
    go_id varchar
);
Create index seq_go_seq_name_idx on seq_go(seq_name);
Create index seq_go_go_id_idx on seq_go(go_id);

Do a COPY from the text file that contains the sequence-to-GO mapping into
the above table. This should be VERY fast.
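
The COPY might look something like this (untested; the path is only a
placeholder for wherever your mapping file lives, and the file is assumed
to be tab-delimited, which is the default for COPY's text format):

copy seq_go (seq_name, go_id) from '/path/to/seq2go.txt';

(If you are not a superuser, use psql's \copy variant, which reads the
file client-side.)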

Then create tablec as before.
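
If you need the definition again, a minimal sketch (the varchar types are
an assumption; match them to whatever a.seq_id and b.gos_id actually are):

create table tablec (
    seq_id varchar,
    gos_id varchar
);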

Finally, do:

insert into tablec (seq_id, gos_id)
select a.seq_id, b.gos_id
from a, b, seq_go
where seq_go.seq_name = a.seq_name
  and b.go_id = seq_go.go_id;

There may be some typos (I haven't tested this), but the insert is quite
fast. I have built these tables using (probably) the same data you are
looking at, and it took only a few seconds; postgres is not the limiting
factor here.
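
As Tom suggests below, ANALYZE the input tables after loading and glance
at the plan before running the big insert. Something along these lines
(untested sketch):

analyze seq_go;
analyze a;
analyze b;

explain
insert into tablec (seq_id, gos_id)
select a.seq_id, b.gos_id
from a, b, seq_go
where seq_go.seq_name = a.seq_name
  and b.go_id = seq_go.go_id;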

Sean

On 2/7/06 1:36 PM, "Srinivas Iyyer" <srini_iyyer_bio(at)yahoo(dot)com> wrote:

> Sorry for the confusion, Tom.
>
> seq_id is not equal to go_id
>
> seq_id = NM_000033
> go_id = GO:049934
>
> It was bad on my part that I wrote
> seq_id = 'xxxx'
> go_id = 'xxxx'; the second should have been 'yyyy'
>
> srini
>
> --- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com> writes:
>>> [ It's not fast to do 500K commands like ]
>>> insert into tablec (seq_id, gos_id) values (
>>>   (select seq_id from a where seq_name = 'xxxx'),
>>>   (select gos_id from b where go_id = 'xxxx'));
>>
>> Don't you want something like
>>
>> insert into tablec (seq_id, gos_id)
>>   select seq_id, gos_id from a, b
>>   where seq_name = go_id;
>>
>> SQL is not a low-level language, and breaking a table-wise operation
>> down into bite-size parts is not the way to make it go fast. The
>> startup overhead for a command is almost always going to dwarf the
>> time spent processing any one row, so you want to make sure you
>> process as many rows per command as feasible.
>>
>> Also, make sure you've ANALYZEd both input tables beforehand,
>> else the planner may choose a poor plan for this command.
>> It'd be worth looking at the EXPLAIN output for the command
>> just to make sure nothing silly is happening.
>>
>> regards, tom lane
