Skip site navigation (1) Skip section navigation (2)

Re: Time consuming process ...3 million records please help

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: Srinivas Iyyer <srini_iyyer_bio(at)yahoo(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Time consuming process ...3 million records please help
Date: 2006-03-28 02:31:07
Message-ID: 44289FEB.207@mail.nih.gov (view raw or flat)
Thread:
Lists: pgsql-novice
Srinivas Iyyer wrote:
> Hi Sean, 
> thanks for your reply. in my temp table,
> temp_refseq_id column cannot be indexed due to
> duplicate values. will that effect drastically. 
> I havent tested it yet. on top of that i killed 8 hrs
> process running previously on tab_update() function.
> :-)

Sri,

Columns with duplicate values can be indexed; only if you declare an 
index to be "unique" will you then have a problem with inserting 
duplicates.

You just need to do:

1.  Create your temp table structure.
2.  Copy in your data.
3.  Create an index for all fields that show up in a "where" or join 
clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!)

   create index temp_refseq_id_index on TABLENAME(temp_refseq_id);

4.  Vacuum analyze the table(s) after copying and creating the index 
(thereby telling postgres what is in the table)

   vacuum analyze TABLENAME

5.  Do something along the lines of:

   insert into ....
   select from ....

That should do it, approximately.  If you still have problems, then you 
will need to be more specific about what the problems are, I think.

As an aside, I have similar data stored in a postgres database, but in 
some places on the order of several 10's of millions of records; 
postgres is not limiting in this respect.

Sean


> thanks
> sri
> 
> 
>> Hi, Sri.
>>
>> I would do this differently.  You don't need the
>> function I don't 
>> think.  I would do something like this (untested):
>>
>> insert into spontanno (seq_id, spontanno_imageid,
>> spontanno_genbankacc)
>> select seqdump.seq_id, referrer_keys.temp_imageid,
>>     referrer_keys.temp.genbankacc from
>> referrer_keys, seqdump
>>     where
>> referrer_keys.temp_refseq_id=seqdump.seq_acc;
>>
>> Make sure there is an index on seq_acc and
>> temp_refseq_id.  This should 
>> be done in seconds to minutes, not hours.
>>
>> Sean
> 
> -
> 
> 
> 
> <snip>
> 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 

In response to

Responses

pgsql-novice by date

Next:From: Adam WitneyDate: 2006-03-28 09:44:38
Subject: Re: Transfer from MySQL to PostgreSQL
Previous:From: Srinivas IyyerDate: 2006-03-28 02:17:16
Subject: Re: Time consuming process ...3 million records please help

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group