
Re: Postgresql - performance of using array in big database

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: roberthanco(at)o2(dot)pl
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgresql - performance of using array in big database
Date: 2012-08-08 06:33:02
Message-ID: 5022081E.5080900@ringerc.id.au
Lists: pgsql-performance
On 08/03/2012 05:14 PM, roberthanco(at)o2(dot)pl wrote:

 > It is read-only table so every integer column have an index.

First tip: Define the table without the indexes. INSERT your data, and 
only after it is inserted create your indexes.
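
Roughly like this, say, where the table and column names are just stand-ins for
whatever your schema actually looks like:

   CREATE TABLE items (
       id      integer,
       colour  integer,
       shape   integer
   );

   -- bulk-load first (COPY here; pg_bulkload is mentioned further down) ...
   COPY items FROM '/tmp/items.csv' WITH (FORMAT csv);

   -- ... and only then build the indexes, one sequential pass each
   CREATE INDEX items_colour_idx ON items (colour);
   CREATE INDEX items_shape_idx  ON items (shape);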

Similarly, if you're making huge changes to the table you should 
consider dropping the indexes, making the changes, and re-creating the 
indexes. You might not have to drop the indexes if you aren't changing 
indexed fields, since HOT might save you, but it depends a lot on the 
specifics of the table's on-disk layout etc.
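
Same placeholder names as above, just to show the shape of it:

   -- drop only the indexes on columns the bulk change will rewrite
   DROP INDEX items_shape_idx;

   UPDATE items SET shape = shape + 1000;

   -- rebuild afterwards
   CREATE INDEX items_shape_idx ON items (shape);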

> The server is: 12 GB RAM, 1,5 TB SATA, 4 CORES. All server for postgres.
> There are many more tables in this database so RAM do not cover all 
> database.

OK, in that case more info on the disk subsystem is generally helpful. 
Disk spin speed, type? RAID configuration if any? eg:

   4 x 750GB 7200RPM Western Digital Black SATA 3 HDDs in RAID 10 using 
the Linux 'md' raid driver

or

   2 x 1.5TB 7200RPM "Enterprise/near-line" SATA3 HDDs in RAID 1 using a 
Dell PERC xxxx controller with BBU in write-back cache mode.

... though if you're only bulk-inserting the BBU doesn't matter much.

> I wonder what option would be better in performance point of view.

I would advise you to test on a subset of your data. Try loading the 
same 50,000 records into different databases, one with each structure. 
Measure how long the load takes for each design, and how long the 
queries you need to run take to execute. Repeat the process with 500,000 
records and see if one design slows down more than the other design 
does. Etc.
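
A crude way to measure that from psql, for instance (the design_a/design_b
table names and sample file are made up here):

   \timing on

   -- time the load for one candidate design
   COPY items_design_a FROM '/tmp/sample_50k.csv' WITH (FORMAT csv);

   -- time a query you actually care about, against each design
   EXPLAIN ANALYZE
   SELECT count(*) FROM items_design_a WHERE colour = 42;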

> I need to make a good decision because import of this data will take 
> me a 20 days.

For the sheer size of data you have you might want to think about using 
pg_bulkload. If you can't or don't want to do that, then at least use 
COPY to load big batches of your data.
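
Something along these lines, with the path being wherever your exported
batches actually live:

   BEGIN;
   COPY items FROM '/path/to/batch_0001.csv' WITH (FORMAT csv);
   COMMIT;

or psql's client-side \copy if the files sit on the machine you're
connecting from rather than on the server:

   \copy items FROM 'batch_0001.csv' WITH (FORMAT csv)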

--
Craig Ringer
