Re: Postgresql - performance of using array in big database

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: roberthanco(at)o2(dot)pl
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgresql - performance of using array in big database
Date: 2012-08-08 06:33:02
Message-ID: 5022081E.5080900@ringerc.id.au
Lists: pgsql-performance

On 08/03/2012 05:14 PM, roberthanco(at)o2(dot)pl wrote:

> It is a read-only table, so every integer column has an index.

First tip: Define the table without the indexes. INSERT your data, and
only once it is all loaded create your indexes.
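
A minimal sketch of that ordering (table, column, and file names here
are made up purely for illustration):

    -- Create the table with no indexes yet.
    CREATE TABLE measurements (
        id     integer,
        sensor integer,
        value  integer
    );

    -- Bulk-load the data first. The file path must be readable by the
    -- server's postgres process.
    COPY measurements FROM '/srv/load/measurements.csv' WITH (FORMAT csv);

    -- Only now build the indexes, in one pass over the full data set.
    CREATE INDEX measurements_sensor_idx ON measurements (sensor);
    CREATE INDEX measurements_value_idx ON measurements (value);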

Similarly, if you're making huge changes to the table you should
consider dropping the indexes, making the changes, and re-creating the
indexes. You might not have to drop the indexes if you aren't changing
indexed fields, since HOT might save you, but it depends a lot on the
specifics of the table's on-disk layout etc.
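
Roughly, assuming the same made-up table as above and a mass update
that touches an indexed column:

    -- Drop the index before the mass change...
    DROP INDEX measurements_sensor_idx;

    -- ...make the big change...
    UPDATE measurements SET sensor = sensor + 1000;

    -- ...then rebuild the index once at the end.
    CREATE INDEX measurements_sensor_idx ON measurements (sensor);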

> The server is: 12 GB RAM, 1.5 TB SATA, 4 cores. The whole server is
> for postgres. There are many more tables in this database, so RAM
> does not cover the whole database.

OK, in that case more info on the disk subsystem is generally helpful.
Disk spin speed, type? RAID configuration if any? eg:

4 x 750GB 7200RPM Western Digital Black SATA 3 HDDs in RAID 10 using
the Linux 'md' raid driver

or

2 x 1.5TB 7200RPM "Enterprise/near-line" SATA3 HDDs in RAID 1 using a
Dell PERC xxxx controller with BBU in write-back cache mode.

... though if you're only bulk-inserting the BBU doesn't matter much.

> I wonder which option would be better from a performance point of view.

I would advise you to test on a subset of your data. Try loading the
same 50,000 records into different databases, one with each structure.
Measure how long the load takes for each design, and how long the
queries you need to run take to execute. Repeat the process with 500,000
records and see if one design slows down more than the other design
does. Etc.
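
From psql you can get rough numbers with \timing and EXPLAIN ANALYZE;
the file path and query here are only placeholders for whatever you
actually load and run:

    \timing on

    -- Time the load of the 50,000-record sample.
    COPY measurements FROM '/srv/load/sample_50k.csv' WITH (FORMAT csv);

    -- Time a representative query against each design.
    EXPLAIN ANALYZE SELECT * FROM measurements WHERE sensor = 42;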

> I need to make a good decision because importing this data will take
> me 20 days.

For the sheer size of data you have you might want to think about using
pg_bulkload. If you can't or don't want to do that, then at least use
COPY to load big batches of your data.
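
For example (path and table name made up; use \copy from psql instead
if the file lives on the client machine):

    -- Server-side COPY; the file must be readable by the server process.
    COPY measurements FROM '/srv/load/batch_0001.csv' WITH (FORMAT csv);

    -- Client-side equivalent from psql:
    \copy measurements FROM 'batch_0001.csv' CSV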

--
Craig Ringer
