Skip site navigation (1) Skip section navigation (2)

Performance while loading data and indexing

From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Performance while loading data and indexing
Date: 2002-09-26 08:35:44
Message-ID: 3D931438.22010.133ADAFA@localhost (view raw or flat)
Thread:
Lists: pgsql-generalpgsql-hackerspgsql-performance
Hello all,

Some time back I posted a query to build a site with 150GB of database. In last 
couple of weeks, lots of things were tested at my place and there are some 
results and again some concerns. 

This is a long post. Please be patient and read thr. If we win this, I guess we 
have a good marketing/advocacy  case here..;-)

First the problems (For those who do not read beyond first page)

1) Database load time from flat file using copy is very high
2) Creating index takes huge amount of time.
3) Any suggsestions for runtime as data load and query will be going in 
parallel.

Now the details. Note that this is a test run only..

Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
RedHat7.2/PostgreSQL7.1.3

Database in flat file: 
125,000,000 records of around 100 bytes each. 
Flat file size 12GB

Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec.
Create unique composite index on 2 char and a timestamp field:  25226 sec.
Database size on disk: 26GB
Select query: 1.5 sec. for approx. 150 rows.

Important postgresql.conf settings

sort_mem = 12000
shared_buffers = 24000
fsync=true (Sad but true. Left untouched.. Will that make a difference on 
SCSI?)
wal_buffers = 65536 
wal_files = 64 

Now the requirements

Initial flat data load: 250GB of data. This has gone up since last query. It 
was 150GB earlier..
Ongoing inserts: 5000/sec. 
Number of queries: 4800 queries/hour
Query response time: 10 sec.


Now questions.

1)  Instead of copying from a single 12GB data file,  will a parallel copy from 
say 5 files will speed up the things? 

Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5 
setup..

2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further 
addition to improve create index performance?

3) 5K concurrent inserts with an index on, will this need a additional CPU 
power? Like deploying it on dual RISC CPUs etc? 

4) Query performance is not a problem. Though 4.8K queries per sec. expected 
response time from each query is 10 sec. But my guess is some serius CPU power 
will be chewed there too..

5)Will upgrading to 7.2.2/7.3 beta help?

All in all, in the  test, we didn't see the performance where hardware is 
saturated to it's limits. So effectively we are not able to get postgresql 
making use of it. Just pushing WAL and shared buffers does not seem to be the 
solution.

If you guys have any suggestions. let me know.  I need them all..

Mysql is almost out because it's creating index for last 17 hours. I don't 
think it will keep up with 5K inserts per sec. with index. SAP DB is under 
evaluation too. But postgresql is most favourite as of now because it works. So 
I need to come up with solutions to problems that will occur in near future..
;-)

TIA..

Bye
 Shridhar

--
Law of Procrastination:	Procrastination avoids boredom; one never has	the 
feeling that there is nothing important to do.


Responses

pgsql-performance by date

Next:From: Shridhar DaithankarDate: 2002-09-26 08:54:02
Subject: Re: [HACKERS] Performance while loading data and indexing
Previous:From: kopraDate: 2002-09-14 12:28:03
Subject: subscribe

pgsql-hackers by date

Next:From: Hannu KrosingDate: 2002-09-26 08:52:48
Subject: Re: unicode
Previous:From: Manfred KoizarDate: 2002-09-26 08:14:23
Subject: Re: Bug in PL/pgSQL GET DIAGNOSTICS?

pgsql-general by date

Next:From: Shridhar DaithankarDate: 2002-09-26 08:54:02
Subject: Re: [HACKERS] Performance while loading data and indexing
Previous:From: Gaetano MendolaDate: 2002-09-26 08:07:15
Subject: Re: rotatelog / logrotate with PostgreSQL

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group