From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance while loading data and indexing
Date: 2002-09-26 09:05:19
Message-ID: 20020926090519.GB10471@svana.org
Lists: pgsql-general pgsql-hackers pgsql-performance
I'll preface this by saying that while I have a large database, it doesn't
require quite the performance you're talking about here.
On Thu, Sep 26, 2002 at 02:05:44PM +0530, Shridhar Daithankar wrote:
> 1) Database load time from flat file using copy is very high
> 2) Creating index takes huge amount of time.
> 3) Any suggestions for runtime as data load and query will be going in
> parallel.
You're loading all the data in one copy. I find that INSERTs are mostly
limited by indexes. While index lookups are cheap, they are not free and
each index needs to be updated for each row.
I've found that using partial indexes to index only the rows you actually use
can help with the loading. It's a bit obscure, though.
As for parallel loading, you'll be limited mostly by your I/O bandwidth.
Have you measured it to make sure it's up to speed?
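One rough way to measure raw sequential write bandwidth (a sketch only; the
file path and sizes are arbitrary) is to time a synced write with dd:

```shell
# Write 64MB and force it to disk before dd exits; conv=fdatasync
# makes the reported time include the real write, not just the cache.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fdatasync

# Clean up the scratch file afterwards.
rm -f /tmp/ddtest
```

Compare the throughput dd reports against what the load is achieving; a
RAID 5 array should manage far more than a couple of MB per second for
sequential writes.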
> Now the details. Note that this is a test run only..
>
> Platform:- 4x Xeon2.4GHz/4GB RAM/4x48 SCSI RAID5/72 GB SCSI
> RedHat7.2/PostgreSQL7.1.3
>
> Database in flat file:
> 125,000,000 records of around 100 bytes each.
> Flat file size 12GB
>
> Load time: 14581 sec/~8600 rows persec/~ an MB of data per sec.
> Create unique composite index on 2 char and a timestamp field: 25226 sec.
> Database size on disk: 26GB
> Select query: 1.5 sec. for approx. 150 rows.
So you're loading at a rate of 860KB per sec. That's not too fast. How many
indexes are active at that time? Triggers and foreign keys also take their
toll.
> Important postgresql.conf settings
>
> sort_mem = 12000
> shared_buffers = 24000
> fsync=true (Sad but true. Left untouched.. Will that make a difference on
> SCSI?)
> wal_buffers = 65536
> wal_files = 64
fsync IIRC only affects the WAL now, but it may be quite expensive,
especially considering it's run on every transaction commit. Oh, are your
WAL files on a separate disk from the data?
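If they aren't, the usual trick is to move the pg_xlog directory onto its own
spindle and symlink it back, with the server stopped. A sketch with
throwaway paths (substitute your real data directory and a mount point on
the separate disk):

```shell
# Demonstration with temporary directories standing in for the real
# data directory and the separate WAL disk.
DATADIR=$(mktemp -d)    # stands in for .../data
WALDISK=$(mktemp -d)    # stands in for the dedicated WAL disk
mkdir "$DATADIR/pg_xlog"

# Stop the postmaster first! Then move the WAL dir and symlink it back.
mv "$DATADIR/pg_xlog" "$WALDISK/pg_xlog"
ln -s "$WALDISK/pg_xlog" "$DATADIR/pg_xlog"

ls -ld "$DATADIR/pg_xlog"
```

That keeps the sequential WAL writes from competing with random data-file
I/O on the same disks.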
> Initial flat data load: 250GB of data. This has gone up since last query. It
> was 150GB earlier..
> Ongoing inserts: 5000/sec.
> Number of queries: 4800 queries/hour
> Query response time: 10 sec.
That looks quite achievable.
> 1) Instead of copying from a single 12GB data file, will a parallel copy from
> say 5 files will speed up the things?
Limited by I/O bandwidth. On Linux, vmstat can tell you how many blocks are
being loaded and stored per second. Try it. As long as sync() doesn't get
done too often, it should help.
> Couple MB of data per sec. to disk is just not saturating it. It's a RAID 5
> setup..
No, it's not. You should be able to do better.
> 2) Sort mem.=12K i.e. 94MB, sounds good enough to me. Does this need further
> addition to improve create index performance?
Should be fine. Admittedly your indexes are taking rather long to build.
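One thing worth trying for the index build specifically: CREATE INDEX does a
big sort, and sort_mem can be raised for just that session rather than
globally. A sketch (index and table names are hypothetical; the value is
illustrative, sort_mem is in kilobytes):

```sql
-- Raise sort memory for this backend only, build the index, then reset.
SET sort_mem = 65536;   -- 64MB, for this session only
CREATE UNIQUE INDEX bigtable_code_stamp_idx
    ON bigtable (code1, code2, stamp);
RESET sort_mem;
```

That way the bulk of RAM goes to the one backend doing the sort without
permanently committing it to every connection.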
> 3) 5K concurrent inserts with an index on, will this need a additional CPU
> power? Like deploying it on dual RISC CPUs etc?
It shouldn't. Do you have an idea of what your CPU usage is? ps aux should
give you a decent idea.
> 4) Query performance is not a problem. Though 4.8K queries per sec. expected
> response time from each query is 10 sec. But my guess is some serious CPU power
> will be chewed there too..
Should be fine.
> 5)Will upgrading to 7.2.2/7.3 beta help?
Possibly, though it may be worth it just for the features/bugfixes.
> All in all, in the test, we didn't see the performance where hardware is
> saturated to it's limits. So effectively we are not able to get postgresql
> making use of it. Just pushing WAL and shared buffers does not seem to be the
> solution.
>
> If you guys have any suggestions. let me know. I need them all..
Find the bottleneck: CPU, I/O or memory?
> Mysql is almost out because it's creating index for last 17 hours. I don't
> think it will keep up with 5K inserts per sec. with index. SAP DB is under
> evaluation too. But postgresql is most favourite as of now because it works. So
> I need to come up with solutions to problems that will occur in near future..
> ;-)
17 hours! Ouch. Either way, you should be able to do much better. Hope this
helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
Next Message: Shridhar Daithankar, 2002-09-26 09:13:20, Re: Performance while loading data and indexing
Previous Message: Shridhar Daithankar, 2002-09-26 08:54:02, Re: [HACKERS] Performance while loading data and indexing