Re: Large databases, performance

From: "Charles H(dot) Woloszynski" <chw(at)clearmetrix(dot)com>
To: shridhar_daithankar(at)persistent(dot)co(dot)in
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-general <pgsql-general(at)postgresql(dot)org>, "pankaj M(dot) Tolani" <pankaj(at)pspl(dot)co(dot)in>
Subject: Re: Large databases, performance
Date: 2002-10-03 12:54:29
Message-ID: 3D9C3E05.7070906@clearmetrix.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-performance pgsql-sql

Can you comment on the tools you are using to do the insertions (Perl,
Java?) and the distribution of data (all random, all static), and the
transaction scope (all inserts in one transaction, each insert as a
single transaction, some group of inserts as a transaction).

I'd be curious what happens when you submit more queries than you have
processors (you had four concurrent queries and four CPUs), if you care
to run any additional tests. Also, I'd report the query time in
absolute (like you did) and also in 'Time/number of concurrent queries".
This will give you a sense of how the system is scaling as the workload
increases. Personally I am more concerned about this aspect than the
load time, since I am going to guess that this is where all the time is
spent.

Was the original posting on GENERAL or HACKERS. Is this moving the
PERFORMANCE for follow-up? I'd like to follow this discussion and want
to know if I should join another group?

Thanks,

Charlie

P.S. Anyone want to comment on their expectation for 'commercial'
databases handling this load? I know that we cannot speak about
specific performance metrics on some products (licensing restrictions)
but I'd be curious if folks have seen some of the databases out there
handle these dataset sizes and respond resonably.

Shridhar Daithankar wrote:

>Hi,
>
>Today we concluded test for database performance. Attached are results and the
>schema, for those who have missed earlier discussion on this.
>
>We have (almost) decided that we will partition the data across machines. The
>theme is, after every some short interval a burst of data will be entered in
>new table in database, indexed and vacuume. The table(s) will be inherited so
>that query on base table will fetch results from all the children. The
>application has to consolidate all the data per node basis. If the database is
>not postgresql, app. has to consolidate data across partitions as well.
>
>Now we need to investigate whether selecting on base table to include children
>would use indexes created on children table.
>
>It's estimated that when entire data is gathered, total number of children
>tables would be around 1K-1.1K across all machines.
>
>This is in point of average rate of data insertion i.e. 5K records/sec and
>total data size, estimated to be 9 billion rows max i.e. estimated database
>size is 900GB. Obviously it's impossible to keep insertion rate on an indexed
>table high as data grows. So partitioning/inheritance looks better approach.
>
>Postgresql is not the final winner as yet. Mysql is in close range. I will keep
>you guys posted about the result.
>
>Let me know about any comments..
>
>Bye
> Shridhar
>
>--
>Price's Advice: It's all a game -- play it to have fun.
>
>
>
>
>------------------------------------------------------------------------
>
>Machine
>Compaq Proliant Server ML 530
>"Intel Xeon 2.4 Ghz Processor x 4, "
>"4 GB RAM, 5 x 72.8 GB SCSI HDD "
>"RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0"
>"Cost - $13,500 ($1,350 for each additional 72GB HDD)"
>
>Performance Parameter MySQL 3.23.52 MySQL 3.23.52 PostgreSQL 7.2.2
> WITHOUT InnoDB WITH InnoDB for with built-in support
> for transactional transactional support for transactions
> support
>Complete Data
>
>Inserts + building a composite index
>"40 GB data, 432,000,000 tuples" 3738 secs 18720 secs 20628 secs
>"about 100 bytes each, schema on
>'schema' sheet"
>"composite index on 3 fields
>(esn, min, datetime)"
>
>Load Speed 115570 tuples/second 23076 tuples/second 20942 tuples/second
>
>Database Size on Disk 48 GB 87 GB 111 GB
>
>Average per partition
>
>Inserts + building a composite index
>"300MB data, 3,000,000 tuples," 28 secs 130 secs 150 secs
>"about 100 bytes each, schema on
>'schema' sheet"
>"composite index on 3 fields
>(esn, min, datetime)"
>
>Select Query 7 secs 7 secs 6 secs
>based on equality match of 2 fields
>(esn and min) - 4 concurrent queries
>running
>
>Database Size on Disk 341 MB 619 MB 788 MB
>
>
>------------------------------------------------------------------------
>
>Field Name Field Type Nullable Indexed
>type int no no
>esn char (10) no yes
>min char (10) no yes
>datetime timestamp no yes
>opc0 char (3) no no
>opc1 char (3) no no
>opc2 char (3) no no
>dpc0 char (3) no no
>dpc1 char (3) no no
>dpc2 char (3) no no
>npa char (3) no no
>nxx char (3) no no
>rest char (4) no no
>field0 int yes no
>field1 char (4) yes no
>field2 int yes no
>field3 char (4) yes no
>field4 int yes no
>field5 char (4) yes no
>field6 int yes no
>field7 char (4) yes no
>field8 int yes no
>field9 char (4) yes no
>
>
>
>------------------------------------------------------------------------
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
>

--

Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nigel J. Andrews 2002-10-03 12:56:03 Re: Large databases, performance
Previous Message Shridhar Daithankar 2002-10-03 12:36:10 Large databases, performance

Browse pgsql-hackers by date

  From Date Subject
Next Message Nigel J. Andrews 2002-10-03 12:56:03 Re: Large databases, performance
Previous Message Shridhar Daithankar 2002-10-03 12:36:10 Large databases, performance

Browse pgsql-performance by date

  From Date Subject
Next Message Nigel J. Andrews 2002-10-03 12:56:03 Re: Large databases, performance
Previous Message Shridhar Daithankar 2002-10-03 12:36:10 Large databases, performance

Browse pgsql-sql by date

  From Date Subject
Next Message Nigel J. Andrews 2002-10-03 12:56:03 Re: Large databases, performance
Previous Message Shridhar Daithankar 2002-10-03 12:36:10 Large databases, performance