A Comparison Of 4 Databases
---------------------------

Intro
-----

This paper shows the results of an evaluation of 4 databases. I am posting
it to this mail group as I think Postgresql emerged quite favourably. The
evaluated databases were Oracle, Informix, Mysql and Postgresql. Features
and performance were examined.

Hardware And Operating System
-----------------------------

2 x HP Vectra VE 7 machines were used, each with 160M RAM, 1 x 3.2G and
1 x 13G Quantum IDE drives. Redhat 6.0 was used as the operating system.
No kernel changes were made.

Initial Experiences
-------------------

Mysql was obtained in rpm format and was amazingly easy to install. The
installation process created and started a database. The version was
3.22.27. Documentation was supplied and was good.

Postgresql was similarly elementary to install, and again a database was
created and started. The product comes with the flavour of Linux used, and
was in rpm format. The version was 6.5.2. Documentation was supplied and
was very good.

Informix was more cryptic to install. It was obtained in rpm format and
installed. However this merely unpacked an archive, and the real
installation process had to be run thereafter. After this it had to be
divined what was required next - the install does not create a database.
Using some of the (not supplied) documentation it was discovered how to
create and configure a database. The version was 7.30 (this is old, but it
is all they are supplying on this platform - 9.x is current).
Documentation was not supplied; it is available on the Informix web site.
It is ok.

Oracle was difficult to judge, as the author frequently installs it.
However, pretending to be coming to it new, it would be very difficult to
install. It does not come in rpm format; it is downloadable from the
Oracle web site. The small amount of included documentation is sufficient
to enable someone to work out how to start the installer program. This
program is a rudimentary wizard that asks questions and presents a list of
components to choose from... a newcomer would suffer confusion here. The
installer can create a database as part of the install. The version was
8.0.5 (this is slightly old - 8.1.5 is current but buggy, 8.0.5 is the
latest stable release on this platform). Documentation is not supplied; it
is available from the Oracle web site. It is ok.

Tests And Results
-----------------

Database Feature Comparison

Database     Cost    Trans-   Row    Const-   Program-  Secure  Fail  Hot
                     actions  Lock   raints   mable             Safe  Backup
Mysql        0/3000  No       No     No       Partial   Yes     No    No
Postgresql   0       Yes      Yes    Partial  Yes       Yes     Yes   No
Oracle       3200    Yes      Yes    Yes      Yes       Yes     Yes   Yes
Informix     2000    Yes      No     Yes      Yes       Yes     No    No

Notes

Cost
  NZ$ for a 10 user license.
  Two prices mean that the product is charged for if resold as part of an
  application (Mysql). Support is not included.

Transactions
  Commit, rollback, and isolation levels of at least read committed.

Row Locking
  Select for update that locks only the rows selected and does not block
  reads.

Constraints
  Primary and foreign key, with the ability to enable/disable or drop/add
  them. Existence alone gives "Partial"; enable etc as well gives "Yes".

Programmable
  Create trigger, and procedural language extensions to SQL. A "Partial"
  means that only external functions are supported.

Secure
  Requires username and password validation.

Fail Safe
  Delete 100 out of 1000000 rows and commit (if transactions are
  available), then power off and on. A check after power on that finds
  999900 rows gets a "Yes".

Hot Backup
  Can a consistent backup of the database be run while it is up, with
  transactions performed against it?

Database Performance Comparison - Data Warehouse

Database     Bulk Load  Summarize 3%   Summarize 0.3%  Summarize 0.3%
             1M         of 3M          of 3M           of 3M
                        1 dim 1 fact   2 dim 1 fact    3 dim 1 fact
Mysql        20         14             90              57
Postgresql   420        16             4               7
Oracle       65         5              3               3
Informix     170        8              5               6

Notes

Bulk Load
  Elapsed seconds to load 1000000 rows via whatever bulk load tool is
  supplied.

Summarize 3% 1 dim + 1 fact
  Elapsed seconds to display a summarized count and one timelike dimension
  attribute for a basic star query. Access path can be via an index scan.

Summarize 0.3% 2 dim + 1 fact
  Elapsed seconds to display a summarized count and one non-timelike
  dimension attribute for a less basic star query. Access path can be via
  index scans.

Summarize 0.3% 3 dim + 1 fact
  Elapsed seconds to display a summarized count and one non-timelike
  dimension attribute for a more complex star query. Access path can be
  via index scans.

Database Performance Comparison - Online

Database     Create  Create  Query 1  Query 4  Query Summary
             10K     1M      of 10K   of 1M    10% of 1M
Mysql        7       891     0        0        1
Postgresql   21      4430    0        0        2
Oracle       31      3694    0        0        2
Informix     -       -       0        0        10

Database     Tpm      Tpm      Tpm      Tpm
             1 sess.  2 sess.  4 sess.  8 sess.
Mysql        59/58    59/115   59/229   58/425
Postgresql   48       90       169      233
Oracle       55       108      180      291
Informix     -        -        -        -

Notes

Unfortunately no Informix SDK was available, so there are no Informix
results for the first two or the last tests. A result of 0 seconds means
that no delay was measurable.

Create 10K
  Elapsed seconds for a Php app to insert 10000 rows one by one.

Create 1M
  Elapsed seconds for a Php app to insert 1000000 rows one by one.

Query 1
  Elapsed seconds to show a master record - 1 row located 70% of the way
  through the table. Access path can be via an index scan.

Query 4
  Elapsed seconds to display 4 detail rows for one of the master records
  in the previous test. Access path can be via index scans.

Query Summary
  Elapsed seconds to display a summarized sum and count for 5% of 1000000
  rows. This is a 2 table join. Access path can be via index scans.

Tpm n sessions
  Transactions per minute for n sessions, achievable for a lightweight
  Perl DBI application. The two results for Mysql refer to two possible
  locking schemes.
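For reference, the second Mysql locking scheme used named application
locks in place of row locks. A minimal sketch of the idea (the lock name
and amounts are illustrative, and the scheme only works if every
application takes the same locks):

  -- acquire a named lock standing in for a row lock on master 7000,
  -- waiting up to 10 seconds (get_lock/release_lock are Mysql functions)...
  select get_lock('mas_7000', 10);
  select * from master where mas_id = 7000;
  update master set mas_amt = mas_amt + 10 where mas_id = 7000;
  -- release it once the transaction-equivalent work is done...
  select release_lock('mas_7000');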
Analysis
--------

Features

Oracle clearly has the most features. However it is Postgresql that is
next in line. The features fail safe, transactions and security are seen
as essential, with row locking as highly desirable. This means that at
face value only Oracle and Postgresql satisfy the criteria.

Initially Postgresql did not force password entry and thus was not
secure. However it was discovered that access to the database is
configured at install time as trusted, and has to be changed to require
passwords.

It should be noted that Oracle needed a patch to be fail safe, and that
this was not advertised on the web site. It is thus possible that Informix
and Mysql can also be similarly patched.

Repeated tests with Informix page level locks showed that readers were
blocked at this point. This was viewed as extremely undesirable.

Mysql's lack of transactions was a killer for it; the complexity of having
to programmatically undo an n-table operation was viewed as prohibitive.
(It was noted that the Mysql to-do list includes atomic operations -
pseudo transactions.)

In conclusion on the features front, Oracle and Postgresql emerge as the
leaders.

Performance

1 Online operations

Initially Postgresql refused to use indexes to help with queries. A mail
to one of the Postgresql Development Team traced this to an unusual
pickiness with respect to the implied datatypes of constants in the where
clause (see the workaround sketched after the online queries in the
appendix).

Informix had problems with the summary query - it would not go faster
than 10s. However it seemed to perform ok for the queries against the
10000 and 1000000 row tables.

Mysql demonstrates the overhead that multiple versioning and transactions
impose on the others: it was obviously the fastest in raw speed terms, in
general 4-5 times faster than anything else. Postgresql was in general
about 20% slower than Oracle.

The multiple session transaction test is the most interesting. The idea
was to mimic a TPC-A style banking application with some user think time
(1 second) during the transaction. Clearly Mysql with its table level
locks could not scale at all. However it has application locks that are
nameable, and so can mimic row locks - as long as every access uses them
(see the sketch after the test notes above). When these were turned on, a
dramatic turnaround was observed. Thus Mysql is fastest, and most
scalable, if application locks can be used. Oracle and Postgresql are
similar in scalability, but Postgresql begins to tail off a little before
Oracle does.

2 Data warehousing operations

Mysql has a very fast bulk load facility. Oracle also has a very fast bulk
load. Informix has a reasonably fast one. Postgresql is noticeably slower
than any of the others.

Mysql performed extremely poorly at the star type queries. Oracle and
Informix performed very well. Postgresql performed very well in all but
the first query - ironically the simplest, although it scanned the most
data. This points to Oracle and Informix having faster nested loop or
sort algorithms.

Both Oracle and Postgresql would experience a period of poor performance
on a table immediately after it was bulk loaded - typically until it was
vacuumed or analyzed (see the sketch after the populate scripts in the
appendix).

In conclusion for this section, Oracle and Informix are very good.
Postgresql is good, but is hampered by slow bulk load and slow sorting of
large datasets. Mysql can get the data in fast but is poor at querying it.

Overall
-------

Informix performs mostly ok, but its locking scheme precludes it for
serious online use (it would have been interesting to test this). Oracle
performs well. Mysql's lack of transactions would have been ok for data
warehousing, but it could not do star queries; this rules it out.
Postgresql performed about 20% slower than Oracle, except for bulk loads
and large dataset sorts. These things mean that it is suitable for data
warehousing and online operations, with the proviso that large data loads
and some large summary queries may be unpleasantly slow.
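As a footnote to the fail safe results, the test is easy to reproduce
against the online schema in the appendix below. A minimal sketch (the
predicate is illustrative - any 100 rows will do):

  -- delete 100 of the 1000000 detail rows inside a transaction...
  begin;
  delete from detail where det_id > 900000;  -- illustrative predicate
  commit;
  -- power the machine off and on, restart the database, then check:
  select count(*) from detail;               -- 999900 rows earns a "Yes"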
Appendix (database schemas and source for programs)
----------------------------------------------------

Schema and queries for on line tests

/* -------------------------------------------------------------------
   file  : createupdattables.sql
   does  : create tables for updat example...

   notes : master has 10000 rows
           detail has 1000000 rows, 100 for each master row
   ------------------------------------------------------------------- */

create table master
( mas_id   int4,
  mas_name varchar(100),
  mas_amt  float8 );

create table detail
( det_id     int4,
  det_mas_id int4,
  det_date   datetime,
  det_amt    float8 );

create table masterlog
( log_mas_id int4,
  log_date   datetime );

/* -------------------------------------------------------------------
   file  : createupdatindexes.sql
   does  : create indexes and keys for tables...
   ------------------------------------------------------------------- */

create unique index mas_pk on master using btree (mas_id);
create index det_pk on detail using btree (det_id);
create index det_fk on detail using btree (det_mas_id);

/* -------------------------------------------------------------------
   file  : queries1-3.sql
   does  : the queries...
   ------------------------------------------------------------------- */

select * from master where mas_id = 7000;

select * from detail where det_mas_id = 7000 and det_amt > 960;

select sum(mas_amt), count(*)
from   detail, master
where  mas_id > 8000
and    mas_id < 9000
and    mas_id = det_mas_id;
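A note on these queries: as mentioned in the analysis, Postgresql 6.5
initially would not use the indexes here because of the implied datatypes
of the where clause constants. The exact fix used is not recorded; a
sketch of the usual style of workaround (an explicit cast, or quoting the
constant so that the parser coerces it to the column type):

  -- illustrative only - force the constant to the int4 column type...
  select * from master where mas_id = 7000::int4;
  -- or equivalently let the parser coerce a quoted constant...
  select * from master where mas_id = '7000';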
#!/usr/bin/perl
# -------------------------------------------------------------------
# file  : updat.pm
# does  : on line transactions test...
#
# notes : postgres version
# -------------------------------------------------------------------

sub dbconnect()
{
    # connect with no autocommit...
    #
    $dbh = DBI->connect('DBI:Pg:dbname=dev1', $user, $pwd, {AutoCommit => 0})
        or die "Couldn't connect to database: " . DBI->errstr;
}

sub dbdisconnect()
{
    # disconnect...
    #
    $dbh->disconnect;
}

sub dbparse()
{
    # parse all statements here (actually some databases do nothing here)...
    #
    # master select and update...
    #
    $sthmsel = $dbh->prepare("select * from master where mas_id = ? for update")
        or die "Couldn't prepare statement: " . $dbh->errstr;
    $sthmupd = $dbh->prepare("update master set mas_amt = mas_amt + ? where mas_id = ?")
        or die "Couldn't prepare statement: " . $dbh->errstr;

    # detail select and update...
    #
    $sthdsel = $dbh->prepare("select * from detail where det_mas_id = ? and det_date > '1999-12-31' for update")
        or die "Couldn't prepare statement: " . $dbh->errstr;
    $sthdupd = $dbh->prepare("update detail set det_amt = det_amt + ? where det_mas_id = ? and det_id = ?")
        or die "Couldn't prepare statement: " . $dbh->errstr;

    # log insert...
    #
    $sthlins = $dbh->prepare("insert into masterlog values(?,datetime 'now')")
        or die "Couldn't prepare statement: " . $dbh->errstr;
}

sub dbfree()
{
    # free master and detail and log statements...
    #
    $sthmsel->finish;
    $sthmupd->finish;
    $sthdsel->finish;
    $sthdupd->finish;
    $sthlins->finish;
}

sub putlog()
{
    # execute the log insert...
    #
    $sthlins->execute($masid);
}

sub getdetail()
{
    # execute and fetch detail query...
    #
    $sthdsel->execute($masid);
    my @data = $sthdsel->fetchrow_array();
    my $detid = $data[0];

    # execute detail update...
    #
    $sthdupd->execute($amt, $masid, $detid);
}

sub getmaster()
{
    # execute and fetch master query...
    #
    $sthmsel->execute($masid);
    my @data = $sthmsel->fetchrow_array();

    # user typing allowance...
    #
    if ( $sleep == 0 ) { sleep(1); }

    # get and update the detail...
    #
    getdetail;

    # execute master update...
    #
    $sthmupd->execute($amt, $masid);

    # log the transaction...
    #
    putlog;

    # commit...
    #
    my $rc = $dbh->commit or die $dbh->errstr;
}

sub init()
{
    use DBI;

    # two variables are "on" if set to zero
    #
    $debug    = 1;
    $sleep    = 0;
    $updatcnt = 0;
    $updatlim = 100;
    $amt      = 10;
    $maslim   = 10000;
    $masid    = 0;
    $begin    = 0;
    $end      = 0;
    $elapsed  = 0;
    $user     = "updat";
    $pwd      = "updat";
    $db       = "dev1";

    printf(" beginning $updatlim transactions...\n");
    $begin = time;

    # connect...
    #
    dbconnect;

    # parse all statements...
    #
    dbparse;

    # loop as required...
    #
    while ( $updatcnt < $updatlim )
    {
        # get a random master id...
        #
        $masid = int ( rand $maslim ) + 1;
        if ( $debug == 0 ) { printf(" processing master $masid \n"); }
        getmaster;
        $updatcnt = $updatcnt + 1;
    }

    # calculate stats...
    #
    $end = time;
    $elapsed = $end - $begin;
    printf(" transactions = $updatlim\n");
    printf(" elapsed      = $elapsed\n");

    # free all statement handles...
    #
    dbfree;

    # disconnect...
    #
    dbdisconnect;
}

init;

Schema and queries for warehouse tests

/* -------------------------------------------------------------------
   file  : createwaretables.sql
   does  : create tables for ware example...

   notes : dimension tables (dim0-2) have 900, 10000 and 1000 rows resp.
           fact table (fact0) has 3000000 rows.
   ------------------------------------------------------------------- */

create table dim0
( d0key int4,
  d0f1  datetime,
  d0f2  int4,
  d0f3  int4,
  d0f4  int4 );

create table dim1
( d1key int4,
  d1f1  varchar(20),
  d1f2  varchar(20) );

create table dim2
( d2key int4,
  d2f1  varchar(20),
  d2f2  varchar(20) );

create table fact0
( d0key int4,
  d1key int4,
  d2key int4,
  f1    int4,
  f2    int4 );

/* -------------------------------------------------------------------
   file  : createwareindexes.sql
   does  : create indexes for ware example...
   ------------------------------------------------------------------- */

create unique index dim0_pk on dim0 using btree (d0key);
create unique index dim1_pk on dim1 using btree (d1key);
create unique index dim2_pk on dim2 using btree (d2key);
create index fact0_q1 on fact0 using btree (d0key);
create index fact0_q2 on fact0 using btree (d0key,d1key);
create index fact0_q3 on fact0 using btree (d0key,d1key,d2key);

/* -------------------------------------------------------------------
   file  : queries1-3.sql
   does  : star queries for warehouse...

   notes : query 1 scans 100000 rows from fact0 and 30 rows from dim0
           query 2 scans 10000 rows from fact0 and 30 rows from dim0
                   and 10 rows from dim2
           query 3 scans 10000 rows from fact0 and 30 rows from dim0
                   and 10 rows from dim2 and 100 rows from dim1
   ------------------------------------------------------------------- */

select d0.d0f1, count(f.f1)
from   dim0 d0, fact0 f
where  d0.d0key = f.d0key
and    d0.d0f1 between '1996-05-01' and '1996-05-31'
group  by d0.d0f1;

select d0.d0f1, d2.d2f2, count(f.f1)
from   dim0 d0, dim2 d2, fact0 f
where  d0.d0key = f.d0key
and    d2.d2key = f.d2key
and    d2.d2f2 = '90th att type'
and    d0.d0f1 between '1996-05-01' and '1996-05-31'
group  by d0.d0f1, d2.d2f2;

select d0.d0f1, d1.d1f2, d2.d2f2, count(f.f1)
from   dim0 d0, dim1 d1, dim2 d2, fact0 f
where  d0.d0key = f.d0key
and    d1.d1key = f.d1key
and    d2.d2key = f.d2key
and    d2.d2f2 between '80th att type' and '90th att type'
and    d1.d1f2 between '10th att type' and '50th att type'
and    d0.d0f1 between '1996-05-01' and '1996-05-31'
group  by d0.d0f1, d1.d1f2, d2.d2f2;
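The bulk load timings in the results used each product's own loader rather
than the row-by-row populate scripts below. For Postgresql that meant
copy; a minimal sketch, assuming a pipe-delimited flat file at an
illustrative path:

  -- load the fact table from a flat file in one statement
  -- (file path and delimiter are assumptions for illustration)...
  copy fact0 from '/tmp/fact0.unl' using delimiters '|';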
Scripts to populate tables - Data Warehouse

#!/usr/bin/perl
# -------------------------------------------------------------------
# file : popdim0.pm
# does : populate the dim0 timelike dimension...
# -------------------------------------------------------------------

sub adddim()
{
    # parse and execute the insert (d0f4 is left to default to null)...
    #
    $sthins = $dbh->prepare(
        "insert into dim0 values (?,
           ('1999-05-01'::datetime + ?)::datetime,
           date_part('DAY',  ('1999-05-01'::datetime + ?)::datetime),
           date_part('MONTH',('1999-05-01'::datetime + ?)::datetime) )")
        or die " could not prepare \n";
    $sthins->execute($dimcnt,$dimcnt,$dimcnt,$dimcnt)
        or die " failed to execute for $dimcnt\n";
}

sub attachdb()
{
    # connect to the database...
    #
    $dbh = DBI->connect($datasource, $user, $pwd, { AutoCommit => 0 })
        or die "cannot connect to $datasource\n";
}

sub unattachdb()
{
    # commit and disconnect...
    #
    $dbh->commit;
    $dbh->disconnect();
}

sub init()
{
    # setup and insert $dimlim rows for the timelike dimension...
    #
    use DBI;
    $datasource = "DBI:Pg:dbname=dev1";
    $user   = "ware";
    $pwd    = "ware";
    $dbh    = 0;
    $sthins = 0;
    $dimlim = 900;
    $dimcnt = 0;

    attachdb();
    while ( $dimcnt < $dimlim )
    {
        if ( $dimcnt%100 == 0 ) { printf("processing ...$dimcnt\n"); }
        adddim();
        $dimcnt = $dimcnt + 1;
    }
    unattachdb();
}

init();

#!/usr/bin/perl
# -------------------------------------------------------------------
# file : popdim1.pm
# does : populate the dim1 dimension...
# -------------------------------------------------------------------

sub adddim()
{
    # parse and execute the insert...
    #
    $sthins = $dbh->prepare(
        "insert into dim1 values (?, ? ||'th attribute', ?%100 ||'th att type' )")
        or die " could not prepare \n";
    $sthins->execute($dimcnt,$dimcnt,$dimcnt)
        or die " failed to execute for $dimcnt\n";
}

sub attachdb()
{
    # connect to the database...
    #
    $dbh = DBI->connect($datasource, $user, $pwd, { AutoCommit => 0 })
        or die "cannot connect to $datasource\n";
}

sub unattachdb()
{
    # commit and disconnect...
    #
    $dbh->commit;
    $dbh->disconnect();
}

sub init()
{
    # setup and insert $dimlim rows for the dimension...
    #
    use DBI;
    $datasource = "DBI:Pg:dbname=dev1";
    $user   = "ware";
    $pwd    = "ware";
    $dbh    = 0;
    $sthins = 0;
    $dimlim = 10000;
    $dimcnt = 0;

    attachdb();
    while ( $dimcnt < $dimlim )
    {
        if ( $dimcnt%1000 == 0 ) { printf("processing ...$dimcnt\n"); }
        adddim();
        $dimcnt = $dimcnt + 1;
    }
    unattachdb();
}

init();

#!/usr/bin/perl
# -------------------------------------------------------------------
# file : popdim2.pm
# does : populate the dim2 dimension...
# -------------------------------------------------------------------

sub adddim()
{
    # parse and execute the insert...
    #
    $sthins = $dbh->prepare(
        "insert into dim2 values (?, ? ||'th attribute', ?%100 ||'th att type' )")
        or die " could not prepare \n";
    $sthins->execute($dimcnt,$dimcnt,$dimcnt)
        or die " failed to execute for $dimcnt\n";
}

sub attachdb()
{
    # connect to the database...
    #
    $dbh = DBI->connect($datasource, $user, $pwd, { AutoCommit => 0 })
        or die "cannot connect to $datasource\n";
}

sub unattachdb()
{
    # commit and disconnect...
    #
    $dbh->commit;
    $dbh->disconnect();
}

sub init()
{
    # setup and insert $dimlim rows for the dimension...
    #
    use DBI;
    $datasource = "DBI:Pg:dbname=dev1";
    $user   = "ware";
    $pwd    = "ware";
    $dbh    = 0;
    $sthins = 0;
    $dimlim = 1000;
    $dimcnt = 0;

    attachdb();
    while ( $dimcnt < $dimlim )
    {
        if ( $dimcnt%100 == 0 ) { printf("processing ...$dimcnt\n"); }
        adddim();
        $dimcnt = $dimcnt + 1;
    }
    unattachdb();
}

init();
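As noted in the analysis, both Oracle and Postgresql answered queries
slowly on a freshly loaded table until its statistics were refreshed. For
Postgresql a sketch of the post-load maintenance, to be run after the
populate scripts (including popfact0.pm below) have finished:

  -- refresh planner statistics (and reclaim space) after loading...
  vacuum analyze dim0;
  vacuum analyze dim1;
  vacuum analyze dim2;
  vacuum analyze fact0;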
#!/usr/bin/perl
# -------------------------------------------------------------------
# file : popfact0.pm
# does : populate the fact0 fact...
# -------------------------------------------------------------------

sub addfact()
{
    # parse and execute the insert...
    #
    $sthins = $dbh->prepare("insert into fact0 values (?,?,?,100,100)")
        or die " could not prepare \n";
    $sthins->execute($d0key,$d1key,$d2key)
        or die " failed to execute for $factcnt\n";
}

sub attachdb()
{
    # connect to the database...
    #
    $dbh = DBI->connect($datasource, $user, $pwd, { AutoCommit => 0 })
        or die "cannot connect to $datasource\n";
}

sub unattachdb()
{
    # disconnect...
    #
    $dbh->disconnect();
}

sub commitdb()
{
    # commit...
    #
    $dbh->commit;
}

sub init()
{
    # setup and insert the fact rows...
    #
    use DBI;
    $datasource = "DBI:Pg:dbname=dev1";
    $user   = "ware";
    $pwd    = "ware";
    $dbh    = 0;
    $sthins = 0;

    # set up to add $dim0lim * $dim1lim * $dim2lim rows...
    #
    $dim0lim = 30;
    $dim1lim = 1000;
    $dim2lim = 100;
    $dim0cnt = 1;
    $dim1cnt = 1;
    $dim2cnt = 1;
    $factcnt = 1;
    $begin   = time();
    $now     = 0;
    $elapsed = 0;

    attachdb();

    # do all the loops...
    #
    while ( $dim0cnt <= $dim0lim )
    {
        $d0key = 30 * $dim0cnt;
        while ( $dim1cnt <= $dim1lim )
        {
            $d1key = 10 * $dim1cnt;
            while ( $dim2cnt <= $dim2lim )
            {
                $d2key = 10 * $dim2cnt;
                if ( $factcnt%10000 == 0 )
                {
                    $now = time();
                    $elapsed = ($now - $begin) / 60;
                    printf("processed ...$factcnt in $elapsed min\n");
                    commitdb();
                }
                addfact();
                $factcnt = $factcnt + 1;
                $dim2cnt = $dim2cnt + 1;
            }
            $dim1cnt = $dim1cnt + 1;
            $dim2cnt = 1;
        }
        $dim0cnt = $dim0cnt + 1;
        $dim1cnt = 1;
        $dim2cnt = 1;
    }
    commitdb();
    unattachdb();
}

init();

More scripts to populate tables - On line tables

<% include("posglobals.phtm"); %>
Example Database - Master

Populate Master on <% printf("%s",$dbstring) %>

<%
    /* connect to the database and if successful then set up a cursor for
       data; do not show errors if logon fails... */
    $con = pg_pconnect("dbname=" . $dbstring . " user=" . $PHP_AUTH_USER .
                       " password=" . $PHP_AUTH_PW);
    if ( $con > 0 )
    {
        $rows = 0;
        $rowlim = 10000;
        $begin = time();

        /* start the transaction... */
        $cur = pg_exec($con,"begin");

        /* insert the rows... */
        for ( $rows = 0; $rows < $rowlim; $rows++ )
        {
            $cur = pg_exec($con,"insert into master values (" . $rows . "," .
                                "'master " . $rows . "',100)");
        }

        /* commit the transaction... */
        $cur = pg_exec($con,"commit");
        pg_freeresult($cur);
        $end = time();

        if ( $rows == $rowlim ) { $res = "successful"; }
        else                    { $res = "not successful"; }

        /* set up the table for formatting data... */
        printf("<table>");
        printf("<tr><th>Variable</th><th>Value</th></tr>");
        printf("<tr><td>result</td><td>%s</td></tr>",$res);
        printf("<tr><td>rows</td><td>%d</td></tr>",$rows);
        printf("<tr><td>elapsed</td><td>%d</td></tr>",$end - $begin);
        printf("</table>");
        pg_close($con);
    }
    else
    {
        printf("<p>Unable to connect to the database</p>");
    }
%>
<% include("postail.phtm"); %> <% include("posglobals.phtm"); %> Example Database - Master

Populate Detail on <%printf("%s",$dbstring)%>


<% /* connect to the database and if successful then up a cursor for data do not show errors if logon fails... */ $con = @pg_pconnect("dbname=" . $dbstring . " user=" . $PHP_AUTH_USER . " password=" . $PHP_AUTH_PW); if ( $con > 0 ) { $masrowlim =10000; $detrowlim = 100; $cnt = 0; $begin = time(); for ( $masrows = 1; $masrows < $masrowlim + 1; $masrows++ ) { for ( $detrows = 1; $detrows < $detrowlim + 1; $detrows++ ) { /* start a transaction.... */ if ( $cnt == 0 ) { $cur = pg_exec($con,"begin"); } /* do an insert and increemnt the counter.... */ $cur = pg_exec($con,"insert into detail values (" . $masrows * $detrows . ",". $masrows ."," . "datetime 'now'+ " . $detrows . "," . 10 * $detrows .")"); $cnt++; /* if done enough the commit and reset the counter and begin another one... */ if ( $cnt == 10000 ) { $cnt = 1; $cur = pg_exec($con,"commit"); $cur = pg_exec($con,"begin"); } } } /* commit at the end... */ $cur = pg_exec($con,"commit"); pg_freeresult($cur); $end = time(); $masrows--; $detrows--; if ( $masrows == $masrowlim) { $res = "successful"; } else { $res = "not successful"; } /* set up the table for formatting data... */ printf("
"); printf(""); printf(""); printf(""); printf("",$res); printf(""); printf(""); printf(""); printf("",$masrows * $detrows); printf(""); printf(""); printf(""); printf("",$end - $begin); printf(""); printf("
VariableValue
result%s
rows%d
elapsed%d
"); pg_close($con); } else { printf("
Unable to connect to the database

"); } %>

<% include("postail.phtm"); %>