Opteron vs. Xeon performance differences

From: Bart Grantham <bg(at)logicworks(dot)net>
To: "'pgsql-general(at)postgresql(dot)org'" <pgsql-general(at)postgresql(dot)org>
Subject: Opteron vs. Xeon performance differences
Date: 2008-10-09 21:34:22
Message-ID: E75AB101237A1842B208BDDABE741B280D29C59109@exchange4a.corp.logicworks.net
Lists: pgsql-general

Forgive me if this has been beaten into the ground, but my team and I couldn't find many conclusive studies or posts on this issue. To make a long story short: we're seeing Xeons run 50% slower than Opterons, even when the Xeon has twice as much cache and a slight clock-speed advantage.

The full story: we have an older production server with 2 GB of RAM and 2.4 GHz Opterons with 1 MB of cache. The database is not large, only around 7 or 8 million rows altogether, 2.5 GB on disk. Most queries are reads, probably at a 10:1 ratio of reads to writes. While upgrading this server to a pair of DRBD-mirrored servers (more on this below), we discovered that the new servers were actually slower than the old one. The newer servers have 4 GB of RAM and 3.0 GHz Xeons with 2 MB of cache. And not just a little slower: queries (simple, complex, and disgusting recursive stored procedures) routinely take 50-100% more time than they did on the older server.

We tried many troubleshooting steps: downgrading the kernel to that of the older machine, verifying version parity, copying the binary from the older server, building a 32-bit binary on the new servers, running the entire database out of a ramdisk, and of course much tweaking of postgresql.conf. After seeing virtually no benefit from any of these, I finally took the last leap: I pulled the disks and threw them in a newer Opteron chassis (2.8 GHz, 1 MB cache). And whaddya know? It has a 20% speed edge over the older Opteron and blows away the newer Xeons.

One of my guys did some testing, and it appears that LWLockAcquire and LWLockRelease are the culprits, but we're not entirely confident in our conclusion. Any thoughts on why this might be so different between the two architectures? We're a hosting provider, so we've got some spare equipment to work with, and I'm going to request that we keep these two boxes up for a week or so. Are there any other tests you can suggest that would help get to the bottom of this? I figure not everyone has access to as much gear as we do, so this might be a good opportunity for some A/B testing of a production database on identical OS/server installs running on different hardware. I'd be content to just say "Well, we use Opterons then!", but I imagine that if we could help bring equal performance to Xeon users, it would be worth the effort of volunteering. To be clear, I have two machines sitting on the network ready for tweaking: one is a Xeon, the other an Opteron. Neither is in production, and both can be fully mangled in the interest of figuring this out.
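For anyone wanting to run that kind of A/B comparison, here is a minimal sketch of a timing harness, assuming you supply a callable that runs one representative query against a given host. The names `time_runs`, `slowdown_percent`, and `fake_query` are hypothetical helpers for illustration, not part of PostgreSQL or any existing tool; the stand-in workload just burns CPU so the script runs on its own.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], runs: int = 20, warmup: int = 3) -> List[float]:
    """Call run_query `warmup` times untimed (to warm caches), then time `runs` calls.

    Returns the wall-clock duration of each timed call in seconds.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return samples


def slowdown_percent(baseline: List[float], candidate: List[float]) -> float:
    """Percent extra time the candidate needs versus the baseline, comparing medians.

    50.0 means the candidate takes 1.5x as long as the baseline.
    """
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand / base - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a real query run against each host.
    def fake_query() -> None:
        sum(i * i for i in range(10_000))

    opteron = time_runs(fake_query)
    xeon = time_runs(fake_query)
    print(f"Xeon vs. Opteron: {slowdown_percent(opteron, xeon):+.1f}%")
```

The median is used rather than the mean so a single stall (checkpoint, autovacuum) does not skew the comparison, and the warmup calls let shared buffers and the OS cache fill before timing starts. With psycopg2, `run_query` could be a closure that calls `cursor.execute(sql)` followed by `cursor.fetchall()`, with one connection per machine.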

Speaking of being a hosting provider, I may as well take a moment to point out that we are working with DRBD for mirroring and have found it works beautifully with PG (MySQL as well). Also, while our "Managed Database Service" product is geared around MySQL, Oracle, and MSSQL, we're pretty familiar with PG and would be happy to talk to anyone about hosting needs they may have.

Thanks for listening, and again please let me know if there is further testing we can do to help get to the bottom of this Opteron/Xeon performance discrepancy.

Bart Grantham
VP of R&D
Logicworks, Inc.
www.logicworks.net
