From: | Mike Bresnahan <mike(dot)bresnahan(at)bestbuy(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Amazon EC2 CPU Utilization |
Date: | 2010-01-28 22:45:45 |
Message-ID: | loom.20100128T233020-512@post.gmane.org |
Lists: | pgsql-bugs pgsql-general |
Greg Smith <greg <at> 2ndquadrant.com> writes:
> Looks to me like you're running into a general memory bandwidth issue
> here, possibly one that's made a bit worse by how pgbench works. It's a
> somewhat funky workload Linux systems aren't always happy with, although
> one of your tests had the right configuration to sidestep the worst of
> the problems there. I don't see any evidence that pgbench itself is a
> likely suspect for the issue, but it does shuffle a lot of things around
> in memory relative to transaction time when running this small
> select-only test, and clients can get stuck waiting for it when that
> happens.
>
> To put your results in perspective, I would expect to get around 25K TPS
> running the pgbench setup/test you're doing on a recent 4-core/single
> processor system, and around 50K TPS is normal for an 8-core server
> doing this type of test. And those numbers are extremely sensitive to
> the speed of the underlying RAM even with the CPU staying the same.
>
> I would characterize your results as "getting about 1/2 of the
> CPU+memory performance of an install on a dedicated 8-core system".
> That's not horrible, as long as you have reasonable expectations here,
> which is really the case for any virtualized install I think. I'd
> actually like to launch a more thorough investigation into this
> particular area, exactly how the PostgreSQL bottlenecks shift around on
> EC2 compared to similar dedicated hardware, if I found a sponsor for it
> one day. A bit too much work to do it right just for fun.

I can understand that I will not get as much performance out of an EC2 instance
as out of a dedicated server, but I don't understand why top(1) is showing 50%
CPU utilization. If it were a memory speed problem, wouldn't top(1) report 100%
CPU utilization? Does the kernel really do a context switch when waiting for a
response from RAM? That would surprise me, because to do a context switch it
might need to read from RAM, which would then also block. I still worry that it
is a lock contention or scheduling problem, but I am not sure how to diagnose
it. I've seen some references to using dtrace to analyze PostgreSQL locks, but
it looks like it might take a lot of ramp-up time for me to learn how to use
dtrace.
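
One way to look past the single utilization figure in top(1) would be to split
CPU time into categories straight from /proc/stat. The following is only a
minimal sketch, assuming a Linux guest whose aggregate "cpu" line uses the
usual column order (user, nice, system, idle, iowait, irq, softirq, steal).
The function names are hypothetical, and whether iowait or steal time plays
any role in the 50% figure is an assumption, not something measured here.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Return the percentage of elapsed CPU time spent in each category."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_breakdown(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and compare the samples."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)  # the interval is an arbitrary choice
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return cpu_breakdown(first, second)
```

Calling sample_breakdown() while pgbench is running and printing the result
would show whether the missing 50% is reported as idle, iowait, or steal.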

Note that I can peg the CPU by running 8 infinite loops inside or outside the
database. I have only seen the utilization problem when running queries (with
pgbench and my application) against PostgreSQL.
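
The control experiment outside the database can be sketched roughly as
follows. This is an illustration only, not the exact loops used above: the
names spin and peg_cpus are hypothetical, the loops are bounded by a deadline
rather than truly infinite, and the worker count of 8 simply mirrors the
number of loops mentioned.

```python
import multiprocessing
import time


def spin(seconds):
    """Busy-loop for roughly `seconds` and return the number of iterations."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def peg_cpus(workers=8, seconds=30.0):
    """Run one busy loop per worker process and collect the iteration counts."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(spin, [seconds] * workers)
```

Running peg_cpus() under an `if __name__ == "__main__":` guard while watching
top(1) in another terminal gives the pure CPU-bound baseline to compare with
the pgbench run.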

In any case, assuming this is an EC2 memory speed issue, it is going to be
difficult to diagnose application bottlenecks when I cannot rely on top(1)
reporting meaningful CPU stats.

Thank you for your help.