Re: how could select id=xx so slow?

From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 16:35:59
Message-ID: CAOA66tFRfXGFvNaU2hvBq1d285QeiFcJR-0jru+bS=TBJ=nZ6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

huge thanks for the patient explanations, I think you are right, it is
really related to the IO.
I monitor the IO using iostat -x and found the utilize part reach 100%
frequently, postgresql is the only service running on that machine, so I
think it is either checkpoint or queries caused the problem.

and I agree that checkpoint may not the problem, I guess I need to tackle
those damn queries.
currently the data dir(pgsql/data/base) used 111GB disk space, some tables
has tens of millions records. could that cause the query heavy disk IO?
when should I split the data to other machines(aka sharding)?

and you are right the machine has 16GB memory and commodity 500GB disk.
kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011
x86_64 GNU/Linux

by "new kernel" which version do you mean?

and about those IO intensive queries, I can only tell the time used from
slow query log, is there anything like "explain analyze" that shows
specific information about IO usage?

On Wed, Jul 11, 2012 at 7:59 PM, Ants Aasma <ants(at)cybertec(dot)at> wrote:

> On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> > I have logged one day data and found the checkpoint is rather
> > frequently(detail: https://gist.github.com/3088338). Not sure if it is
> > normal, but the average time of checkpoint is about 100sec~200sec, it
> seems
> > related with my settings:
> >
> > 574 checkpoint_segments = 64
> > 575 wal_keep_segments = 5000
> >
> > I set checkpoint_segments as a very large value which is because
> otherwise
> > the slave server always can not follow the master, should I lower that
> > value?
> >
> > or the slow query is about something else? thanks!
>
> Some things to notice from the checkpoints log:
> * All chcekpoints are triggered by checkpoint_timeout, using up only a
> couple log files
> * Checkpoints write out around 40MB of buffers
> * The write out period is spread out nicely like it's supposed to but
> the sync phase is occasionally taking a very long time (more than 2
> minutes)
>
> This looks like something (not necessarily the checkpoint sync itself)
> is overloading the IO system. You might want to monitor the IO load
> with iostat and correlate it with the checkpoints and slow queries to
> find the culprit. It's also possible that something else is causing
> the issues.
>
> If the cause is checkpoints, just making them less frequent might make
> the problem worse. I'm assuming you have 16GB+ of RAM because you have
> 4GB of shared_buffers. Just making checkpoint_timeout longer will
> accumulate a larger number of dirty buffers that will clog up the IO
> queues even worse. If you are on Linux, lowering
> dirty_expire_centisecs or dirty_background_bytes might help to spread
> the load out but will make overall throughput worse.
>
> On the otherhand, if the I/O overload is from queries (more likely
> because some checkpoints sync quickly) there are no easy tuning
> answers. Making queries less IO intensive is probably the best you can
> do. From the tuning side, newer Linux kernels handle I/O fairness a
> lot better, and you could also try tweaking the I/O scheduler to
> achieve better throughput to avoid congestion or at least provide
> better latency for trivial queries. And of course its always possible
> to throw more hardware at the problem and upgrade the I/O subsystem.
>
> Ants Aasma
> --
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Midge Brown 2012-07-11 17:25:26 Re: moving tables
Previous Message ktm@rice.edu 2012-07-11 14:19:54 Re: DELETE vs TRUNCATE explanation