Re: Slow count(*) again...

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:23:06
Message-ID: AANLkTinwiEWnTLNJ7e++KdxynQyt4p2_s7Vq_PQtYbZ8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey <scott(at)richrelevance(dot)com>wrote:
>
>
> However, for large reporting queries and sequential scans, XFS will win in
> the long run if you use the online defragmenter. Otherwise, your sequential
> scans won't be all that sequential on any file system over time if your
> tables aren't written once, forever, serially. Parallel restore will
> result in a system that is fragmented -- ext4 will do best at limiting this
> on the restore, but only xfs has online defragmentation. We schedule ours
> daily and it noticeably improves sequential scan I/O.
>
>
Our reporting tables are written sequentially and left unmodified until
entire partitions are dropped. However, equivalent partitions tend to get a
little bit larger over time, so newer partitions won't necessarily fit into
the gaps left by prior partition drops, so it is possible that partitions
will be split into two sections, but should still be very sequential, if not
perfectly so. It would seem that we stumbled into an ideal architecture for
doing this kind of work - mostly by virtue of starting with 8.2.x and having
huge problems with autovacuum and vacuum taking forever and dragging the db
to halt, which caused us to move to an architecture which aggregates and
then drops older data in entire partitions instead of updating aggregates
individually and then deleting rows. Partitions are sized such that most
reporting queries run over entire partitions, too (which was completely
accidental since I had not yet delved into individual query optimization at
the time), so even though we are doing sequential scans, we at least run as
few of them as possible and are able to keep hot data in memory.

--sam

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2010-10-12 16:23:56 Re: [JDBC] Support for JDBC setQueryTimeout, et al.
Previous Message Chris Browne 2010-10-12 16:03:57 Re: Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2010-10-12 16:35:46 Re: Slow count(*) again...
Previous Message Chris Browne 2010-10-12 16:13:09 Re: large dataset with write vs read clients