Re: Slow count(*) again...

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-21 23:11:22
Message-ID: CF6DEAD9-50AC-4EE4-A4CC-FA0B31D82D2C@richrelevance.com
Lists: pgsql-hackers pgsql-performance


On Oct 21, 2010, at 11:13 AM, Jesper Krogh wrote:

> On 2010-10-21 06:47, Scott Carey wrote:
> > On a wimpy disk, I/O bound for sure. But my disks go 1000MB/sec.
> > No query can go fast enough for them. The best I've gotten is
> > 800MB/sec, on a wide row (average 800 bytes). Most tables go
> > 300MB/sec or so. And with 72GB of RAM, many scans are in-memory
> > anyway.
>
> Is it cpu or io bound while doing it?
With the fio benchmark tool it is I/O bound at 16K blocks or larger, and CPU bound at 8K blocks or smaller (CentOS 5.5).
With postgres it is CPU bound.
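
For reference, the kind of fio run I mean is roughly the below -- the file name, size and queue depth here are placeholders, not the exact job file I used:

    # sequential read, 16K blocks, O_DIRECT, a few concurrent jobs
    fio --name=seqread --rw=read --bs=16k --direct=1 \
        --filename=/data/fio.testfile --size=8g \
        --numjobs=4 --ioengine=libaio --iodepth=16 --group_reporting

Dropping --bs to 8k is roughly where it flips from I/O bound to CPU bound on this box.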

> Can you scan it faster using time cat relation-oid.* > /dev/null
>

I'm not sure what you mean. In psql, select * piped to /dev/null is VERY CPU bound because of all the formatting. I haven't toyed with COPY. Do you mean the actual files? 'dd' tests from the actual files are similar to fio, but not as consistent, and it is hard to add concurrency. That is faster than postgres.
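
To be concrete, by a 'dd' test against the actual files I mean something like the below (table name and file path are made up; pg_relation_filepath() if your version has it, otherwise piece the path together from pg_class.relfilenode and the database oid):

    -- in psql: find the on-disk file behind a relation
    SELECT pg_relation_filepath('some_big_table');

    # raw read of that file, run from $PGDATA, bypassing postgres entirely
    dd if=base/16384/24576 of=/dev/null bs=1M

    # COPY skips psql's per-row formatting, unlike select * piped to /dev/null
    psql -c "COPY some_big_table TO STDOUT" mydb > /dev/null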

> > A single SSD with supercapacitor will go about 500MB/sec by itself
> > next spring. I will easily be able to build a system with 2GB/sec
> > I/O for under $10k.
>
> What filesystem are you using? Readahead?
> Can you try to check the filesystem fragmentation of the table using filefrag?
>
XFS, defragmented once a day. Readahead 40960 (20MB, 1MB per spindle). Two raid 10 arrays of 10 discs each (2 hot spares), with software raid-0 (md, 1MB blocks) tying those together. Two Adaptec 5805 (or 5085, the external SAS one). A third raid card for the OS/xlog with 4x 10krpm SAS drives internal.
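
If it helps, the readahead is just set with blockdev on the md device, and the nightly defrag is along the lines of xfs_fsr; device names and paths below are placeholders, not our exact layout:

    blockdev --getra /dev/md0           # current readahead, in 512-byte sectors
    blockdev --setra 40960 /dev/md0     # 40960 sectors = 20MB
    xfs_fsr -v /mnt/data                # defragment the XFS filesystem
    filefrag -v base/16384/24576        # per-file extent count, run from $PGDATA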

Fragmentation quickly brings this down a lot, as do small files and concurrent activity, since there are only enough spindles for ~2000 iops. But it's almost all large reporting queries on partitioned tables (500,000 partitions). A few smaller tables are starting to cause too many seeks, so those might end up on a smaller, high-iops tablespace later.

Over time the disks have filled up, and sequential transfer slows down significantly at the end of the partition -- 600MB/sec max. Most scans are still CPU bound at that rate, but postgres can go that fast on some scans.

Off topic:
Another interesting feature of this setup is how it causes the system tables to bloat by factors of 2x to 8x each week, and requires frequent vacuum full + reindex on several of them, else they grow to 1.5GB in size. Nothing like lots of temp table work plus hour-long concurrent transactions to make the system catalog bloat. I suppose with 8.4 many temp tables could be replaced using WITH queries, but in other cases analyzing a temp table is the only way to get a sane query plan.
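
A sketch of both things -- the catalog maintenance and the WITH-vs-temp-table tradeoff -- with made-up table names, and pg_attribute standing in for whichever catalogs are actually bloated:

    -- weekly catalog maintenance
    VACUUM FULL pg_catalog.pg_attribute;
    REINDEX TABLE pg_catalog.pg_attribute;

    -- 8.4 CTE instead of a temp table:
    WITH t AS (SELECT user_id, sum(amount) AS total FROM sales GROUP BY user_id)
    SELECT * FROM t WHERE total > 1000;

    -- but sometimes only a temp table + ANALYZE gives the planner real stats:
    CREATE TEMP TABLE t AS
      SELECT user_id, sum(amount) AS total FROM sales GROUP BY user_id;
    ANALYZE t;
    SELECT * FROM t WHERE total > 1000;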

> --
> Jesper
