Skip site navigation (1) Skip section navigation (2)

Re: Thoughts on statistics for continuously advancing columns

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Thoughts on statistics for continuously advancing columns
Date: 2009-12-31 01:40:09
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Joshua D. Drake wrote:
> postgres=# analyze verbose test_ten_million;
> INFO:  analyzing "public.test_ten_million"
> INFO:  "test_ten_million": scanned 3000 of 44248 pages, containing 678000
> live rows and 0 dead rows; 3000 rows in sample, 10000048 estimated total
> rows
> Time: 20145.148 ms

At an ever larger table sizes, this would turn into 3000 random seeks 
all over the drive, one at a time because there's no async I/O here to 
queue requests better than that for this access pattern.  Let's say they 
take 10ms each, not an unrealistic amount of time on current hardware.  
That's 30 seconds, best case, which is similar to what JD's example is 
showing even on a pretty small data set.  Under load it could easily 
take over a minute, hammering the disks the whole time, and in a TOAST 
situation you're doing even more work.  It's not outrageous and it 
doesn't scale linearly with table size, but it's not something you want 
to happen any more than you have to either--consider the poor client who 
is trying to get their work done while that is going on.

On smaller tables, you're both more likely to grab a useful next page 
via readahead, and to just have the data you need cached in RAM 
already.  There's a couple of "shelves" in the response time to finish 
ANALYZE as you exceed L1/L2 CPU cache size and RAM size, then it trails 
downward as the seeks get longer and longer once the data you need is 
spread further across the disk(s).  That the logical beginning of a 
drive is much faster than the logical end doesn't help either.  I should 
generate that graph again one day somewhere I can release it at...

Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support

In response to

pgsql-hackers by date

Next:From: Tatsuo IshiiDate: 2009-12-31 01:48:48
Subject: Re: exec_execute_message crash
Previous:From: David E. WheelerDate: 2009-12-31 00:41:42
Subject: Re: Status of plperl inter-sp calling

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group