Skip site navigation (1) Skip section navigation (2)

Re: TB-sized databases

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: TB-sized databases
Date: 2007-11-29 16:14:26
Message-ID: 1196352866.4246.1304.camel@ebony.site (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> >> Tom's previous concerns were along the lines of "How would know what to
> >> set it to?", given that the planner costs are mostly arbitrary numbers.
> 
> > Hm, that's only kind of true.
> 
> The units are not the problem.  The problem is that you are staking
> non-failure of your application on the planner's estimates being
> pretty well in line with reality.  Not merely in line enough that
> it picks a reasonably cheap plan, but in line enough that if it
> thinks plan A is 10x more expensive than plan B, then the actual
> ratio is indeed somewhere near 10.
> 
> Given that this list spends all day every day discussing cases where the
> planner is wrong, I'd have to think that that's a bet I wouldn't take.

I think you have a point, but the alternative is often much worse. 

If an SQL statement fails because of too high cost, we can investigate
the problem and re-submit. If a website slows down because somebody
allowed a very large query to execute then everybody is affected, not
just the person who ran the bad query. Either way the guy that ran the
query loses, but without constraints in place one guy can kill everybody
else also.

> You could probably avoid this risk by setting the cutoff at something
> like 100 or 1000 times what you really want to tolerate, but how
> useful is it then?

Still fairly useful, as long as we understand its a blunt instrument.

If the whole performance of your system depends upon indexed access then
rogue queries can have disastrous, unpredictable consequences. Many
sites construct their SQL dynamically, so a mistake in a seldom used
code path can allow killer queries through. Even the best DBAs have been
known to make mistakes.

e.g. An 80GB table has 8 million blocks in it. 
- So putting a statement_cost limit = 1 million would allow some fairly
large queries but prevent anything that did a SeqScan (or worse).
- Setting it 10 million is going to prevent things like sorting the
whole table without a LIMIT
- Setting it at 100 million is going to prevent unconstrained product
joins etc..

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


In response to

Responses

pgsql-performance by date

Next:From: Brad NicholsonDate: 2007-11-29 16:18:39
Subject: Re: 7.4 Checkpoint Question
Previous:From: Simon RiggsDate: 2007-11-29 16:14:21
Subject: Re: 7.4 Checkpoint Question

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group