Re: TB-sized databases

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: TB-sized databases
Date: 2007-11-29 16:14:26
Message-ID: 1196352866.4246.1304.camel@ebony.site
Lists: pgsql-performance

On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> >> Tom's previous concerns were along the lines of "How would you know what to
> >> set it to?", given that the planner costs are mostly arbitrary numbers.
>
> > Hm, that's only kind of true.
>
> The units are not the problem. The problem is that you are staking
> non-failure of your application on the planner's estimates being
> pretty well in line with reality. Not merely in line enough that
> it picks a reasonably cheap plan, but in line enough that if it
> thinks plan A is 10x more expensive than plan B, then the actual
> ratio is indeed somewhere near 10.
>
> Given that this list spends all day every day discussing cases where the
> planner is wrong, I'd have to think that that's a bet I wouldn't take.

I think you have a point, but the alternative is often much worse.

If an SQL statement fails because its estimated cost is too high, we can
investigate the problem and re-submit it. If a website slows down because
somebody allowed a very large query to execute, then everybody is
affected, not just the person who ran the bad query. Either way the
person who ran the query loses, but without constraints in place one bad
query can hurt everybody else as well.
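
As a sketch only (the GUC name statement_cost_limit, the error text and
the table name here are all hypothetical, since the parameter is just a
proposal at this point), per-session usage might look like:

  SET statement_cost_limit = 10000000;  -- reject plans costed above ~10M

  SELECT * FROM orders ORDER BY order_date;
  -- ERROR: estimated plan cost exceeds statement_cost_limit

The statement fails up front, nothing runs, and nobody else on the box
is affected.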

> You could probably avoid this risk by setting the cutoff at something
> like 100 or 1000 times what you really want to tolerate, but how
> useful is it then?

Still fairly useful, as long as we understand it's a blunt instrument.

If the whole performance of your system depends upon indexed access,
then rogue queries can have disastrous, unpredictable consequences. Many
sites construct their SQL dynamically, so a mistake in a seldom-used
code path can let a killer query through. Even the best DBAs have been
known to make mistakes.
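
To make the dynamic SQL point concrete, a sketch with made-up table and
column names:

  -- What the code path is meant to send:
  SELECT * FROM orders WHERE customer_id = 12345;

  -- What reaches the server when the branch that appends the WHERE
  -- clause is skipped:
  SELECT * FROM orders;

The second form is a full SeqScan of the whole table, exactly the kind
of plan a statement cost limit would refuse to run.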

e.g. An 80GB table contains roughly 10 million blocks at the default 8KB
block size (the rough cost arithmetic is sketched below).
- So putting a statement_cost limit = 1 million would allow some fairly
large queries but prevent anything that did a SeqScan (or worse).
- Setting it to 10 million is going to prevent things like sorting the
whole table without a LIMIT.
- Setting it at 100 million is going to prevent unconstrained product
joins etc.
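
The arithmetic behind those numbers, roughly (the table name is made up,
and the default cost factors seq_page_cost = 1.0 and cpu_tuple_cost =
0.01 are assumed, so treat this as order-of-magnitude only):

  SELECT relpages, reltuples FROM pg_class WHERE relname = 'bigtable';

  -- 80GB at 8KB per block  =>  ~10 million pages
  -- SeqScan cost ~= seq_page_cost * relpages
  --                 + cpu_tuple_cost * reltuples
  --              >= ~10 million
  -- Sorting the whole table, or an unconstrained join, is costed well
  -- beyond that, which is why 10 million and 100 million make useful
  -- steps for a limit.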

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
