Re: TB-sized databases

From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject: Re: TB-sized databases
Date: 2007-12-05 20:07:21
Message-ID: 200712051507.22745.xzilla@users.sourceforge.net
Lists: pgsql-performance

On Thursday 29 November 2007 11:14, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > >> Tom's previous concerns were along the lines of "How would you know
> > >> what to set it to?", given that the planner costs are mostly
> > >> arbitrary numbers.
> > >
> > > Hm, that's only kind of true.
> >
> > The units are not the problem. The problem is that you are staking
> > non-failure of your application on the planner's estimates being
> > pretty well in line with reality. Not merely in line enough that
> > it picks a reasonably cheap plan, but in line enough that if it
> > thinks plan A is 10x more expensive than plan B, then the actual
> > ratio is indeed somewhere near 10.
> >
> > Given that this list spends all day every day discussing cases where the
> > planner is wrong, I'd have to think that that's a bet I wouldn't take.
>
> I think you have a point, but the alternative is often much worse.
>
> If an SQL statement fails because of too high cost, we can investigate
> the problem and re-submit. If a website slows down because somebody
> allowed a very large query to execute then everybody is affected, not
> just the person who ran the bad query. Either way the guy that ran the
> query loses, but without constraints in place one guy can kill everybody
> else also.
>
> > You could probably avoid this risk by setting the cutoff at something
> > like 100 or 1000 times what you really want to tolerate, but how
> > useful is it then?
>
> Still fairly useful, as long as we understand it's a blunt instrument.
>
> If the whole performance of your system depends upon indexed access then
> rogue queries can have disastrous, unpredictable consequences. Many
> sites construct their SQL dynamically, so a mistake in a seldom used
> code path can allow killer queries through. Even the best DBAs have been
> known to make mistakes.
>

If the whole performance of your system depends upon indexed access, then
maybe you need a database that gives you a way to force index access at the
query level?
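(For what it's worth, you can get a crude session-level version of that
today with the stock enable_seqscan knob; a rough sketch, using a made-up
table and column:

  -- Penalize sequential scans so the planner avoids them wherever an
  -- index path exists (it will still seqscan if no usable index is
  -- available at all).
  SET enable_seqscan = off;
  EXPLAIN SELECT * FROM orders WHERE order_id = 42;  -- expect an Index Scan
  RESET enable_seqscan;

It's advisory rather than a hard guarantee, but it covers the "never let
this query walk the whole table" case.)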

> e.g. An 80GB table has 8 million blocks in it.
> - So putting a statement_cost limit = 1 million would allow some fairly
> large queries but prevent anything that did a SeqScan (or worse).
> - Setting it at 10 million is going to prevent things like sorting the
> whole table without a LIMIT
> - Setting it at 100 million is going to prevent unconstrained product
> joins etc.
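Until something like a server-side statement cost limit actually exists,
the closest client-side approximation is to ask for the estimate first and
apply your own threshold; a sketch, with a hypothetical table name:

  -- The planner's estimate is the same number a statement cost limit
  -- would test against; an application (or a psql wrapper script) can
  -- read the top-level "cost=..." figure from the plan and refuse to
  -- run anything over, say, 1 million.
  EXPLAIN SELECT * FROM big_table WHERE created_at < now() - interval '1 year';

That only guards the statements you remember to wrap, of course, which is
exactly why people keep asking for a server-side limit.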

I think you're completely overlooking the effect disk latency has on query
times. We run queries all the time whose run time varies from 4 hours to 12
hours based solely on the amount of concurrent load on the system, even
though they always plan at the same cost.
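The gap is easy to see on a busy box: the estimate is a pure function of the
statistics and cost parameters, while wall-clock time also depends on
whatever else is hammering the disks at that moment. A rough illustration,
with a hypothetical query:

  -- Same plan, same estimated cost every time it is planned...
  EXPLAIN SELECT count(*) FROM big_table WHERE region = 'EU';
  -- ...but the measured runtime reported by EXPLAIN ANALYZE will swing
  -- with concurrent I/O load, not with the cost number.
  EXPLAIN ANALYZE SELECT count(*) FROM big_table WHERE region = 'EU';

Which is why the same cost figure can correspond to wildly different run
times.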

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
