Re: TB-sized databases

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject: Re: TB-sized databases
Date: 2007-12-06 09:38:16
Message-ID: 1196933896.4255.340.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Robert,

On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote:

> If the whole performance of your system depends upon indexed access, then
> maybe you need a database that gives you a way to force index access at the
> query level?

That sounds like a request for hints, which is OT here, ISTM.

The issue is that if somebody issues a "large query" then it will be a
problem whichever plan the query takes. Forcing index scans can make a
plan more expensive than a seq scan in many cases.

> > e.g. An 80GB table has 8 million blocks in it.
> > - So putting a statement_cost limit = 1 million would allow some fairly
> > large queries but prevent anything that did a SeqScan (or worse).
> > - Setting it 10 million is going to prevent things like sorting the
> > whole table without a LIMIT
> > - Setting it at 100 million is going to prevent unconstrained product
> > joins etc..
>
> I think you're completly overlooking the effect of disk latency has on query
> times. We run queries all the time that can vary from 4 hours to 12 hours in
> time based solely on the amount of concurrent load on the system, even though
> they always plan with the same cost.

Not at all. If we had statement_cost_limit then it would be applied
after planning and before execution begins. The limit would be based
upon the planner's estimate, not the likely actual execution time.

So yes a query may vary in execution time by a large factor as you
suggest, and it would be difficult to set the proposed parameter
accurately. However, the same is also true of statement_timeout, which
we currently support, so I don't see this point as an blocker.

Which leaves us at the burning question: Would you use such a facility,
or would the difficulty in setting it exactly prevent you from using it
for real?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Stone 2007-12-06 15:42:14 Re: TB-sized databases
Previous Message Dave Page 2007-12-06 08:50:56 Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)