Skip site navigation (1) Skip section navigation (2)

Re: TB-sized databases

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject: Re: TB-sized databases
Date: 2007-12-06 09:38:16
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance

On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote:

> If the whole performance of your system depends upon indexed access, then 
> maybe you need a database that gives you a way to force index access at the 
> query level? 

That sounds like a request for hints, which is OT here, ISTM.

The issue is that if somebody issues a "large query" then it will be a
problem whichever plan the query takes. Forcing index scans can make a
plan more expensive than a seq scan in many cases.

> > e.g. An 80GB table has 8 million blocks in it.
> > - So putting a statement_cost limit = 1 million would allow some fairly
> > large queries but prevent anything that did a SeqScan (or worse).
> > - Setting it 10 million is going to prevent things like sorting the
> > whole table without a LIMIT
> > - Setting it at 100 million is going to prevent unconstrained product
> > joins etc..
> I think you're completly overlooking the effect of disk latency has on query 
> times.  We run queries all the time that can vary from 4 hours to 12 hours in 
> time based solely on the amount of concurrent load on the system, even though 
> they always plan with the same cost.  

Not at all. If we had statement_cost_limit then it would be applied
after planning and before execution begins. The limit would be based
upon the planner's estimate, not the likely actual execution time. 

So yes a query may vary in execution time by a large factor as you
suggest, and it would be difficult to set the proposed parameter
accurately. However, the same is also true of statement_timeout, which
we currently support, so I don't see this point as an blocker.

Which leaves us at the burning question: Would you use such a facility,
or would the difficulty in setting it exactly prevent you from using it
for real?

  Simon Riggs

In response to


pgsql-performance by date

Next:From: Michael StoneDate: 2007-12-06 15:42:14
Subject: Re: TB-sized databases
Previous:From: Dave PageDate: 2007-12-06 08:50:56
Subject: Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group