Re: TB-sized databases

From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject: Re: TB-sized databases
Date: 2007-12-07 17:45:08
Message-ID: 200712071245.09352.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thursday 06 December 2007 04:38, Simon Riggs wrote:
> Robert,
>
> On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote:
> > If the whole performance of your system depends upon indexed access, then
> > maybe you need a database that gives you a way to force index access at
> > the query level?
>
> That sounds like a request for hints, which is OT here, ISTM.
>

If you want to eat peas, and someone suggests you use a knife, can I only
argue the validity of using a knife? I'd rather just recommend a spoon.

> > I think you're completly overlooking the effect of disk latency has on
> > query times. We run queries all the time that can vary from 4 hours to
> > 12 hours in time based solely on the amount of concurrent load on the
> > system, even though they always plan with the same cost.
>
> Not at all. If we had statement_cost_limit then it would be applied
> after planning and before execution begins. The limit would be based
> upon the planner's estimate, not the likely actual execution time.
>

This is nice, but it doesnt prevent "slow queries" reliably (which seemed to
be in the original complaints), since query time cannot be directly traced
back to statement cost.

> So yes a query may vary in execution time by a large factor as you
> suggest, and it would be difficult to set the proposed parameter
> accurately. However, the same is also true of statement_timeout, which
> we currently support, so I don't see this point as an blocker.
>
> Which leaves us at the burning question: Would you use such a facility,
> or would the difficulty in setting it exactly prevent you from using it
> for real?

I'm not sure. My personal instincts are that the solution is too fuzzy for me
to rely on, and if it isnt reliable, it's not a good solution. If you look at
all of the things people seem to think this will solve, I think I can raise
an alternative option that would be a more definitive solution:

"prevent queries from taking longer than x" -> statement_timeout.

"prevent planner from switching to bad plan" -> hint system

"prevent query from consuming too many resources" -> true resource
restrictions at the database level

I'm not so much against the idea of a statement cost limit, but I think we
need to realize that it does not really solve as many problems as people
think, in cases where it will help it often will do so poorly, and that there
are probably better solutions available to those problems. Of course if you
back me into a corner I'll agree a poor solution is better than no solution,
so...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message kelvan 2007-12-07 19:13:36 Re: database tuning
Previous Message Erik Jones 2007-12-07 17:33:16 Re: Cost-Based Vacuum Delay tuning