Re: TB-sized databases

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject: Re: TB-sized databases
Date: 2007-12-11 23:37:57
Message-ID: 1197416277.4255.1481.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 2007-12-07 at 12:45 -0500, Robert Treat wrote:
> On Thursday 06 December 2007 04:38, Simon Riggs wrote:

> > > I think you're completly overlooking the effect of disk latency has on
> > > query times. We run queries all the time that can vary from 4 hours to
> > > 12 hours in time based solely on the amount of concurrent load on the
> > > system, even though they always plan with the same cost.
> >
> > Not at all. If we had statement_cost_limit then it would be applied
> > after planning and before execution begins. The limit would be based
> > upon the planner's estimate, not the likely actual execution time.
> >
>
> This is nice, but it doesnt prevent "slow queries" reliably (which seemed to
> be in the original complaints), since query time cannot be directly traced
> back to statement cost.

Hmm, well it can be directly traced, just not with the accuracy you
desire.

We can improve the accuracy, but then we would need to run the query
first in order to find out it was killing us.

> > So yes a query may vary in execution time by a large factor as you
> > suggest, and it would be difficult to set the proposed parameter
> > accurately. However, the same is also true of statement_timeout, which
> > we currently support, so I don't see this point as an blocker.
> >
> > Which leaves us at the burning question: Would you use such a facility,
> > or would the difficulty in setting it exactly prevent you from using it
> > for real?
>
> I'm not sure. My personal instincts are that the solution is too fuzzy for me
> to rely on, and if it isnt reliable, it's not a good solution. If you look at
> all of the things people seem to think this will solve, I think I can raise
> an alternative option that would be a more definitive solution:
>
> "prevent queries from taking longer than x" -> statement_timeout.
>
> "prevent planner from switching to bad plan" -> hint system
>
> "prevent query from consuming too many resources" -> true resource
> restrictions at the database level

I like and agree with your list, as an overview. I differ slightly on
specifics.

> I'm not so much against the idea of a statement cost limit, but I think we
> need to realize that it does not really solve as many problems as people
> think, in cases where it will help it often will do so poorly, and that there
> are probably better solutions available to those problems. Of course if you
> back me into a corner I'll agree a poor solution is better than no solution,
> so...

statement_cost_limit isn't a panacea for all performance ills, its just
one weapon in the armoury. I'm caught somewhat in that whatever I
propose as a concrete next step, somebody says I should have picked
another. Oh well.

On specific points:

With hints I prefer a declarative approach, will discuss later in
release cycle.

The true resource restrictions sound good, but its still magic numbers.
How many I/Os are allowed before you kill the query? How much CPU? Those
are still going to be guessed at. How do we tell the difference between
a random I/O and a sequential I/O - there's no difference as far as
Postgres is concerned in the buffer manager, but it can cause a huge
performance difference. Whether you use real resource limits or
statement cost limits you still need to work out the size of your table
and then guess at appropriate limits.

Every other system I've seen uses resource limits, but the big problem
is that they are applied after something has been running for a long
time. It's kinda like saying I'll treat the gangrene when it reaches my
knee. I prefer to avoid the problem before it starts to hurt at all, so
I advocate learning the lessons from other systems, not simply follow
them. But having said that, I'm not against having them; its up to the
administrator how they want to manage their database, not me.

What resource limit parameters would you choose? (temp disk space etc..)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2007-12-12 00:07:38 Re: database tuning
Previous Message kelvan 2007-12-11 23:18:42 Re: database tuning