From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: performance of IN (subquery)
Date: 2004-08-27 19:22:02
Message-ID: 87r7psmlnp.fsf@stark.xeocode.com
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > I'm not about to run analyze in the middle of the data generation
> > (which wouldn't work anyways since it's in a transaction).
>
> Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
> The cached-plan business is a problem, I agree, but I think it's
> orthogonal to this particular discussion (and you can always use EXECUTE
> if you have to).

It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have this problem. I didn't realize you could run ANALYZE in a
transaction, but even so, I wouldn't really want to have to do that
repeatedly during the job.
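(For context, the workaround being discussed — using EXECUTE in plpgsql so the statement is re-planned on each call instead of reusing the plan cached at first execution — looks roughly like this. The table and column names are hypothetical, and the syntax shown is the modern dollar-quoted form; in the 7.x era one would have used a quoted function body and FOR ... IN EXECUTE.)

```sql
-- Hypothetical sketch: EXECUTE builds and plans the statement each call,
-- so the planner sees current statistics instead of a plan cached the
-- first time the function ran.
CREATE OR REPLACE FUNCTION count_pending() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    -- A static query written here directly would be planned once and
    -- its plan cached for the life of the session.
    EXECUTE 'SELECT count(*) FROM jobs WHERE state = ''pending''' INTO n;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```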

This new approach would actually complete the fix; a Perl or plpgsql EXECUTE
implementation would still only shift the statistics gradually during the job.

Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.
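(The shape of the job, sketched with a hypothetical table name: because the delete and the regeneration happen in one transaction, the heap still contains the dead old tuples alongside the new ones until commit, so a row-count estimate derived from heap size sees roughly twice the live row count.)

```sql
-- Hypothetical batch job: old rows are deleted first, inside the same
-- transaction that regenerates them.
BEGIN;
DELETE FROM report_data;        -- old tuples are dead, but still occupy the heap
INSERT INTO report_data         -- regenerate; heap is now ~2x the live data size
    SELECT * FROM report_source;
COMMIT;                         -- only after commit can the dead tuples be reclaimed
```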

> but separate ANALYZE could definitely make an estimate of the fraction of
> dead tuples.

With ANALYZE in a transaction, I'm not clear what the semantics should be,
though. I suppose it should only count tuples visible to the transaction
running the ANALYZE?

> Nope, you aren't. The above seems to me to be a recipe for degradation
> of performance over time, precisely because the plans wouldn't change in
> the face of changes in the situation.

A gradual degradation is ok. A gradual degradation means I can schedule a
nightly analyze and report on any changed plans and either automatically
accept them or manually approve them individually.

A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.

As I said, it doesn't matter to me if every query is 10% slower than it could
be, as long as no query takes 1000% as long as necessary, even as a 1-in-1000
occurrence.

> I've resisted adding "planner hints" to the language for this reason, and
> I'm certainly not eager to offer any hard guarantees about plans not
> changing.

I just want to control _when_ they change. Eventually you'll come around. I
think it'll be a slow, gradual change in thinking as the user base changes,
though — not something I'll change with a single argument in one day.

--
greg
