Re: [SQL] Yet Another (Simple) Case of Index not used

From: Kevin Brown <kevin(at)sysexperts(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used
Date: 2003-04-21 00:46:30
Message-ID: 20030421004630.GL1847@filer
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance pgsql-sql

Tom Lane wrote:
> Kevin Brown <kevin(at)sysexperts(dot)com> writes:
> > This is why I suspect the best way to manage this would be to manage
> > the counter itself using the MVCC mechanism (that is, you treat the
> > shared counter as a row in a table just like any other and, in fact,
> > it might be most beneficial for it to actually be exactly that), which
> > handles the visibility problem automatically. But I don't know how
> > much contention there would be as a result.
>
> Hm. Contention probably wouldn't be the killer, since if transactions
> don't try to update the count until they are about to commit, they won't
> be holding the row lock for long. (You'd have to beware of deadlocks
> between transactions that need to update multiple counters, but that
> seems soluble.) What *would* be a problem is that such counter tables
> would accumulate huge numbers of dead rows very quickly, making it
> inefficient to find the live row.

But that inefficiency is a problem for *all* oft-updated tables, is it
not? I know that you'll end up with an additional n tuples per
transaction (where n is the average number of tables inserted into or
deleted from per transaction), so this isn't an insignificant problem,
but it's one faced by any application that often updates a small
table.

Causing a transaction which is already doing inserts/deletes to take
the hit of doing one additional update doesn't seem to me to be a
particularly large sacrifice, especially since the table it's updating
(the one that contains the counts) is likely to be cached in its
entirety. The chances are reasonable that the other activity the
transaction is performing will dwarf the additional effort that
maintaining the count demands.

> Josh already mentioned this as a problem with user-trigger-based
> counting.

Right, but the trigger based mechanism probably magnifies the issue by
orders of magnitude, and thus can't necessarily be used as an argument
against an internally-implemented method.

> You could stanch the bleeding with sufficiently frequent vacuums,
> perhaps, but it just doesn't look very appealing.

I would say this is more a strong argument for automatic VACUUM
management than against count management, because what you say here is
true of any oft-updated, oft-referenced table.

> Ultimately what this comes down to is "how much overhead are we willing
> to load onto all other operations in order to make SELECT-COUNT(*)-with-
> no-WHERE-clause fast"? Postgres has made a set of design choices that
> favor the other operations. If you've designed an application that
> lives or dies by fast COUNT(*), perhaps you should choose another
> database.

Or perhaps a mechanism similar to the one being discussed should be
implemented and controlled with a GUC variable, so instead of forcing
someone to choose another database you force them to choose between
the performance tradeoffs involved. We already give DBAs such choices
elsewhere, e.g. pg_stat_activity.

The real question in all this is whether or not fast COUNT(*)
operations are needed often enough to even justify implementing a
mechanism to make them possible in PG. The question of getting fast
answers from COUNT(*) comes up often enough to be a FAQ, and that
suggests that there's enough demand for the feature that it may be
worth implementing just to shut those asking for it up. :-)

Personally, I'd rather see such development effort go towards more
beneficial improvements, such as replication, 2PC, SQL/MED, etc. (or
even improving the efficiency of MVCC, since it was mentioned here as
a problem! :-). I consider COUNT(*) without a WHERE clause to be a
corner case, despite the frequency of questions about it. But I don't
think we should reject a patch to implement fast COUNT(*) just because
it represents a performance tradeoff, at least if it's GUC-controlled.

--
Kevin Brown kevin(at)sysexperts(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Douglas Trainor 2003-04-21 01:00:16 Re: stddev returns 0 when there is one row
Previous Message Tom Lane 2003-04-20 23:50:06 Re: stddev returns 0 when there is one row

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-04-21 01:53:20 Re: [SQL] Yet Another (Simple) Case of Index not used
Previous Message Tom Lane 2003-04-21 00:35:00 Re: [PERFORM] Foreign key performance

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2003-04-21 01:53:20 Re: [SQL] Yet Another (Simple) Case of Index not used
Previous Message Tom Lane 2003-04-20 22:29:57 Re: replicable problem with PL/Perl