Re: [SQL] Yet Another (Simple) Case of Index not used

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used
Date: 2003-04-21 16:14:43
Message-ID: 200304210914.43552.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance pgsql-sql

Kevin,

> > Josh already mentioned this as a problem with user-trigger-based
> > counting.
>
> Right, but the trigger based mechanism probably magnifies the issue by
> orders of magnitude, and thus can't necessarily be used as an argument
> against an internally-implemented method.

I'm not sure about that, Kevin. The production trigger test was written in C
(by Joe Conway), using some of the best memory/efficiency management he could
devise. I could buy that the trigger mechanism adds a certain fixed
overhead to the process, but not the contention that we were seeing ...
especially not the geometric progression of inefficiency as the transaction
count went up. We'll talk about this offlist; I may be able to get the
client to authorize letting you examine the database.

For further detail, our setup was sort of a "destruction test"; including:
1) a slightly underpowered server running too many processes;
2) a very high disk contention environment, with multiple applications
fighting for I/O.
3) running COUNT(*), GROUP BY x on a table with 1.4 million rows, which was
being updated in batches of 10,000 rows to 40,000 rows every few minutes.

As I said before, the overhead for c-trigger based accounting, within the MVCC
framework, was quite tolerable with small update batches, only 9-11% penalty
to the updates overall for batches of 100-300 updates. However, as we
increased the application activity, the update penalty increased, up to
40-45% with the full production load.

It's not hard to figure out why; like most user's servers, the aggregate
caching table was on the same disk as the table(s) being updated. The
resut was a huge amount of disk-head-skipping between the updated table and
the aggregate caching table every time a commit hit the database, with random
seek times increasing the longer the time since the last VACUUM.

Now, on a better server with these tables on fast RAID or on different
spindles, I expect the result would be somewhat better. However, I also
suspect that many of the users who complain the loudest about slow count(*)
are operating in single-spindle environments.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Lamar Owen 2003-04-21 16:18:57 Re: Using TEXT columns for binary content
Previous Message Murthy Kambhampaty 2003-04-21 16:05:12 Re: stddev returns 0 when there is one row

Browse pgsql-performance by date

  From Date Subject
Next Message Jim C. Nasby 2003-04-22 08:23:39 Re: [SQL] Yet Another (Simple) Case of Index not used
Previous Message Tom Lane 2003-04-21 01:53:20 Re: [SQL] Yet Another (Simple) Case of Index not used

Browse pgsql-sql by date

  From Date Subject
Next Message Mario Alberto Soto Cordones 2003-04-21 20:48:59 functions problems
Previous Message Lorraine Dewey 2003-04-21 15:14:04 Optomizing left outer joins