Skip site navigation (1) Skip section navigation (2)

Re: [SQL] Yet Another (Simple) Case of Index not used

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Brown <kevin(at)sysexperts(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used
Date: 2003-04-19 19:03:18
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-generalpgsql-performancepgsql-sql
Kevin, Tom:

> (The cheapness can be disputed as well, since it creates a single point
> of contention for all inserts and deletes on the table.  But that's a
> different topic.)

Actually, this was the problem with the trigger method of maintaining COUNT 
information in PostgreSQL.   The statistics table itself becomes a 
significant souce of delay, since if a table_A gets 10,000 rows updated than 
table_count_A must necessarily be updated 10,000 times ... creating a lot of 
dead tuples and severely attenuating the table on disk until the next vacuum 
... resulting in Update #10,000 to table_count_A taking 100+ times as long as 
Update #1 does, due to the required random seek time on disk.

I can personally think of two ways around this:

In MySQL: store table_count_A as a non-MVCC table or global variable.  
Drawback: the count would not be accurate, as you would see changes due to 
incomplete transactions and eventually the count would be knocked off 
completely by an overload of multi-user activity.  However, this does fit 
with MySQL's design philosophy of "Speed over accuracy", so I suspect that 
that's what they're doing.

In PostgreSQL:
a) Put table_count_A on superfast media like a RAM card so that random seeks 
after 10,000 updates do not become a significant delay;
b) create an asynchronious table aggregates collector which would collect 
programmed statistics (like count(*) from table A) much in the same way that 
the planner statistics collector does.  This would have the disadvantage of 
on being up to date when the database is idle, but the advantage of not 
imposing any significant overhead on Updates.
	(Incidentally, I proposed this to one of my clients who complained about 
Postgres' slow aggregate performance, but they declined to fund the effort)

Josh Berkus
Aglio Database Solutions
San Francisco

In response to


pgsql-performance by date

Next:From: Tom LaneDate: 2003-04-19 20:26:49
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used
Previous:From: Stephan SzaboDate: 2003-04-19 19:03:02
Subject: Re: [PERFORM] Foreign key performance

pgsql-sql by date

Next:From: Tom LaneDate: 2003-04-19 20:26:49
Subject: Re: [SQL] Yet Another (Simple) Case of Index not used
Previous:From: Rajesh Kumar MallahDate: 2003-04-19 16:42:56
Subject: Re: replicable problem with PL/Perl

pgsql-general by date

Next:From: Joe ConwayDate: 2003-04-19 19:50:01
Subject: Re: stddev returns 0 when there is one row
Previous:From: Oliver ElphickDate: 2003-04-19 18:59:51
Subject: Re: Please some help on a join question with sum

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group