Re: count(*) performance improvement ideas

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-20 07:00:42
Message-ID: 2e78013d0803200000l34e306a3j515447017b58a58d@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
<Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:

>
> Pavan also refers to deferred triggers, which has got me thinking about another possible solution:
>
> Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named after the txid and the grouping value (which is an integer in my case, perhaps hash it if you're grouping by something that doesn't easily convert to part of a table name),
> create an after insert initially deferred constraint to call a function which will move the 'at commit' values of the rows in the temp table to the summary table.
> The temp table will only have one row inserted, updated many times, then on commit the trigger is fired once, and the temp table is dropped.
>
> Does anyone think this will or won't work for some reason?

I think this should work, although you may need to take some extra steps
to manage the summary table. Also, I think a single temp table per transaction
should suffice. The temp table would have one row per "group by" or "where"
condition on which you want to track the count. The corresponding row would
be updated as and when the corresponding count changes; you would need
INSERT/DELETE/UPDATE triggers on the base table to do that. Any
subtransaction aborts are taken care of by MVCC.
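The per-transaction delta table and its maintenance triggers could be sketched roughly as below. All table, column, and function names here are illustrative (not from the original post), and the UPDATE case is omitted for brevity; it would subtract from the old group and add to the new one:

```sql
-- Hypothetical sketch: one delta row per tracked "group by" value,
-- kept in a per-transaction temp table that vanishes at commit.
CREATE TEMP TABLE count_deltas (
    grp   integer PRIMARY KEY,   -- the "group by" value being tracked
    delta bigint  NOT NULL
) ON COMMIT DROP;

CREATE OR REPLACE FUNCTION track_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE count_deltas SET delta = delta + 1 WHERE grp = NEW.grp;
        IF NOT FOUND THEN
            INSERT INTO count_deltas VALUES (NEW.grp, 1);
        END IF;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE count_deltas SET delta = delta - 1 WHERE grp = OLD.grp;
        IF NOT FOUND THEN
            INSERT INTO count_deltas VALUES (OLD.grp, -1);
        END IF;
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

Because the temp table is transaction-local, a subtransaction abort simply makes its delta updates invisible, which is the MVCC behavior relied on above.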

As you said, a single deferred trigger would then merge the temp table into
the summary table. Here we need to be extra careful, because concurrent
serializable transactions may fail when they try to update the same row in the
summary table. One solution I can think of is this (I haven't checked the
archives, so somebody might have already suggested it before):

Each summary table would have one summary row per "group by" or "where"
condition (same as the temp table). In addition, it can have zero or more
temporary delta rows for the same conditions.

- Select the summary rows from the summary table FOR UPDATE NOWAIT.
- If the lock is not available, insert our deltas into the summary table as new
  temporary rows. These rows will be deleted as soon as some other transaction
  acquires the lock on the summary rows and merges our deltas into them.
- If the lock is available:
  - merge our deltas into the summary rows;
  - check for other transactions' temporary delta rows, merge them into the
    summary rows, and delete those temporary rows.
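The merge step above could be sketched as a deferred trigger function along these lines. The `summary` table layout (with an assumed `is_temp` flag marking temporary delta rows) and all names are hypothetical; the key mechanism is catching `lock_not_available` from FOR UPDATE NOWAIT:

```sql
-- Hypothetical sketch of the at-commit merge: lock the summary row if we
-- can; otherwise park our delta as a temporary row for someone else to fold in.
CREATE OR REPLACE FUNCTION merge_deltas() RETURNS trigger AS $$
DECLARE
    d record;
BEGIN
    FOR d IN SELECT grp, delta FROM count_deltas LOOP
        BEGIN
            PERFORM 1 FROM summary
             WHERE grp = d.grp AND NOT is_temp
             FOR UPDATE NOWAIT;
            -- Lock acquired: fold in our delta plus any parked temporary rows.
            UPDATE summary
               SET cnt = cnt + d.delta
                       + COALESCE((SELECT sum(cnt) FROM summary
                                    WHERE grp = d.grp AND is_temp), 0)
             WHERE grp = d.grp AND NOT is_temp;
            DELETE FROM summary WHERE grp = d.grp AND is_temp;
        EXCEPTION WHEN lock_not_available THEN
            -- Another transaction holds the lock: leave our delta behind.
            INSERT INTO summary (grp, cnt, is_temp)
            VALUES (d.grp, d.delta, true);
        END;
    END LOOP;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

One caveat for this sketch: the DELETE of other transactions' temporary rows only sees rows whose inserting transactions have already committed, which matches the intent that deltas are merged by whichever later transaction next gets the lock.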

I guess we could write the generic triggers as a contrib module. What needs to
be done is to let the user specify the tables and the conditions on which they
want to track count(*), and then apply those conditions in the generic
triggers.
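From the user's side, wiring this up might look something like the following, where `mytable` and the trigger names are placeholders and the deferred constraint trigger on the delta table is the "fire once at commit" trick Stephen described:

```sql
-- Hypothetical usage: row-level triggers maintain the per-transaction
-- deltas, and a deferred constraint trigger merges them at commit time.
CREATE TRIGGER track_mytable_count
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE track_count();

CREATE CONSTRAINT TRIGGER merge_mytable_count
    AFTER INSERT ON count_deltas
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE merge_deltas();
```

With one delta row per tracked condition, the deferred trigger would fire once per condition rather than once per transaction, so the merge function would need to be idempotent or guard against re-entry.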

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
