Re: count(*) performance improvement ideas

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-18 03:11:01
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC3A8@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I am sure this must have been discussed before.
>
> Indeed. Apparently you didn't find the threads in which the idea of
> having transactions enter "delta" entries was discussed? Solves both
> the locking and the MVCC problems, at the cost that you need to make
> cleanup passes over the counter table fairly often.

I can't find any posts that directly address what I was looking for.

In my situation I have a small number of concurrent transactions with each transaction running a large number of single row inserts or deletes.

However I'm not after a fast count(*) from table, but more like a fast
select grouping_id, count(*) from my_table group by grouping_id

I initially thought that what you meant by "having transactions enter "delta" entries" was that I have a trigger that would create a row each time it was called, writing how many records where inserted or deleted. I didn't understand how this would be much of an improvement, as each of my rows would contain either +1 or -1.

But I just realised you might mean to include a txid row in my table of deltas, and in my trigger insert or update that row where txid = txid_current()
(per grouping_id)

Is that what is recommended?

No locking problems as each transaction is only updating its own rows.

Can you clarify the lack of MVCC problems?
Do new versions of rows get created if the original version of the row was created in the current transaction?
Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations?
Any advice on creating or avoiding indexes on the tables in question?

I can think of two options for a performing the cleanup passes using current functionality:
1) Using Listen/Notify
Issue a notify whenever a new row is inserted for a transaction. They get delivered post transaction commit don't they? And not at all on rollback? Have an application listening for them, performing the aggregation & cleanup work.
2) Use a schedule, based on local knowledge of expected number of transactions over time.

So I'd rephrase Pavan's suggestion as a request to have post-transaction commit triggers that have access to (at least) the txid of the transaction that was committed.

Suggested syntax is to add the option "TRANSACTION" (or perhaps "COMMIT") to the CREATE TRIGGER statement:

CREATE TRIGGER name AFTER INSERT OR UPDATE OR DELETE ON table FOR EACH TRANSACTION EXECUTE PROCEDURE funcname ( arguments );

Any of the triggering actions on the specified table ensure that the function is called once if the transaction commits.
Requires a new TG_LEVEL.
TG_OP could be the first action triggered.

Would implementing this be extremely difficult due to transferring information from within the transaction to outside the transaction?
If so, perhaps I'd get the same benefit from having a trigger set up to fire pre-commit (or pre-prepare), and be a part of the transaction.
Would the locking difficulties be reduced as the lock would not be required till late in the game, and not held for long?

Regards,
Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message KaiGai Kohei 2008-03-18 03:16:27 Re: [0/4] Proposal of SE-PostgreSQL patches
Previous Message KaiGai Kohei 2008-03-18 02:57:36 Re: [PATCHES] [0/4] Proposal of SE-PostgreSQL patches