Re: count(*) performance improvement ideas

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 03:22:43
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC74C@dmpeints002.isotach.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > What I was asking about with those questions, is if a
> > single row is inserted in transaction A, and updated 1000
> > times still within transaction A, then transaction A
> > commits... does a single row version get written, or 1001,
> > 1000 of which are removable?
>
> Umm ... AFAICS there is no need for an UPDATE to touch the count table
> at all. You'd only need ON INSERT and ON DELETE triggers.

I'm not referring to updates of my base table; the "single row inserted" in my question was the delta row in the summary table, which the same transaction then updates many times.

I'm trying to group together a large number of +1 or -1 deltas into a single delta per transaction.

A cut-down example:

CREATE TABLE document_type_summary_detail
(
    document_type_id integer NOT NULL,
    txid bigint NOT NULL DEFAULT 0,
    documents bigint NOT NULL DEFAULT 0,
    CONSTRAINT pk_document_type_summary PRIMARY KEY (document_type_id, txid)
);
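Between cleanups the table holds one base row per document type (txid = 0) plus one delta row per recent transaction. Illustrative contents (the values here are invented for the example):

 document_type_id |  txid   | documents
------------------+---------+-----------
                1 |       0 |      5000    (base row from the last cleanup)
                1 | 1234567 |         3    (one transaction inserted 3 documents)
                1 | 1234892 |        -1    (another transaction deleted one)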

CREATE OR REPLACE FUNCTION document_count_trig()
RETURNS TRIGGER AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Try to bump this transaction's existing delta row first.
        UPDATE document_type_summary_detail
           SET documents = documents + 1
         WHERE document_type_id = NEW.document_type_id
           AND txid = txid_current();
        IF NOT FOUND THEN
            -- First counted change in this transaction: create the delta row.
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (NEW.document_type_id, 1, txid_current());
        END IF;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE document_type_summary_detail
           SET documents = documents - 1
         WHERE document_type_id = OLD.document_type_id
           AND txid = txid_current();
        IF NOT FOUND THEN
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (OLD.document_type_id, -1, txid_current());
        END IF;
        RETURN OLD;
    END IF;
    RETURN NULL;  -- unreachable: the trigger fires only on INSERT or DELETE
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER document_count_trig
    BEFORE INSERT OR DELETE ON document
    FOR EACH ROW EXECUTE PROCEDURE document_count_trig();
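As a quick sanity check (a hypothetical session; it assumes document has a document_type_id column and no other required columns), a batch of inserts within one transaction collapses into a single delta row:

BEGIN;
INSERT INTO document (document_type_id) VALUES (1), (1), (1);
-- one delta row for this transaction with documents = 3,
-- rather than three separate +1 rows:
SELECT * FROM document_type_summary_detail WHERE txid = txid_current();
COMMIT;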

-- one-off setup: seed a base (txid = 0) row per document type with the current count
INSERT INTO document_type_summary_detail (document_type_id, txid, documents)
SELECT dt.id, 0, count(d.id)
FROM document_type dt
LEFT JOIN document d ON d.document_type_id = dt.id
GROUP BY 1, 2;

-- useful view: current count = base row plus any outstanding deltas
CREATE OR REPLACE VIEW document_type_summary AS
SELECT document_type_id, sum(documents) AS documents
FROM document_type_summary_detail
GROUP BY document_type_id;
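Reading a current count is then an ordinary query against the view, for example:

SELECT documents FROM document_type_summary WHERE document_type_id = 1;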

-- scheduled cleanup: fold accumulated deltas back into the txid = 0 base rows
CREATE OR REPLACE FUNCTION document_type_summary_aggregate() RETURNS void AS
$$
BEGIN
    -- Create a base row (defaults txid = 0, documents = 0) where none exists yet.
    INSERT INTO document_type_summary_detail (document_type_id)
    SELECT DISTINCT document_type_id FROM document_type_summary_detail
    EXCEPT SELECT document_type_id FROM document_type_summary_detail WHERE txid = 0;
    -- Fold the view's grand total into each base row that has deltas, then drop the delta rows.
    UPDATE document_type_summary_detail SET documents = v.documents
    FROM document_type_summary AS v
    WHERE document_type_summary_detail.document_type_id = v.document_type_id
      AND document_type_summary_detail.txid = 0
      AND EXISTS (SELECT 1 FROM document_type_summary_detail ss
                  WHERE ss.document_type_id = document_type_summary_detail.document_type_id
                    AND ss.txid <> 0);
    DELETE FROM document_type_summary_detail WHERE txid <> 0;
END;
$$
LANGUAGE plpgsql;
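The cleanup is intended to run periodically (e.g. from cron via psql); invoking it is simply:

SELECT document_type_summary_aggregate();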

My assumption is that this solves the "locking causes serialisation of transactions" problem, since the only rows a transaction updates are rows it inserted itself.
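A sketch of why (hypothetical interleaving): two concurrent transactions inserting documents of the same type each touch only the delta row keyed by their own txid_current(), so neither waits on the other's row lock:

-- Session A                              -- Session B
BEGIN;                                    BEGIN;
INSERT INTO document ...;                 INSERT INTO document ...;
-- trigger writes (type, txid_A, +1)      -- trigger writes (type, txid_B, +1)
COMMIT;                                   COMMIT;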

Regards,
Stephen Denne.

