Re: count(*) performance improvement ideas

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 23:19:49
Message-ID: F0238EBA67824444BC1CB4700960CB48051D5FCC@dmpeints002.isotach.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> >> As for 2) and 3), can't you look into the pg_settings view?
>
> > pg_settings view doesn't contain custom variables created
> on the fly,
>
> Really? [ pokes around ... ] Hm, you're right, because
> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
> usage it'll never be cleared. I wonder if we should change that.
>
> The whole thing is a bit of an abuse of what the mechanism
> was intended
> for, and so I'm not sure we should rejigger GUC's behavior to make it
> more pleasant, but on the other hand if we're not ready to provide a
> better substitute ...

In my experiments with materialized views, I identified these problems as "minor" difficulties. Resolving them would allow further abuse ;)

Aside: it is currently rather cumbersome to get a function to run, if needed, at commit time. The ideal solution would be something like "EXECUTE ON COMMIT my_function()", or maybe "SAVEPOINT my_name ON COMMIT my_function()", though I make these suggestions without having investigated what provision the SQL standard has made to address this need.
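
For what it's worth, the nearest thing currently available seems to be a deferred constraint trigger, which fires at commit time but once per queued row rather than once per transaction; a minimal sketch, with illustrative names (mv_keys_affected, mv_flush_fn):

CREATE CONSTRAINT TRIGGER mv_flush
AFTER INSERT ON mv_keys_affected
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE mv_flush_fn();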

My use of mv.initialized means I can create variables when initializing a transaction and know afterwards that they have values, but what I can't easily do is use those variables to identify which grouping keys have been updated. To do that, I select from, and conditionally insert into, a table kept for that explicit purpose. If the select doesn't find the key, I create variables named after that key, with zero values.
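
A sketch of that check-and-register step, assuming an insert-trigger context and a tracking table named mv_keys_affected (both illustrative):

PERFORM 1 FROM mv_keys_affected WHERE key_value = NEW.group_id;
IF NOT FOUND THEN
    INSERT INTO mv_keys_affected (key_value) VALUES (NEW.group_id);
    -- then create the variables with zero values, as in the fragments below
END IF;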

Performance- and efficiency-wise, which would be the better way of keeping track of the grouping keys used in a transaction?
1) Create a temp table per transaction, on commit drop, storing the grouping keys affected.
2) Use a persistent table, storing txid and the grouping keys affected, deleting that txid's rows at commit.
3) Use pg_settings, storing transaction-local grouping keys affected; existence is checked by catching an exception, and listing is done by checking existence for every possible value (a possibility in my scenario).

Speed is my priority, and low disk IO is a probable means to that end, which is why I investigated using variables.

Basically, (3) isn't a viable option, so what are the trade-offs between creating a temporary table per transaction and using rows in a permanent table with a txid column? Both alternatives are sketched below.
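
For concreteness, the two alternative definitions of the tracking table might look like this (column types are guesses):

-- (1) temp table, one per transaction, gone at commit:
CREATE TEMP TABLE mv_keys_affected (key_value bigint PRIMARY KEY) ON COMMIT DROP;

-- (2) permanent table, rows tagged with the current txid:
CREATE TABLE mv_keys_affected (
    txid      bigint,
    key_value bigint,
    PRIMARY KEY (txid, key_value)
);
-- rows inserted with txid_current(), deleted again by the commit-time trigger

As I understand it, (1) churns the system catalogs on every transaction, while (2) trades that for dead rows that need vacuuming, so neither avoids disk IO entirely.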

Here are some more plpgsql code fragments:

mv := 'mv.' || view_name || '.' || key_value || '.';

When recording a grouping key as being affected by the transaction, create the variables with zeroes:

PERFORM set_config(mv||'documents', '0', true);
PERFORM set_config(mv||'last_addition', 'null', true);

In an insert trigger:

PERFORM set_config(mv||'documents', (current_setting(mv||'documents')::bigint + 1)::text, true);
PERFORM set_config(mv||'last_addition', now()::text, true);
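
Assembled into a complete insert trigger function, those fragments would look roughly like this (the view name document_counts is illustrative, and the key-registration step from earlier is reduced to a comment):

CREATE OR REPLACE FUNCTION mv_documents_ins() RETURNS trigger AS $$
DECLARE
    mv text;
BEGIN
    mv := 'mv.document_counts.' || NEW.group_id || '.';
    -- register NEW.group_id as affected, creating zeroed variables if it is new
    PERFORM set_config(mv||'documents', (current_setting(mv||'documents')::bigint + 1)::text, true);
    PERFORM set_config(mv||'last_addition', now()::text, true);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;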

In the deferred-till-commit trigger:

UPDATE materialized_view SET
    documents = documents + current_setting(mv||'documents')::bigint,
    last_addition = greatest(last_addition,
        nullif(current_setting(mv||'last_addition'), 'null')::timestamp)
WHERE
    group_id = key_values.key_value;
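
(key_values there is a loop record: the UPDATE runs once per grouping key registered during the transaction, roughly:

FOR key_values IN SELECT key_value FROM mv_keys_affected LOOP
    mv := 'mv.' || view_name || '.' || key_values.key_value || '.';
    -- the UPDATE above goes here
END LOOP;
)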

Regards,
Stephen Denne.

