Re: count(*) performance improvement ideas

From: PFC <lists(at)peufeu(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 09:20:11
Message-ID: op.t9qz7x2ncigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne
<Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:

> PFC wrote:
>> Let's try this quick & dirty implementation of a local
>> count-delta cache
>> using a local in-memory hashtable (ie. {}).
>
>> CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER )
>> RETURNS INTEGER
>> AS $$
>> if key in GD:
>> GD[key] += delta
>> else:
>> GD[key] = delta
>> return GD[key]
>> $$ LANGUAGE plpythonu;
>
> Thanks for the code, this seems to be very much what I was looking for.
>
> I don't know plpythonu (nor python), just read a few docs now:

Learn Python, it is a really useful language ;)

> "The global dictionary SD is available to store data between function
> calls. This variable is private static data. The global dictionary GD is
> public data, available to all Python functions within a session. Use
> with care."
>
> Does session == transaction or connection?
> I don't understand the difference between SD and GD, private and public.
> Where are the context boundaries?

There is no sharing between processes, so
- both SD and GD are limited to the current session (connection, postgres
process), no shared memory is involved
- GD is global between all python functions (global)
- SD is specific to each python function (static)

The big gotcha is that these are all non-transactional : if you rollback,
GD and SD stay the same, and when you issue a query, you can assume the
state of SD and GD is random (due to previous queries) unless you
initialize them to a known value.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Raphaël Jacquot 2008-04-17 11:57:44 Re: Submission of Feature Request : RFC- for Implementing Transparent Data Encryption in P
Previous Message Magnus Hagander 2008-04-17 09:16:17 Re: Lessons from commit fest