Collating records based on a custom group by (aggregate like) function

From: Dan Searle <dan(at)adelix(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Collating records based on a custom group by (aggregate like) function
Date: 2008-03-18 14:00:10
Message-ID: 1281754331.20080318140010@adelix.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi,

I want to create a custom function/query (using the C API?) that
collates records from a record set (takes multiple records as input)
and produces a set of records (returns multiple records).

The Postgres docs only give examples of how to create a custom
function that returns multiple records (sets of tuples), but does not
show how you can create a function that can take a set of tuples as
input and scan them.

This requirement has come about because I want to scan an event log
table and collate individual log entries (group them) depending on a
custom notion of equality.

For instance, if we had the table "events":

timestamp, event
00:00:00, a
00:00:01, b
00:00:02, a
00:10:00, b
00:10:01, b

The custom function/query should output:

event, start, end
a, 00:00:00, 00:00:02
b, 00:00:01, 00:00:01
b, 00:10:00, 00:10:01

It's like grouping the events in the log table by name and timestamp,
however the timestamp grouping is not based simply on equality, but
the notion that if two events happen within 5 minutes of each other
they are collated into the same event "bucket".

Another way to look at the problem would be to create a special
grouping function in a GROUP BY clause, e.g.

SELECT name, min(timestamp), max(timestamp) FROM
events GROUP BY name, enumerate_event(name, timestamp);

Where "enumerate_event" would take the name ("a" or "b" as in the
example previously), and the timestamp for the event, then return a
unique collated event ID for each distinct event "bucket". The
problem here is that the "enumerate_event" function would need to save
it's state (a temporary list of active events) between calls.

I can't see an easy way to make a custom function using the Postgres C
API that can (for each transaction) setup an internal state,
repeatedly use the state during the execution of a query, and then
free it's state when the query finishes.

I've racked my brain about this but can't think of a simple solution,
even though this appears to be a simple problem, any suggestions much
appreciated.

Regards, Dan...

--

Dan Searle
Adelix Ltd
dan(dot)searle(at)adelix(dot)com web: www.adelix.com
tel: 0845 230 9590 / fax: 0845 230 9591 / support: 0845 230 9592
snail: The Old Post Office, Bristol Rd, Hambrook, Bristol BS16 1RY. UK.

Adelix Ltd is a registered company in England & Wales No. 4232156
VAT registration number 779 4232 91
Adelix Ltd is BS EN ISO 9001:2000 Certified (No. GB 12763)

Any views expressed in this email communication are those
of the individual sender, except where the sender specifically states
them to be the views of a member of Adelix Ltd. Adelix Ltd. does not
represent, warrant or guarantee that the integrity of this communication
has been maintained nor that the communication is free of errors or
interference.

------------------------------------------------------------------------------------
Scanned for viruses, spam and offensive content by CensorNet MailSafe

Professional Web & E-mail Filtering from www.censornet.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Erik Jones 2008-03-18 14:29:14 Re: Is autovacuum on?
Previous Message Joris Dobbelsteen 2008-03-18 12:17:01 Re: Feature request/suggestion - CREATE SCHEMA LIKE

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2008-03-18 14:16:13 Re: minimal update
Previous Message Alvaro Herrera 2008-03-18 13:43:40 Re: CVS problems