Quick Links

SQL logic for version history including table

From:	Ralph Graulich <maillist(at)shauny(dot)de>
To:	pgsql-general(at)postgresql(dot)org
Subject:	SQL logic for version history including table
Date:	2002-08-03 09:43:41
Message-ID:	Pine.LNX.4.21.0208031126310.1995-100000@shauny.shauny.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

I've got a table which includes complete versioning and history for each
entry. The fields in question are:

- createdate (date of creation of this row)
- vno (version number, increased by 1 for each new version)
- active ('Y' for newest version, 'N' for all older ones, simply set to
'N' for all versions, if a row is deleted)
- dam_id (key value, but not unique, same value for each
version of an entry)

I want to select the following data:

(1) All the changes, which were made since a specific date in the past,
ordered by their date of creation descending and a name field in ascending
way.

(2) As I want to set a special mark in the output if that was a new entry
or just the change of an old existing entry, I need to check the version
number for each date of creation.

(3) As there can be multiple versions on one day, I need to check for the
lowest version number on each date of creation to see, wether it really
was a change or a new entry.

(4) Finally I need to exclude all the entries, if they don't have a valid
entry on the current day any more, which means they were deleted after the
date the last entry was made -> short: only entrys which have a active='Y'
entry are allowed to be selected.

Thought quite a while about it and came to the following conclusion:

SELECT d.field1,
d.field2, ...
d.createdate,
(SELECT d4.createdate
FROM dam d4
WHERE d.dam_id = d4.dam_id
ORDER BY createdate ASC LIMIT 1) AS firstdate
FROM dam d
WHERE createdate >= '1999-06-01'
AND vno=(SELECT d2.vno
FROM dam d2
WHERE d2.dam_id=d.dam_id
AND d2.createdate=d.createdate
ORDER BY vno ASC LIMIT 1)
AND EXISTS
(SELECT *
FROM dam d3
WHERE d3.dam_id=d.dam_id
AND d3.active=d.active
AND d.active='Y')
ORDER BY
createdate DESC,
dam ASC
LIMIT 200;

The explain plan looks like:

NOTICE: QUERY PLAN:

Limit (cost=137625.77..137625.77 rows=200 width=73)
-> Sort (cost=137625.77..137625.77 rows=395 width=73)
-> Seq Scan on dam d (cost=0.00..137608.74 rows=395 width=73)
SubPlan
-> Limit (cost=7.24..7.24 rows=1 width=4)
-> Sort (cost=7.24..7.24 rows=1 width=4)
-> Index Scan using ix_dam_dam_id on dam d4
(cost=0.00..7.23 rows=1 width=4)
-> Limit (cost=7.24..7.24 rows=1 width=4)
-> Sort (cost=7.24..7.24 rows=1 width=4)
-> Index Scan using ix_dam_dam_id on dam d2
(cost=0.00..7.23 rows=1 width=4)
-> Result (cost=0.00..7.23 rows=1 width=287)
-> Index Scan using ix_dam_dam_id on dam d3
(cost=0.00..7.23 rows=1 width=287)

Maybe someone wants to discuss about that approach with me or wants to
point out possible errors. All opinions are welcome.

Kind regards
... Ralph ...

Browse pgsql-general by date

	From	Date	Subject
Next Message	Cédric Dufour	2002-08-03 09:50:50	Re: b1 OR b2 <-> ( CASE WHEN b1 THE true ELSE b2 END ): performance bottleneck on logical OR
Previous Message	HT&T	2002-08-03 08:55:16	Re: MySQL or Postgres ?