| From: | Ralph Graulich <maillist(at)shauny(dot)de> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | SQL logic for version history including table | 
| Date: | 2002-08-03 09:43:41 | 
| Message-ID: | Pine.LNX.4.21.0208031126310.1995-100000@shauny.shauny.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Hi,
I've got a table which includes complete versioning and history for each
entry. The fields in question are:
- createdate (date of creation of this row)
- vno (version number, increased by 1 for each new version)
- active ('Y' for newest version, 'N' for all older ones, simply set to
'N' for all versions, if a row is deleted)
- dam_id (key value, but not unique, same value for each
version of an entry)
I want to select the following data:
(1) All the changes, which were made since a specific date in the past,
ordered by their date of creation descending and a name field in ascending
way.
(2) As I want to set a special mark in the output if that was a new entry
or just the change of an old existing entry, I need to check the version
number for each date of creation.
(3) As there can be multiple versions on one day, I need to check for the
lowest version number on each date of creation to see, wether it really
was a change or a new entry.
(4) Finally I need to exclude all the entries, if they don't have a valid
entry on the current day any more, which means they were deleted after the
date the last entry was made -> short: only entrys which have a active='Y'
entry are allowed to be selected.
Thought quite a while about it and came to the following conclusion:
SELECT	d.field1,
	d.field2, ...
	d.createdate,
	(SELECT		d4.createdate
	FROM		dam d4
	WHERE		d.dam_id = d4.dam_id
	ORDER BY	createdate ASC LIMIT 1) AS firstdate
FROM	dam d
WHERE	createdate >= '1999-06-01'
AND	vno=(SELECT	d2.vno
	FROM		dam d2
	WHERE		d2.dam_id=d.dam_id
	AND		d2.createdate=d.createdate
	ORDER BY	vno ASC LIMIT 1)
AND EXISTS
	(SELECT		*
	FROM		dam d3
	WHERE		d3.dam_id=d.dam_id
	AND		d3.active=d.active
	AND		d.active='Y')
ORDER BY
	createdate DESC,
	dam ASC
	LIMIT 200;
The explain plan looks like:
NOTICE: QUERY PLAN:
Limit  (cost=137625.77..137625.77 rows=200 width=73)
  ->  Sort  (cost=137625.77..137625.77 rows=395 width=73)
        ->  Seq Scan on dam d  (cost=0.00..137608.74 rows=395 width=73)
              SubPlan
                ->  Limit  (cost=7.24..7.24 rows=1 width=4)
                      ->  Sort  (cost=7.24..7.24 rows=1 width=4)
                            ->  Index Scan using ix_dam_dam_id on dam d4
(cost=0.00..7.23 rows=1 width=4)
                ->  Limit  (cost=7.24..7.24 rows=1 width=4)
                      ->  Sort  (cost=7.24..7.24 rows=1 width=4)
                            ->  Index Scan using ix_dam_dam_id on dam d2
(cost=0.00..7.23 rows=1 width=4)
                ->  Result  (cost=0.00..7.23 rows=1 width=287)
                      ->  Index Scan using ix_dam_dam_id on dam d3
(cost=0.00..7.23 rows=1 width=287)
Maybe someone wants to discuss about that approach with me or wants to
point out possible errors. All opinions are welcome.
Kind regards
... Ralph ...
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Cédric Dufour | 2002-08-03 09:50:50 | Re: b1 OR b2 <-> ( CASE WHEN b1 THE true ELSE b2 END ): performance bottleneck on logical OR | 
| Previous Message | HT&T | 2002-08-03 08:55:16 | Re: MySQL or Postgres ? |