Re: table versioning approach (not auditing)

From: Adam Brusselback <adambrusselback(at)gmail(dot)com>
To: Felix Kunde <felix-kunde(at)gmx(dot)de>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: table versioning approach (not auditing)
Date: 2014-09-30 19:16:09
Message-ID: CAMjNa7e40syeaHz-91AKhikWw6ctx7pAg8REQEk9CdqD_sxKSQ@mail.gmail.com
Lists: pgsql-general

Felix, I'd love to see a single, well maintained project. For example, I
just found yours and gave it a shot today after seeing this post. I found
a bug: when an UPDATE command is issued but the old and new values are all
identical, the trigger blows up. I've got a fix for that, but if we
had one project that more than a handful of people used, stuff like that
would be quashed very quickly.
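
For anyone following along, one way to guard against that is a WHEN
clause so the trigger only fires when the row actually changed. A sketch
only; my_table and log_changes() are placeholder names, not the ones the
project actually uses:

    -- Fire the row-level logging trigger only when something changed.
    -- IS DISTINCT FROM compares the whole rows and handles NULLs,
    -- unlike OLD <> NEW, which yields NULL if any column is NULL.
    CREATE TRIGGER log_update
      AFTER UPDATE ON my_table
      FOR EACH ROW
      WHEN (OLD.* IS DISTINCT FROM NEW.*)
      EXECUTE PROCEDURE log_changes();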

I love the design of it by the way. Any idea what it will take to move to
JSONB for 9.4?
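
I'd expect the switch to be mostly mechanical: change the log column's
type and cast the value built in the trigger. A sketch, assuming a log
table audit_log with a json column named changes (the real names in your
schema will differ):

    -- Convert the existing log column from json to jsonb in place
    ALTER TABLE audit_log
      ALTER COLUMN changes TYPE jsonb
      USING changes::jsonb;

    -- Inside the trigger function, a json value can be cast the same
    -- way, e.g. row_to_json(OLD)::jsonb where row_to_json(OLD) was
    -- used before.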

On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde <felix-kunde(at)gmx(dot)de> wrote:

> Hey
>
> Yes, I'm adding an additional key to each of my tables. First I wanted to
> use the primary key as one column in my audit_log table, but in some of my
> tables the PK consists of more than one column. Plus it's nice to have one
> key with the same name across all tables.
>
> To get the former state of a row at date x, I need to join the latest
> delta BEFORE date x with each delta AFTER date x. If I logged complete
> rows, this joining part would not be necessary, but as I usually work with
> spatial databases that have complex geometries and also image files, that
> strategy consumes too much disk space.
>
> If there are more users following a similar approach, I wonder why we
> don't throw all the good ideas together to have one solution that is
> tested, maintained and improved by more developers. That would be great.
>
> Felix
>
>
> Sent: Monday, 29 September 2014 at 23:25
> From: "Abelard Hoffman" <abelardhoffman(at)gmail(dot)com>
> To: "Felix Kunde" <felix-kunde(at)gmx(dot)de>
> Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
> Subject: Re: [GENERAL] table versioning approach (not auditing)
>
> Thank you Felix, Gavin, and Jonathan for your responses.
>
> Felix & Jonathan: both of you mention just storing deltas. But if you do
> that, how do you associate the delta record with the original row? Where's
> the PK stored, if it wasn't part of the delta?
>
> Felix, thank you very much for the example code. I took a look at your
> table schemas. I need to study it more, but it looks like the way you're
> handling the PK is by adding a separate synthetic key (audit_id) to
> each table that's being versioned, and then storing that key along with the
> delta.
>
> So then, to find all the versions of a given row, you just need to join the
> audit row on the schema_name.table_name.audit_id column. Is that right?
> The only potential drawback is that there's no referential integrity
> between audit_log.audit_id and the actual table.
>
> I do like that approach very much though, in that it eliminates the need
> to interrogate the json data in order to perform most queries.
>
> AH
>
>
>
> On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde <felix-kunde(at)gmx(dot)de> wrote:
>
> Hey
>
> I've also tried to implement database versioning using JSON to log
> changes in tables. Here it is: https://github.com/fxku/audit
> I've got two versioning tables: one storing information about all
> transactions that happened, and one where I put the JSON logs of row
> changes of each table. I'm only logging old values, not complete rows.
>
> Then I got a function that recreates a database state at a given time into
> a separate schema - either to VIEWs, MVIEWs or TABLES. This database state
> could then be indexed in order to work with it. You can also reset the
> production state to the recreated past state.
>
> Unfortunately I've got no time to work on it further at the moment, and I
> haven't done tests with many changes in the database, so I can't say if the
> recreation process scales well. One downside I've realised is that the
> json_agg function has limits when I've got binary data; it gets too long.
> So I'm really looking forward to using JSONB.
>
> There are more plans in my mind. By having a Transaction_Log table it
> should be possible to revert only certain transactions. I'm also thinking
> of parallel versioning, e.g. different users are all working with their
> version of the database and commit their changes to the production state.
> As I've got a unique history ID for each table and each row, I should be
> able to map the affected records.
>
> Have a look and tell me what you think of it.
>
> Cheers
> Felix
>
>
> Sent: Monday, 29 September 2014 at 04:00
> From: "Abelard Hoffman" <abelardhoffman(at)gmail(dot)com>
> To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
> Subject: [GENERAL] table versioning approach (not auditing)
>
> Hi. I need to maintain a record of all changes to certain tables to assist
> in viewing history and reverting changes when necessary (e.g., customer
> service makes an incorrect edit).
>
> I have studied these two audit trigger examples:
>
> https://wiki.postgresql.org/wiki/Audit_trigger
> https://wiki.postgresql.org/wiki/Audit_trigger_91plus
>
> I've also read about two other approaches to versioning:
> 1. maintain all versions in one table, with a flag to indicate which is
> the current version
> 2. have a separate versions table for each real table, and insert into the
> associated version table whenever an update or insert is done.
>
> My current implementation is based on the wiki trigger examples, using a
> single table, and a json column to record the row changes (rather than
> hstore). What I like about that, in particular, is I can have a "global,"
> chronological view of all versioned changes very easily.
>
> But there are two types of queries I need to run.
> 1. Find all changes made by a specific user
> 2. Find all changes related to a specific record
>
> #1 is simple to do. The versioning table has a user_id column of who made
> the change, so I can query on that.
>
> #2 is more difficult. I may want to fetch all changes to a group of tables
> that are all related by foreign keys (e.g., find all changes to "user"
> record 849, along with any changes to their "articles," "photos," etc.).
> All of the data is in the json column, of course, but it seems like a pain
> to try and build a query on the json column that can fetch all those
> relationships (and if I mess it up, I probably won't generate any errors,
> since the json is so free-form).
>
> So my question is, do you think using the json approach is wrong for this
> case? Does it seem better to have separate versioning tables associated
> with each real table? Or another approach?
>
> Thanks
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
