Re: Versioned, chunked documents

From: Martin Gregorie <martin(at)gregorie(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Versioned, chunked documents
Date: 2012-04-01 23:22:02
Message-ID: 1333322522.1754.143.camel@zappa.gregorie.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 2012-04-02 at 00:38 +0200, Ivan Voras wrote:
> Hi,
>
> I have documents which are divided into chunks, so that the (ordered)
> concatenation of chunks make the whole document. Each of the chunks may
> be edited separately and past versions of the chunks need to be kept.
>
> The structure looks fairly simple:
>
> The first goal is to retrieve the latest version of the whole document,
> made from the latest versions of all chunks, but later the goal will
> also be to fetch the whole version at some point in time (i.e. with
> chunks created before a point in time).
>
> I did the first goal by creating two helper views:
>
> CREATE VIEW documents_chunks_last_version_chunk_ids AS
> SELECT documents_id, max(id), seq FROM documents_chunks GROUP BY
> documents_id, seq;
>
> CREATE VIEW documents_chunks_last_version_content AS
> SELECT documents_chunks.documents_id, content
> FROM documents_chunks
> JOIN documents_chunks_last_version_chunk_ids ON
> documents_chunks.id=documents_chunks_last_version_chunk_ids.max
> ORDER BY documents_chunks_last_version_chunk_ids.seq;
>
> There are indexes on the document_chunks fields seq and documents_id.
>
> Everything looked fine until I examined the output of EXPLAIN ANALYZE
> and saw this:
>
I'm not surprised. First guess: I'd use
id (FK of documents), seq, ctime

as the prime key of document_chunk, which would work for your initial
requirement but is far too simplistic to deal with the general
requirement of retrieving a specific document version. You'd probably
need something like:

CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
...
);

Create table version (
version_no serial primary key,
author var char 40,
ctime timestamp );

create table document_version (
id serial references document(id),
version_number serial references version(version_no),
primary_key (id, version_no),
);

CREATE TABLE documents_chunks (
id SERIAL references document_version(id),
version_number serial references document_version(version_number),
seq serial,
content TEXT,
primary_key(id, version_number, seq)
};

Disclaimer: this is not syntax checked or tested. It may/or may not
match your requirements, but since I haven't seen your ERD or the 3NF
you derived from it I can't offer any more applicable advice.

Martin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martin Gregorie 2012-04-01 23:52:35 Re: Versioned, chunked documents
Previous Message Ivan Voras 2012-04-01 23:05:58 Re: Versioned, chunked documents