Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files
Date: 2017-07-17 10:34:04
Message-ID: VisenaEmail.4e.a8c89cfec01d15d.15d5001bcc2@tc7-visena
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi.
 
After struggeling with storing LOs in the database and making efficient
backups/restore-routines I've decided to move LOs out of the DB and use a
"filename"-column containing an UUID referencing the file one the filesystem.
Having this schema I need a robust book-keeping mechanism so I know when I can
delete files on the filesystem when no longer referenced from the database.
 
In an email-program I store the whole email-body as a binary in a file and
store all header-info in an email_message table (for this example's sake).
 
I have a book-keeping table :
 
create table origo_email_message_file_operation( filename VARCHAR not null,
operationVARCHAR not NULL, PRIMARY KEY (filename, operation) );
And a email_message table holding the messages, with a "filename"-column
referencing the BLOB.
 
For DELETE I use this routine:
 
deleteData(fileName, messageId) {
startTX()
deleteEmail() // Deletes the entry in email_message table
INSERT INTO origo_email_message_file_operation(filename, operation) VALUES(
'3b1d18ae-7b54-c055-1016-d928daec7294','DELETE'); deleteEmail(messageId)

commitTX()
}
 
So for DELETE-operations the file on disk isn't deleted by the main-program,
but by a cleanup-job which runs as a cron-job and the
inspects origo_email_message_file_operation for DELETE-entries and then delete
the referenced files, then removes the DELETE-entries from
origo_email_message_file_operation.
 
 
 
INSERT is like this:
 
insertData(fileName) {
 
startTX()
// First, register the INSERT in case it fails 
INSERT INTO origo_email_message_file_operation(filename, operation) 
VALUES('3b1d18ae-7b54-c055-1016-d928daec7294','INSERT');
 
commitTX()
 
startTX()
 
 
insertEmail() // Inserts the entry in email_message table
DELETE FROM origo_email_message_file_operation WHERE filename =
'3b1d18ae-7b54-c055-1016-d928daec7294'AND operation = 'INSERT';
// If this commits there is no entry left
in origo_email_message_file_operation and we're all good
commitTX()
 

 
}
 
UPDATE is implemented as INSERT + DELETE.
 
 
The challenge with this is what to do if INSERT rolls back. If INSERT rolls
back then we end up with an INSERT-entry in origo_email_message_file_operation
with no corresponding "filename"-entry in email_message. But I fail to se how a
cleanup job can know the difference between such an INSERT-entry
in origo_email_message_file_operation caused by ROLLBACK and and INSERT-entry
caused by an in-progress insertEmail() operation.
 
Does anyone have a robust mechanism for cleaning up files in such scenarios? 
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com <mailto:andreas(at)visena(dot)com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

Browse pgsql-general by date

  From Date Subject
Next Message Clodoaldo Neto 2017-07-17 13:34:31 Re: Failed DNF dependency in Fedora
Previous Message Michael Paquier 2017-07-16 15:47:20 Re: Monitoring of a hot standby with a largely idle master