Re: Storing files: 2.3TBytes, 17M file count

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: Thomas Güttler <guettliml(at)thomas-guettler(dot)de>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing files: 2.3TBytes, 17M file count
Date: 2016-11-29 12:06:51
Message-ID: CADmi=6NU4Q98_CxmgcmgGpJ9909KC+jurNJbCrKXZu8keGJE5A@mail.gmail.com
Lists: pgsql-general

On 29 November 2016 at 16:50, Thomas Güttler <guettliml(at)thomas-guettler(dot)de>
wrote:

>
>
> Am 29.11.2016 um 01:52 schrieb Mike Sofen:
>
>> From: Thomas Güttler Sent: Monday, November 28, 2016 6:28 AM
>>
>> ...I have 2.3TBytes of files. File count is 17M
>>
>> Since we already store our structured data in postgres, I think about
>> storing the files in PostgreSQL, too.
>>
>> Is it feasible to store file in PostgreSQL?
>>
>>

> I guess I will use some key-to-blob store like s3. AFAIK there are open
> source s3 implementations available.
>
> Thank you all for your feedback!
>
> Regards, Thomas
>

I have a similar setup, with about 20TB of data in over 60 million files.
It might be possible to store that in PG, but I think it would be a huge
headache that is easily avoided. Files are GPG encrypted and backed up
offsite to S3, with lifecycle rules to migrate them to Glacier storage. A
tool like boto lets you sync things easily to S3, and maybe directly to
Glacier, and there are alternatives out there.

If your rsync is taking too long, though, syncing to S3 will be worse. If
that is your bottleneck, then you need to fix it, probably by knowing which
files have changed and only resyncing them, for example using timestamps
from the database or storing 'incoming' files in a separate area from your
'archive'. Once you have this sorted, you can do your backups every few
minutes and reduce your potential data loss.
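
As a rough illustration of "only resync what changed", here is a minimal
Python sketch that picks out files modified since the last successful sync.
The function name changed_since and the idea of keeping the last sync time
as an epoch timestamp are assumptions made for the example, not something
from the setup described above, and it uses filesystem mtimes rather than
database timestamps.

```python
import os
from pathlib import Path
from typing import Iterator


def changed_since(root: Path, last_sync: float) -> Iterator[Path]:
    """Yield files under root modified after last_sync (epoch seconds).

    Hypothetical helper: it relies on filesystem mtimes, so it only
    illustrates the selection step, not the upload itself.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Strictly newer than the last sync, so a file synced at
            # exactly last_sync is not selected again.
            if path.stat().st_mtime > last_sync:
                yield path
```

Walking tens of millions of files is itself slow, so something like this
fits best on a small 'incoming' directory; for the full archive, a query on
a modification timestamp column in the database would avoid the walk
entirely.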

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/
