Re: Storing files: 2.3TBytes, 17M file count

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Thomas Güttler <guettliml(at)thomas-guettler(dot)de>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing files: 2.3TBytes, 17M file count
Date: 2016-11-29 15:27:24
Message-ID: f88d06c1-9288-8e89-705a-901a0dc7f6d2@aklaver.com
Lists: pgsql-general

On 11/29/2016 01:50 AM, Thomas Güttler wrote:
>
>
> Am 29.11.2016 um 01:52 schrieb Mike Sofen:
>> From: Thomas Güttler Sent: Monday, November 28, 2016 6:28 AM
>>
>> ...I have 2.3TBytes of files. File count is 17M
>>
>> Since we already store our structured data in postgres, I think about
>> storing the files in PostgreSQL, too.
>>
>> Is it feasible to store files in PostgreSQL?
>>
>> -------
>>
>> I am doing something similar, but in reverse. The legacy MySQL
>> databases I’m converting into a modern Postgres data
>> model have very large genomic strings stored in 3 separate columns.
>> Out of the 25 TB of legacy data storage (in 800
>> dbs across 4 servers, about 22b rows), those 3 columns consume 90% of
>> the total space, and they are just used for
>> reference, never used in searches or calculations. They range from 1k
>> to several MB.
>>
>>
>>
>> Since I am collapsing all 800 dbs into a single PG db, being very
>> smart about storage was critical. Since we’re also
>> migrating everything to AWS, we’re placing those 3 strings (per row)
>> into a single json document and storing the
>> document in S3 bins, with the pointer to the file being the globally
>> unique PK for the row…super simple. The app tier
>> knows to fetch the data from the db and the large-string json from the
>> S3 bins. The retrieval time is surprisingly fast; this is all real-time
>> web app stuff.
>>
>>
>>
>> This is a model that could work for anyone dealing with large objects
>> (text or binary). The nice part is, the original
>> 25TB of data storage drops to 5TB – a much more manageable number,
>> allowing for significant growth, which is on the horizon.
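A minimal sketch of that pointer-to-S3 pattern, assuming Python with
psycopg2 and boto3; the table (documents), bucket (genomics-payloads) and
column layout here are hypothetical, not the actual schema described above:

import json

import boto3
import psycopg2

BUCKET = "genomics-payloads"      # hypothetical bucket name
DSN = "dbname=appdb user=app"     # hypothetical connection string

s3 = boto3.client("s3")
conn = psycopg2.connect(DSN)

def store_row(row_id, metadata, large_strings):
    # Push the bulky strings to S3 as one JSON document keyed by the
    # row's globally unique PK; keep only searchable metadata in Postgres.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{row_id}.json",
        Body=json.dumps(large_strings).encode("utf-8"),
    )
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (id, metadata) VALUES (%s, %s)",
            (row_id, json.dumps(metadata)),
        )
    conn.commit()

def fetch_row(row_id):
    # Read metadata from Postgres, then the large payload from S3,
    # using the same PK as the object key -- no separate URL column needed.
    with conn.cursor() as cur:
        cur.execute("SELECT metadata FROM documents WHERE id = %s", (row_id,))
        metadata = cur.fetchone()[0]
    obj = s3.get_object(Bucket=BUCKET, Key=f"{row_id}.json")
    return metadata, json.loads(obj["Body"].read())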
>
> Thank you Mike for your feedback.
>
> Yes, I think I will drop my idea. Encoding binary (the file content) to
> text and decoding it back to binary makes no sense. I was not aware that
> this would be needed.
>
> I guess I will use some key-to-blob store like s3. AFAIK there are open
> source s3 implementations available.

Just be aware that doing rsync-style deltas over file changes, while
possible, is more convoluted and more time/resource consuming with
something like S3.
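
For illustration, a rough sketch of why: a naive rsync-like pass against
S3 has to list every key, compare checksums, and re-upload any changed
file in full, because objects cannot be patched in place. This assumes
boto3 and hypothetical bucket/path names:

import hashlib
import os

import boto3

BUCKET = "file-mirror"        # hypothetical bucket
LOCAL_ROOT = "/srv/files"     # hypothetical local tree

s3 = boto3.client("s3")

def md5_of(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def naive_sync():
    # One listing request per 1000 keys just to learn the remote state.
    remote = {}
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            remote[obj["Key"]] = obj["ETag"].strip('"')

    for dirpath, _, filenames in os.walk(LOCAL_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, LOCAL_ROOT)
            # The ETag equals the MD5 only for simple (non-multipart)
            # uploads, so even this comparison is best-effort.
            if remote.get(key) != md5_of(path):
                # No delta transfer: the whole object is re-uploaded.
                with open(path, "rb") as f:
                    s3.put_object(Bucket=BUCKET, Key=key, Body=f)

rsync, by contrast, transfers only the changed blocks of a file.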

>
> Thank you all for your feedback!
>
> Regards, Thomas
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
