Re: Millions of tables

From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
Cc: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Millions of tables
Date: 2016-09-26 12:53:12
Message-ID: CAEtnbpVddc_3o+vTrMhGEPzeoV4H1ngYEXXU9sUaFD0VgCM=Jg@mail.gmail.com
Lists: pgsql-performance

Following list etiquette, response inline ;)

On Mon, Sep 26, 2016 at 2:28 AM, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
wrote:

>
>
> On 26/09/16 05:50, Greg Spiegelberg wrote:
>
>> Hey all,
>>
>> Obviously, everyone who's been around PostgreSQL or almost any RDBMS for any
>> length of time has said not to have millions of tables. I too long believed
>> it, until recently.
>>
>> My test rig is an AWS d2.8xlarge instance running 9.5, with XFS on EBS (io1)
>> for PGDATA. Over the weekend, I created 8M tables with 16M indexes on
>> those tables. Table creation initially took 0.018031 seconds, averaged
>> 0.027467, and after tossing out outliers (qty 5) the maximum creation time
>> found was 0.66139 seconds. Total time: 30 hours, 31 minutes and 8.435049
>> seconds. Tables were created by a single process. Do note that table
>> creation is done via a plpgsql function, as there are other housekeeping
>> tasks necessary, though minimal.
>>
>> No system tuning, but here is a list of PostgreSQL knobs and switches:
>> shared_buffers = 2GB
>> work_mem = 48MB
>> max_stack_depth = 4MB
>> synchronous_commit = off
>> effective_cache_size = 200GB
>> pg_xlog is on its own file system
>>
>> There are still some obvious problems. General DBA functions such as
>> VACUUM and ANALYZE should not be done; each will run forever and cause
>> much grief. Backups are problematic in the traditional pg_dump and PITR
>> space. Large JOINs by VIEW, SELECT or via table inheritance (I am abusing
>> it in my test case) are no-nos. A system or database crash could
>> potentially take hours to days to recover. There are likely other issues
>> ahead.
>>
>> You may wonder, "Why is Greg attempting such a thing?" I looked at
>> DynamoDB, BigTable, and Cassandra. I like Greenplum but, let's face it,
>> it's antiquated, and don't get me started on "Hadoop". I looked at many
>> others, and ultimately each vendor's recommendation was to have one table
>> for all data. That overcomes the millions-of-tables problem, right?
>>
>> The problem with the "one big table" solution is that I anticipate 1,200
>> trillion records. Random access is expected, and the customer expects <30ms
>> reads for a single-record fetch.
>>
>> No data is loaded yet; table and index creation only. I am interested
>> in the opinions of all, including tests I may perform. If you had this
>> setup, what would you capture / analyze? I have a job running preparing
>> data. I did this on a much smaller scale (50k tables), and data load via
>> function allowed close to 6,000 records/second. The schema has been
>> simplified since, and the last test reached just over 20,000 records/second
>> with 300k tables.
>>
>> I'm not looking for alternatives yet, but for input on my test. Takers?
>>
>> I can't promise immediate feedback but will do my best to respond with
>> results.
>>
>> TIA,
>> -Greg
>>
>
> Hi Greg.
>
> This is a problem (creating a large number of tables; really large
> indeed) that we researched at my company a while ago. You might want to
> read about it: https://www.pgcon.org/2013/schedule/events/595.en.html
>
>
updatedb, funny. Thank you for the pointer. I had no intention of going
to 1B tables.
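
For anyone curious about the creation function, the pattern is plain dynamic
DDL from plpgsql. A stripped-down sketch (function, table and column names
here are made up for illustration; the real function also does the
housekeeping mentioned above, and each table ends up with two indexes
counting the primary key):

    CREATE OR REPLACE FUNCTION create_bucket(bucket_id bigint)
    RETURNS void LANGUAGE plpgsql AS $func$
    BEGIN
        -- one table per bucket, created with dynamic DDL
        EXECUTE format('CREATE TABLE data_%s (
                            id      bigint PRIMARY KEY,
                            ts      timestamptz NOT NULL,
                            payload jsonb)', bucket_id);
        -- second index, giving two indexes per table with the PK
        EXECUTE format('CREATE INDEX data_%s_ts_idx ON data_%s (ts)',
                       bucket_id, bucket_id);
        -- ... plus registration / grants / other housekeeping
    END;
    $func$;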

I may need to understand autovacuum better. My impression was that it consults
the statistics and vacuums one table at a time based on the vacuum threshold
formula described at
https://www.postgresql.org/docs/9.5/static/routine-vacuuming.html.
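
If I'm reading that page right, the per-table trigger is

    vacuum threshold = autovacuum_vacuum_threshold
                       + autovacuum_vacuum_scale_factor * pg_class.reltuples

compared against the dead-tuple count from the statistics collector, with
defaults of 50 and 0.2. Something like the query below (the 50 / 0.2 literals
assume default settings) shows where each table stands, though with 8M tables
even this catalog scan won't be quick:

    SELECT s.relname,
           50 + 0.2 * c.reltuples AS av_threshold,  -- assumes default settings
           s.n_dead_tup
      FROM pg_stat_user_tables s
      JOIN pg_class c ON c.oid = s.relid
     ORDER BY s.n_dead_tup DESC;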

-Greg
