Storing large documents - one table or partition by doc?

From: Dev Nop <devnop0(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Storing large documents - one table or partition by doc?
Date: 2016-09-23 10:12:22
Message-ID: CACjtUOT+noY4RHt_qA6Jeb-2UD-2F-M=KDwPNS4KEnBL3BA87A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I’m storing thousands of independent documents each containing around 20k
rows. The larger the document, the more likely it is to be active with
inserts and updates (1000s/day). The most common read query is to get all
the rows for a single document (100s/day). It will be supporting real-time
collaboration but with strong-consistency for a simple schema so not
well-suited to dedicated "document databases" that assume schema-less &
eventual consistency. I won’t have great hardware/budget so need to squeeze
the most out of the least.

My question is whether to put all documents into a single huge table or
partition by document?

The documents are independent so its purely a performance question. Its too
many tables for postgresql partitioning support but I don’t get any benefit
from a master table and constraints. Handling partitioning in application
logic is effectively zero cost.

I know that 1000s of tables is regarded as an anti-pattern but I can only
see the performance and maintenance benefits of one table per independent
document e.g. fast per-table vacuum, incremental schema updates, easy
future sharding. A monster table will require additional key columns and
indexes that don’t have any value beyond allowing the documents to sit in
the same table.

The only downsides seem to be the system level per-table overhead but I
only see that as a problem if I have a very long tail of tiny documents.
I'd rather solve that problem if it occurs than manage an
all-eggs-in-one-basket monster table.

Is there anything significant I am missing in my reasoning? Is it mostly a
“relational purist” perspective that argues against multiple tables? Should
I be looking at alternative tech for this problem?

The one factor I haven't fully resolved is how much a caching layer in
front of the database changes things.

Thanks for your help.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mike Sofen 2016-09-23 12:14:12 Re: Storing large documents - one table or partition by doc?
Previous Message Sven R. Kunze 2016-09-23 06:35:09 Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause