Re: Databases vs Schemas

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Scott Otis <scott(dot)otis(at)intand(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Databases vs Schemas
Date: 2009-10-10 13:26:19
Message-ID: b42b73150910100626u695179edq98b2f21ea3b4c533@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
> On 10/9/09 2:02 PM, "Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote:
>
>> On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis <scott(dot)otis(at)intand(dot)com> wrote:
>>> Over the next couple of months we will be creating an instance of our
>>> solution for each public school district in the US which is around 18,000.
>>> That means currently we would be creating 18,000 databases (all on one server
>>> right now ­ which is running 8.4).  I am assuming this is probably not the
>>> best way of doing things.
>>
>> Schema advantages:
>> *) maintenance advantages; all functions/trigger functions can be
>> shared.  HUGE help if you use them
>> *) can query shared data between schemas without major headaches
>> *) a bit more efficiency especially if private data areas are small.
>> kinda analogous to processes vs threads
>> *) Can manage the complete system without changing database sessions.
>> This is the kicker IMO.
>>
>> Database Advantages:
>> *) More discrete.  Easier to distinctly create, dump, drop, or move to
>> separate server
>> *) Smaller system catalogs might give efficiency benefits
>>
>
> I'm concerned how a system with 57 * 18000 > 1M tables will function.
>
> I've got 200,000 tables in one db (8.4), and some tools barely work.  The
> system catalogs get inefficient when large and psql especially has trouble.
> Tab completion takes forever, even if I make a schema "s" with one table in
> it and type "s." and try and tab complete -- its as if its scanning all
> without a schema qualifier or using an index.  Sometimes it does not match
> valid tables at all, and sometimes regex matching fails too ('\dt
> schema.*_*_*' intermittently flakes out if it returns a lot of matches).
> Other than that the number of tables doesn't seem to cause much performance
> trouble.  The only exception is constraint exclusion which is fundamentally
> broken with too many tables on the performance and memory consumption side.
>
> Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
> maintenance tools could partially matched object names with regex though.
>
> On the other hand, lots of databases probably has performance drawbacks too.
> And its maintenance drawbacks are even bigger.
>
> I certainly don't see any reason at all to try and put all of these in one
> schema.  The only useful choices are schemas vs databases.  I'd go for
> schemas unless the performance issues there are a problem.   Schemas can be
> dumped/restored/backed up independent of one another easily too.

They can, but: drop schema foo cascade; is a different operation than:
drop database foo; The first is kinda surgical and the second is a
rocket launcher. What would you rather have in battle?

For the record, just about every database I've ever designed has had
some of what I call 'de facto table partitioning' using
schemas/search_path tricks. I'm working on a system right now that is
going to get very large and if I started to run into psql problems I'd
probably look at patching it, maybe \set an option to simplify some
of the queries.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Chris Kratz 2009-10-10 14:44:35 Re: Databases vs Schemas
Previous Message tsuraan 2009-10-10 05:14:27 Re: UUID as primary key