Re: Thousands databases or schemas

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Denis <socsam(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Thousands databases or schemas
Date: 2012-11-09 06:38:22
Message-ID: CABUevEx_TPdrbJSU3Zt5yrMFRGJ=Dbsmxki7eZF7-mnN-HHDxw@mail.gmail.com

On Fri, Nov 9, 2012 at 7:15 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 11/08/2012 09:29 PM, Denis wrote:
>> Ok guys, it was not my intention to hurt anyone's feelings by mentioning
>> MySQL. Sorry about that.
> It's pretty silly to be upset by someone mentioning another DB product.
> I wouldn't worry.
>> There simply was a project with a similar
>> architecture built using MySQL. When we started the current project, I
>> made the decision to give PostgreSQL a try.
> It's certainly interesting that MySQL currently scales to much larger
> table counts better than PostgreSQL appears to.
>
> I'd like to see if this can be improved down the track. Various people
> are doing work on PostgreSQL scaling and performance, so with luck huge
> table counts will come into play there. If nothing else, supporting
> large table counts is important when dealing with very large amounts of
> data in partitioned tables.
>
> I think I saw mention of better performance with higher table counts in
> 9.3 in -hackers, too.
>
>> I would recommend refreshing the info here:
>> http://wiki.postgresql.org/wiki/FAQ. There is a question "What is the
>> maximum size for a row, a table, and a database?". Please add info there on
>> the maximum number of databases and tables one DB can contain while
>> PostgreSQL continues to work properly.
> Yeah, a number of people have been thrown by that. Technical limitations
> aren't the same as practical limitations, and in some cases the
> practical limitations are lower.

Yes. And the fact is that PostgreSQL doesn't actually have a big
problem with this scenario; pg_dump does. It's mainly a tool issue,
not a database engine one. For example, pgAdmin used to have problems
even with a much smaller number of databases than that - I think that
is better in current releases, but I'm not 100% sure. And you can
imagine what happens to a tool that tries to graph per-database
values into a single graph when you have thousands of databases.

PostgreSQL isn't perfect in these cases - as noted, just creating a new
schema can take slightly more than the usual millisecond. But it
works, and I've never personally come across a scenario where it's not
"good enough" (that doesn't mean such a scenario doesn't exist, of course).

pg_dump and pg_dumpall, however, do have issues, as noted elsewhere as well.

But as Pavel mentioned, and others certainly have before as well,
there are other ways to deal with backups that solve this problem. They
may not be perfect, but they get you pretty far.
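
For instance - this is only a sketch, not necessarily the approach Pavel
had in mind, and the paths are made up - a file-level base backup plus
continuous WAL archiving avoids per-table dump overhead entirely:

    # postgresql.conf: archive every completed WAL segment
    wal_level = archive
    archive_mode = on
    archive_command = 'cp %p /backup/wal/%f'

    # then take a physical base backup of the whole cluster (9.1+)
    pg_basebackup -D /backup/base -F tar -z

The cost of that backup scales with the amount of data, not with the
number of databases or tables.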

It may not be a match for the requirements in this case - but it's not
a general limitation of the db.

FWIW, have you looked into using proper backups with PITR and just
rolling forward through the transaction log with
pause_at_recovery_target set? That's an excellent way to track down the
"we lost data sometime between <x> and <y> but don't know exactly when"
kind of problem. I've yet to find a case where that's not easier than
repeatedly restoring even a fairly small pg_dump-based backup to find
it. Plus, it gives you much better granularity, which leads to better
results for your customer. *And* it takes away the dependency on
pg_dump's performance issues.
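
As a rough sketch (the target time and archive path below are just
placeholders to adapt), the recovery.conf for such a run would look
something like:

    restore_command = 'cp /backup/wal/%f %p'
    recovery_target_time = '2012-11-08 12:00:00'
    pause_at_recovery_target = 'true'

With hot_standby = on you can connect and inspect the data once recovery
pauses at the target; if you haven't found the right point yet, adjust
the target and restore again, and once you have, call
pg_xlog_replay_resume() to finish recovery and open the database for
writes.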

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
