Re: performance problem - 10.000 databases

From: Mike Rylander <miker(at)n2bb(dot)com>
To: Marek Florianczyk <franki(at)tpi(dot)pl>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: performance problem - 10.000 databases
Date: 2003-10-31 19:03:32
Message-ID: 200310311403.32402.miker@n2bb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Friday 31 October 2003 11:19 am, Marek Florianczyk wrote:
> W liście z pią, 31-10-2003, godz. 16:51, Mike Rylander pisze:
> > On Friday 31 October 2003 09:59 am, Marek Florianczyk wrote:
> > > W liście z pią, 31-10-2003, godz. 15:23, Tom Lane pisze:
> > > > Marek Florianczyk <franki(at)tpi(dot)pl> writes:
> > > > > We are building hosting with apache + php ( our own mod_virtual
> > > > > module ) with about 10.000 wirtul domains + PostgreSQL.
> > > > > PostgreSQL is on a different machine ( 2 x intel xeon 2.4GHz 1GB
> > > > > RAM scsi raid 1+0 )
> > > > > I've made some test's - 3000 databases and 400 clients connected at
> > > > > same time.
> > > >
> > > > You are going to need much more serious iron than that if you want to
> > > > support 10000 active databases. The required working set per
> > > > database is a couple hundred K just for system catalogs (I don't have
> > > > an exact figure in my head, but it's surely of that order of
> > > > magnitude).
> > >
> > > it's about 3.6M
> > >
> > > > So the
> > > > system catalogs alone would require 2 gig of RAM to keep 'em swapped
> > > > in; never mind caching any user data.
> > > >
> > > > The recommended way to handle this is to use *one* database and
> > > > create 10000 users each with his own schema. That should scale a lot
> > > > better.
> > > >
> > > > Also, with a large max_connections setting, you have to beware that
> > > > your kernel settings are adequate --- particularly the open-files
> > > > table. It's pretty easy for Postgres to eat all your open files
> > > > slots. PG itself will usually survive this condition just fine, but
> > > > everything else you run on the machine will start falling over :-(.
> > > > For safety you should make sure that max_connections *
> > > > max_files_per_process is comfortably less than the size of the
> > > > kernel's open-files table.
> > >
> > > Yes, I have made some updates, number of process, semaphores, and file
> > > descriptor. I'm aware of this limitation. On this machine there will be
> > > only PostgreSQL, nothing else.
> > > This idea with one database and 10.000 schemas is very interesting, I
> > > never thought about that. I will make some tests on monday and send
> > > results to the list.
> >
> > Following this logic, if you are willing to place the authentication in
> > front of the database instead of inside it you can use a connection pool
> > and simply change the search_path each time a new user accesses the
> > database.
>
> Well it's not so simple, I think. If I've got apache+php+phpAccelerator
> with persistent connection on. Server holds some pool of connection, but
> when new request is made from phpscripts, apache looks at his connection
> pool for a connection with parameters: dbname,dbuser,dbserver. So for
> each of 10.000 virtual domain ( 10.000 databases ) I would have to hold
> such a connection.
> Second thing: How to change search_path dynamically ? I can set in
> postgresql.conf: "search_path '$user, public'" but it works when a new
> client (username,password) is connecting to server, it gets his own
> schema with proper privileges (USE,CREATE) and thats all. Right ?

search_path documentation is here:
http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=ddl-schemas.html

> Or maybe I don't uderstand something ? If I will do in pg_hba.conf only
> one record:
> host any any ip_addr netmask md5
> and only one database, I must make 10.000 schemas with proper accesslist
> (USE,CREATE only for one user, and schemaname is same as dbusername)
> This is what I want to test ;)
>
> Now user connect from phpscript with dbusername=unique_user
> dbpass=unique_pass dbname=shared_db
> Server holds persistent connection, but it's still one connection per
> user, so it would have to have 10.000 simultaneous connection.
>
> I can't see any benefits, with connection pool, or I did not understand
> what you wanted to tell me. How to place authentication in front of the
> database using, when clients are using phpscripts ?

I suppose I didn't really explain what I was thinking. The senario I was
thinking of would go something like this:

User logins (ssh, etc...) if available would be PAM based. The user/customer
creation process would create a new schema in the single database with the
username for web/shell/ftp/etc logins. Postgresql can also use PAM for
logins and this would allow logins to the database from outside your web app.
The web app would always connect to the database as a user with access to all
schemas, but would look at the session authentication information to change
the active search path to be [username],public. In the case of shell (psql)
logins, the default search path would be $user,public as the docs show. If
the schemas are created with an AUTORIZATION of the [username] then local
(psql) logins would only allow them to see thier schema. But, because the
web app is connecting as a user with privileges that allow it to see (the
tables in) all schemas, it can act as any user by changing its search_path on
a connection by connection basis.

>
> greetings
> Marek
>
> > > greeings
> > > Marek
> > >
> > >
> > >
> > > ---------------------------(end of
> > > broadcast)--------------------------- TIP 2: you can get off all lists
> > > at once with the unregister command (send "unregister
> > > YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
Mike Rylander

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2003-10-31 19:05:57 Re: dumping tables from badly damaged db
Previous Message scott.marlowe 2003-10-31 18:48:12 Re: performance problem - 10.000 databases