From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: dave(at)turbocat(dot)de, ports(at)postgreSQL(dot)org, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: [PORTS] vacuum takes too long
Date: 1999-01-07 04:38:18
Message-ID: Pine.BSF.4.05.9901070019420.417-100000@thelab.hub.org
Lists: pgsql-hackers pgsql-ports

On Wed, 6 Jan 1999, Bruce Momjian wrote:

> > > Wish I knew the answer. I can guess, but that isn't going to help.
> >
> > Guess == throwing in ideas, even if they are incorrect...the way I
> > figure it, I threw out a lot of guesses...some of them spark ideas
> > in others and we see some really neat ideas come out of it :)
>
> Yes. Good. I have been bugging Vadim about possible row reuse, but I
> don't know enough to understand the options.

I'm not sure about the row-reuse thing. What sort of performance hit will
it have? As it is now, you add a row by zipping down to the end, adding the
row...bang, finished. With row-reuse, you have to search for a good fit,
which could take time...

Hrmmm...let's look at Oracle's "model"...bear in mind that I haven't dived
very deep into it, so I could be totally off base here, but, with Oracle,
you have a separate "group" of processes started up for each 'instance',
where, if I'm correct, an instance is the same as our database(?)...

How hard would it be for us to implement something similar? When you
start up the postmaster, it starts up one postgres "master process" for
each database that it knows about. That master process effectively acts
as the garbage collector for the database, as well as its central
'traffic cop'...

So, for example, I have 4 databases on my server...when you start up the
system with your normal 'postmaster' process, it forks off 4 processes,
one for each database. When you connect to port #### for database XXXX,
the listening process (main postmaster) shunts the connection over to the
appropriate 'traffic cop' for handling...

The 'traffic cop' would keep track of how many connections to the
database are currently open, and when that hits zero, which would
indicate idle time, work through a table in the database to clean it up.
As soon as a new connection comes in, it would "finish" its cleanup by
making sure the table is in a 'sane state' (ie. finishing up with its
current record) and then fork off the backend process, waiting quietly
until it's idle again...

Then each database could effectively have its own shared memory pool
that could be adjusted on a per-database basis. Maybe even add a 'change
threshold', where after X transactions (update, insert or delete), the
table gets auto-vacuum'd (no analyze, just vacuum)...the threshold could
be set on a per-table basis...the 'traffic cop' should be able to easily
keep track of those sorts of stats internally...no?

Hell, the 'traffic cop' *should* be able to keep reasonably accurate stats
to update the same tables that a 'vacuum analyze' maintains, adjusting
those values as it goes to give a semi-accurate picture. A full
'analyze' would still have to be run periodically...

It's a thought...haven't got a clue as to the complexity of implementing
it, but...*shrug*

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org
