Why frequently updated tables are an issue

From: pgsql(at)mohawksoft(dot)com
To: "Christopher Browne" <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Why frequently updated tables are an issue
Date: 2004-06-10 14:30:09
Message-ID: 15910.64.119.142.34.1086877809.squirrel@mail.mohawksoft.com
Lists: pgsql-hackers

OK, the problem I am having with this whole discussion, on several fronts, is
the idea of performance. If performance and consistent behavior were not
*important* issues to a project, a summary table would work fine, and I
could just vacuum frequently.

Currently a client needs to vacuum two summary tables at least once every
two seconds. The performance of the system slowly declines with each
summary update, until the next vacuum. After a vacuum, the transaction
comes in at about 7ms, it increases to about 35ms~50ms, then we vacuum and
we're back to 7ms. When we vacuumed every 30 seconds, it would sometimes
get up to whole seconds.

There is an important issue here. Yes, MVCC is good. I agree, and no one
is arguing against it in the general case. However, there are classes of
problems for which MVCC, or at least PostgreSQL's implementation of it, is
not the best solution.

There are two basic problems which are fundamental issues I've had with
PostgreSQL over the years: summary tables and session tables.

The summary tables take the place of a "select sum(col) from table" where
table is very small. The amount of vacuuming required and the steady
degradation of performance prior to each vacuum is a problem that could be
addressed by some global variable system.
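
As a rough illustration of the degradation pattern described above, here is a
toy model of an append-only, MVCC-style table. It is only a sketch: the
ToyHeap class and its methods are hypothetical names, and this is not how
PostgreSQL stores or scans rows. It shows only why repeated updates to one
summary row make each read more expensive until dead versions are reclaimed.

```python
# Toy model of an append-only, MVCC-style heap. All names are hypothetical;
# this is not PostgreSQL's implementation, only the shape of the problem.

class ToyHeap:
    def __init__(self):
        # Each entry is [key, value, is_live].
        self.versions = []

    def insert(self, key, value):
        self.versions.append([key, value, True])

    def update(self, key, value):
        # An update never overwrites in place: it marks the old version
        # dead and appends a new live version.
        for v in self.versions:
            if v[0] == key and v[2]:
                v[2] = False
        self.versions.append([key, value, True])

    def read(self, key):
        # A read steps over every dead version before reaching the live one.
        # Returns (value, number_of_versions_examined).
        examined = 0
        for v in self.versions:
            examined += 1
            if v[0] == key and v[2]:
                return v[1], examined
        return None, examined

    def vacuum(self):
        # Reclaim dead versions; returns how many were removed.
        before = len(self.versions)
        self.versions = [v for v in self.versions if v[2]]
        return before - len(self.versions)


heap = ToyHeap()
heap.insert("total", 0)
for i in range(1, 1001):
    heap.update("total", i)        # 1000 summary updates, no vacuum

value, cost_before = heap.read("total")
removed = heap.vacuum()
_, cost_after = heap.read("total")
print(value, cost_before, removed, cost_after)
```

In this model the cost of reading the single live row grows with the number
of updates since the last vacuum and drops back to one version afterwards,
which is the same sawtooth as the 7ms to 35ms~50ms cycle described earlier.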

The session table is a different issue, but has the same problems. You
have an active website, hundreds or thousands of hits a second, and you
want to manage sessions for this site. Sessions are created, updated many
times, and deleted. Performance degrades steadily until a vacuum. Vacuum
has to be run VERY frequently. Prior to lazy vacuum, this was impossible.

Both session tables and summary tables have another thing in common: they
are not vital data, they hold transient state information. Yeah, sure,
data integrity is important, but if you lose these values, you can either
recreate them or they aren't too important.

Why put that in a database at all? Because, in the case of sessions
especially, you need to access this information for other operations. In
the case of summary tables, OLAP usually needs to join or include this
info.

PostgreSQL's behavior in these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open to suggestions on what can be done. I've suggested a number of
things, and admittedly they have all been pretty weak ideas, but they were
potentially workable.
