Re: [HACKERS] RE: [GENERAL] Transaction logging

From: jwieck(at)debis(dot)com (Jan Wieck)
To: michael(dot)davis(at)prevuenet(dot)com (Michael Davis)
Cc: pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] RE: [GENERAL] Transaction logging
Date: 1999-03-05 18:23:51
Message-ID: m10IzGK-000EBQC@orion.SAPserv.Hamburg.dsh.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Michael Davis wrote:

>
> What would it take to have transaction logging added to Postgres. I am a
> c/c++ programmer and will consider contributing to the Postgres development
> effort. I really like everything I see and read about Postgres. As a

I spent some time on transaction logging since it's a feature
I'm missing too. There are mainly two different transaction
log mechanisms out.

1. Log queries sent to the backend.

2. Log images of inserted/updated rows and row ID's of
deleted ones.

The query level logging will write less information if
queries usually affect a large number of rows. Unfortunately
the extensibility of Postgres work's against this approach.
There could be any number of user written functions who's
results aren't reproduceable during recovery. And core
features of Postgres itself would introduce the same problem.
Have a sequence which is used to create default values for
multiple tables, so that one ID is unique across them. Now
two backends insert (with INSERT ... SELECT) concurrently
into different tables using the same sequence. It's a
classic race condition and it depends on context switching
and page faults which backend will get which sequence
numbers. You cannot foresee and you cannot reproduce, except
you hook into the sequence generator and log this too. Later
when recovering, another hook into the sequence generator
must reproduce the logged results on the per
backend/transaction/command base, and the same must be done
for each function that usually returns unreproduceable
results (anything dealing with time, pid's, etc.).

As said, this must also cover user functions. So at least
there must be a general log API that provides such a
functionality for user written functions.

The image logging approach also has problems. First, the only
thing given to the heap access methods to outdate a tuple on
update/delete is the current tuple ID (information that tells
which tuple in which block is meant). So you need to save
the database files in binary format, because during the
actually existing dump/restore this could change and the
logged CTID's would hit the wrong tuples.

Second, you must remember in the log which transaction ID
these informations came from and later if the transaction
committed or not, so the recovery can set this commit/abort
information in pg_log too. pg_log is a shared system file and
the transaction ID's are unique only for one server. Using
this information for online replication of a single database
to another Postgres installation will not work.

Third, there are still some shared system catalogs across all
databases (pg_database, pg_group, pg_log!!!, pg_shadow and
pg_variable). Due to that it would be impossible (or at least
very, very tricky) to restore/recover (maybe point in time)
one single database. If you destroy one database and restore
it from the binary backup, these shared catalogs cannot be
restored too, so they're out of sync with the backup time.
How should the recovery now hit the right things (which
probably must not be there at all)?.

All this is really a mess. I think the architecture of
Postgres will only allow something on query level with some
general API for things that must reproduce the same result
during recovery. For example time(). Inside the backend,
time() should never be called directly. Instead another
function is to be called that log's during normal operation
which time get's returned by this particular function call
and if the backend is in recovery mode, returns the value
from the log.

And again, this all means trouble. Usually, most queries sent
to the database don't change any data because they are
SELECT's. It would dramatically blow up the log amount if you
log ALL queries instead of only those that modify things. But
when the query begins, you don't know this, because a SELECT
might call a function that uses SPI to UPDATE something else.
So the decision if the query must be logged or not can only
be made when the query is done (by having some global
variable where the heap access methods set a flag that
something got written). Now you have to log function call's
like time() even if the query will not modify any single row
in the database because the query is a

SELECT 'now'::datetime - updtime FROM ...

Doing this on a table with thousands of rows will definitely
waste much logging space and slowdown the whole thing by
unnecessary logging.

Maybe it's a compromise if at each query start the actual
time and other such information is remembered by the backend,
all time() calls return this remembered value instead of the
real one (wouldn't be bad anyway IMHO), and this information
is logged only if the query is to be logged.

Finally I think I must have missed some more problems, but
aren't these enough already to frustrate you :-?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck(at)debis(dot)com (Jan Wieck) #

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Davis 1999-03-05 18:44:10 RE: [HACKERS] RE: [GENERAL] Transaction logging
Previous Message Oleg Broytmann 1999-03-05 15:42:12 Greek locale

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Davis 1999-03-05 18:44:10 RE: [HACKERS] RE: [GENERAL] Transaction logging
Previous Message Jackson, DeJuan 1999-03-05 18:20:04 RE: [HACKERS] PL/PGSQL