Re: Concerns about this release

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: Jason Earl <jason(dot)earl(at)simplot(dot)com>, mlw <markw(at)mohawksoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concerns about this release
Date: 2001-12-19 08:23:25
Message-ID: 200112190823.KAA01715@dcave.digsys.bg
Lists: pgsql-hackers

>>>Don Baccus said:
> > It also means that the PostgreSQL hackers can say that their database is
> > ready for 24x7 operation in "the default" mode.
>
>
> Plenty of people already run it 24x7.

I guess many do, actually. With the old (existing) code. I used to run most of
my production systems with beta code years ago - as at that time PostgreSQL
was beta quality anyway :) - but I currently run 7.0 on such systems, planning
an upgrade to 7.1 anytime now - actually, the delay is because I need to shut
down that large 24x7 setup!

> 2. outer joins didn't change the semantics of existing queries. You can
> write queries you couldn't write before, but your old queries work
> without change.

Not true!

SQL that worked in 7.0 does not always work in 7.1. Arguably, that can be
explained by 'bad SQL', but that does not change the fact.

The default SQL inheritance also makes things much, much different. Let me
explain with this 'intuitive' schema:

CREATE TABLE maillog (
"from" text,    -- quoted: FROM is a reserved word
"to" text,      -- quoted: TO is a reserved word
msgdate datetime,
msgid text,
bytes int
);

Now this is the 'current' log data, and we need to archive it periodically, to
keep the old data and at the same time avoid unnecessary performance decrease.
So we create archive tables like this

CREATE TABLE maillog200101 () INHERITS (maillog);
CREATE TABLE maillog200102 () INHERITS (maillog);

Data gets there by

INSERT INTO maillog200101 SELECT * FROM maillog WHERE msgdate >= '2001-01-01'
and msgdate < '2001-02-01';
DELETE FROM maillog WHERE msgdate >= '2001-01-01' and msgdate < '2001-02-01';

etc.

One of the reasons for doing this is that the archive creation software need
not know the exact structure of the table it archives...

Under 7.0 everything is as expected. Under 7.1 when you SELECT from maillog,
you also select from all inherited tables, which is not expected. One can of
course rewrite all the SELECTs :-)
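
A minimal sketch of what such a rewrite could look like, assuming the ONLY
keyword and the SQL_Inheritance run-time setting behave as described in the
7.1 documentation. Note that under the new default the archiving statements
above would need the same treatment, since DELETE on the parent would
otherwise reach the archive tables too:

```sql
-- Restrict a single query to the parent table, excluding inherited tables.
SELECT count(*) FROM ONLY maillog;

-- The archiving step from the example above, limited to the parent table.
INSERT INTO maillog200101
    SELECT * FROM ONLY maillog
    WHERE msgdate >= '2001-01-01' AND msgdate < '2001-02-01';
DELETE FROM ONLY maillog
    WHERE msgdate >= '2001-01-01' AND msgdate < '2001-02-01';

-- Alternative (assumption: the setting is available in your version):
-- restore the 7.0 behaviour for the current session instead of editing queries.
SET SQL_Inheritance TO OFF;
SELECT count(*) FROM maillog;
```

The session setting avoids touching every query, but it changes behaviour
for everything run in that session, so the per-query ONLY form is the more
explicit choice.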

Sorry for the off-topic, but this hit me badly :)

> 4. WAL was a scary change due to the scope, but it didn't change the
> semantics of transactions.

But it was discussed many, many times before being implemented.

> 5. The semantics of VACUUM have changed. Silently (in the sense that
> there's no notification or warning spewed out).

You argue with Tom here, but the two of you are calling the same beast by
different names. Sure, the semantics of the 'VACUUM' command have changed. The
old semantics are now available as 'VACUUM FULL' - as was already noted, this
is no worry for new users. The new semantics of VACUUM may actually be an
improvement, because many people need to run VACUUM in order to update
statistics. The only benefit I see from VACUUM FULL is reduced disk space
usage - which most administrators do not go for as a default action (that is,
rarely in scripts).
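
To make the distinction concrete, a short sketch of the command variants
under discussion, using the maillog table from the example above. The
comments summarize the behaviour as described in this thread; they are not a
full specification:

```sql
-- New default: mark dead tuples' space as reusable without taking an
-- exclusive lock on the table; the file on disk is generally not shrunk.
VACUUM maillog;

-- Same, and also refresh the planner statistics for the table.
VACUUM ANALYZE maillog;

-- Old behaviour: compact the table and give disk space back; this takes an
-- exclusive lock, so it blocks other users of the table while it runs.
VACUUM FULL maillog;
```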

If free heap space is reused in both tables and indexes, I do not see much
trouble in the new semantics, especially for 24x7 setups! Tables that are used
for 'logs' always grow. Take my example above with ca. 1 million records a
month: after the first month the maillog table contains 1 million records;
then those get deleted, and the new tuples occupy another 1 million records'
worth of disk space. Given that we archive the old data rather than throw it
away, if free space reuse is available the maillog table will never grow
beyond 2 million records in size, which is reasonable. With the old vacuum
semantics, vacuuming such tables is expensive (as they usually have indexes)
and the disk space saved is not significant.

> I think the new VACUUM is a great thing, and frankly I don't care all
> that much about the VACUUM FULL command issue. I tend to take the view
> that command semantics shouldn't be changed if one can avoid it. It's a
> view built on the fact that about twenty years of my past life were
> spent as a language implementor and dealing with customer feedback when
> I've been foolish enough to make semantics changes of this sort.

I fully agree with this.

However, what I think Tom did not say in his defense is that the new VACUUM
semantics are expected to greatly improve the experience with PostgreSQL. As
such, NEW users are more likely to appreciate it. More users = more bugs found
= better PostgreSQL (perhaps other benefits :) It may also be an improvement
for true 24x7 setups - users of my PostgreSQL-powered company management
database start to complain when I run VACUUM during the day, as the table
locking 'freezes', for example, cash register operation, and 'people in the
queue are waiting' etc ;)

> > Bring it on! I have good backups :).
>
>
> Uh ... if you're truly 24x7 critical then rolling back to your last
> backup wouldn't cut it, I'd think ...

Ah... backups... and they also restore so slowly :)

Daniel
