Re: BUG #4801: Performance failure: 600 MB written to each WAL log

From: Peter Much <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4801: Performance failure: 600 MB written to each WAL log
Date: 2009-05-08 16:04:15
Message-ID: 20090508160415.GA33295@gate.oper.dinoex.org
Lists: pgsql-bugs

Dear Rafael,

I agree with you. And that is actually what I have done up to now,
although for a different reason: the amount of WAL produced by
VACUUM FULL is practically unpredictable and can be much larger than
the database itself, and since that database belongs to the backup
system, the backup system cannot archive the WAL while VACUUM FULL
is running. I have now solved this with ZFS, which lets me do
without full_page_writes.
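
For the record, a minimal sketch of that setting (assuming the data
directory sits on ZFS, whose copy-on-write semantics should make torn
pages impossible - the usual argument for turning this off; please
verify it for your own setup before copying it):

    # postgresql.conf - sketch, assumes $PGDATA is on a ZFS filesystem.
    # ZFS never overwrites a block in place, so a partially written
    # (torn) page should not be able to reach disk; that is the
    # rationale for dropping full-page images from the WAL.
    full_page_writes = off          # default is 'on'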

On Fri, May 08, 2009 at 02:59:04PM +0200, Rafael Martinez wrote:
! How large is that 'large table' you are talking about?

20 million tuples, which is quite a lot for a Pentium II that mainly
serves as a router with some attached tape drives.
OTOH, the machine does not have much else to do, so it can well run
VACUUM FULL once in a while.

! vacuum full on a large database has such a long execution time and
! uses so many resources (IO/RAM/time) that it is almost useless. If
! you have such a problem with your database that you need the
! functionality delivered by vacuum full, it is better to export and
! import your database.

Yes. That PostgreSQL installation used to contain (and still does) a
couple of databases for my own needs, some of them experimental, none
of them big. I considered it very practical to run VACUUM FULL (and
an index rebuild) through all of them from cron every week or month,
so the databases would clean themselves up if some mess was left
behind.
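
Concretely, the cron entry is just a one-liner along these lines
(schedule and paths are illustrative, not the exact ones I use;
vacuumdb and reindexdb are the standard client programs):

    # weekly maintenance, Sunday 03:00 - illustrative schedule
    0 3 * * 0  vacuumdb --all --full --quiet ; reindexdb --all --quiet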

Then I acquired the backup software; it needed a database, so I
placed it into the existing PostgreSQL installation - and after
noticing that it tends to get some hundred times bigger than all the
others together, I gave it some extra disk space. And I thought:
PostgreSQL is extremely scalable, it can handle it.

So I should exclude that backup-system DB from the VACUUM FULL cron
job. Okay - but if it works properly, it runs for some hours and does
not hurt anybody; and since it is done regularly at a fixed time, I
can look at my recorded "df" outputs later and they will show me
exactly how much the DB grows over time.
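
If I do exclude it, a sketch would be something like this (the
database name "bacula" is only a placeholder for whatever the backup
system's DB is called; adjust as needed):

    #!/bin/sh
    # Sketch: VACUUM FULL every database except the backup system's.
    # "bacula" is a placeholder name, not necessarily the real one.
    for db in `psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate" template1`; do
        case $db in
            bacula) ;;                          # skip the big one
            *)      vacuumdb --full --quiet "$db" ;;
        esac
    done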

! This does not explain your problem, but maybe if you can avoid running
! vacuum full unnecessarily, you will also avoid this problem you are
! experiencing.

Yes. But this is really strange behaviour, and it does not at all
match the usual PostgreSQL style, which is very precise.
Or, in other words: it creeps like a bug, it smells like a bug, it
looks like a bug - maybe it is a bug. And I am not fond of letting
identified bugs crawl under the carpet - they tend to come out at
unpleasant moments.
Or, put differently: that server does something which seems to make
no sense at all. Let's hunt it down, or at least figure out why it
does this.

It might be possible to approach this by logical analysis, without
debugging gear, by asking: under which circumstances is WAL written
redundantly?
Or, if someone would like to analyze the matter, I could try to get
FreeBSD's GEOM layer (or something similar) to produce a trace log of
what is actually written to the WAL.
Maybe then we find something that is not limited to VACUUM FULL, or
maybe we find something that avoids greater havoc only by good luck.
Or maybe it is just the way it should work...
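
A cheap first measurement that needs no kernel tracing: bracket the
VACUUM FULL with WAL-position samples (pg_current_xlog_location()
exists since 8.2; this is only a sketch, and the difference is only
meaningful if nothing else writes heavily in between - "bigtable" is
a placeholder table name):

    -- sample the WAL write position before and after
    SELECT pg_current_xlog_location();   -- note the value
    VACUUM FULL bigtable;                 -- placeholder table name
    SELECT pg_current_xlog_location();   -- note again; the distance
                                          -- between the two positions
                                          -- (segments are 16 MB each)
                                          -- is the WAL written meanwhile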

best regards,
Peter
