Quick Links

Bug #769: Slow vacuuming due to error in optimization

From:	pgsql-bugs(at)postgresql(dot)org
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	Bug #769: Slow vacuuming due to error in optimization
Date:	2002-09-16 18:01:12
Message-ID:	20020916180112.9CA22476567@postgresql.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

Steve Marshall (smarshall(at)wsi(dot)com) reports a bug with a severity of 3
The lower the number the more severe it is.

Short Description
Slow vacuuming due to error in optimization

Long Description
I have found very long vacuuming times when vacuuming large tables in Postgres 7.2.1, running on i686 hardware under Redhat Linux 7.3.

Long vacuum times should have little effect on applications, as the new VACUUM implementation does not exclusively lock the table for any great length of time. However, I found that near the end of the vacuum operation, the vacuuming postgres backend began using 100% of one of the system CPUs, and all insert operations on the table being vacuumed stopped. This problem occurred after all the CPU times were reported for the VACUUM, but before the ANALYZE step began.

To identify the source of the problem, I inserted some additional log statements into the source file backend/commands/vacuumlazy.c, and was able to track down the problem to the function that updates the free space map (i.e. the function lazy_update_fsm). This in turn calls the function MultiRecordFreeSpace() in storage/freespace/freespace.c.

Looking at the code in MultiRecordFreeSpace(), I found that this function imposes an exclusive lock. This locking explains why my insert operations stopped. Looking further, I found a place where the comment and conditional logic did not seem to say the same thing, and hence looked suspicious. Here is the code snippet:

------
/*
* Add new entries, if appropriate.
*
* XXX we could probably be smarter about this than doing it
* completely separately for each one. FIXME later.
*
* One thing we can do is short-circuit the process entirely if a
* page (a) has too little free space to be recorded, and (b) is
* within the minPage..maxPage range --- then we deleted any old
* entry above, and we aren't going to make a new one. This is
* particularly useful since in most cases, all the passed pages
* will in fact be in the minPage..maxPage range.
*/
for (i = 0; i < nPages; i++)
{
BlockNumber page = pages[i];
Size avail = spaceAvail[i];
if (avail >= fsmrel->threshold ||
page < minPage || page > maxPage)
fsm_record_free_space(fsmrel, page, avail);
}

-------
The comment indicates that free space is recorded for a page if the available space is above the threshold AND the page is not within the min-max range that was handled in logic before this snippet. However, the code records free space if EITHER of these criteria are true.

Therefore I tried changing the logic to an AND, e.g.:

if (avail >= fsmrel->threshold &&
(page < minPage || page > maxPage))
fsm_record_free_space(fsmrel, page, avail);

This reduced my processing time in lazy_update_fsm() from about 2 minutes to nearly nothing, effectively solving my performance problem.
----
I'm a newbie to the Postgres source code, so I don't know if this is the proper place to submit this information, If I've submitted incorrectly, please let me know, so I can do it right next time.

I'd also be interested in knowing if this change has some hidden or long term effect I just don't see.

Sample Code

No file was uploaded with this report

Responses

Re: Bug #769: Slow vacuuming due to error in optimization at 2002-09-16 18:29:31 from Tom Lane
some other backend died abnormally at 2002-09-22 16:26:03 from fredrik chabot

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Tom Lane	2002-09-16 18:29:31	Re: Bug #769: Slow vacuuming due to error in optimization
Previous Message	Tom Lane	2002-09-16 17:48:50	Re: make unsuccessful on mac os x 10.2