Re: Decent VACUUM (was: Buglist)

From: paul_tuckfield(at)yahoo(dot)com (Grant Succeeded)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Decent VACUUM (was: Buglist)
Date: 2003-08-27 00:02:56
Message-ID: 573fc2cd.0308261602.6a7a3da1@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) wrote in message news:<26905(dot)1061500610(at)sss(dot)pgh(dot)pa(dot)us>...
> Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> > better. AFAICS Vivek's problem is that it is hard enough to hold a
> > good part of the working set in the cache, and still his disks are
> > saturated. Now a VACUUM not only adds one more process to disk I/O
> > contention, but also makes sure that the working set pages are *not*
> > in memory which leads to higher I/O rates after the VACUUM.

This is a phenomenon I'm very familiar with with large databases (not
pg, but same techniques should apply). Generally, large OS
filesystem caches just get in the way when you're doing lots of IO on
a db thats big relative to the cache footprint.

perhaps restating:
The best for me by far, is to get the OS to *not* cache stuff. As
long as the database uses the information it inherently has available,
it can make far more effective use of the same amount of memory the OS
would have used to cache the whole filesystem. Furthermore, a cache
hit in the db is just a pointer lookup, and is far cheaper than a
read() system call resulting in a OS cache hit. Even if the OS does a
good job, every hit still costs a read system call.

>
> We have had some people looking at improved buffer management
> algorithms; LRU-2 or something smarter would help. I dunno whether
> we can dissuade the kernel from flushing its cache though.
>
once you get the relatively dumb, uninformed OS cache out of the way,
the DB has enough info to cache more effectively.

I'd love to see the new cache stuff, I'm still groking the 7.3 source
code.

my two favorite features of a certain other rdbms:
- it puts sequential IO pages near the end of the LRU, not the top.
Pretty
effective in avoiding cache churn. The essential heuristic is index
scans
go to the LRU end, full scan and (for pg) vaccuum go near the MRU
end.
It's an effective, and hopefully straighforward thing. Does pg do
this?
- For still tougher performanc/stability cases it allows one to
segregate
the cache into different LRU lists, each with a configurable slice
of
the overall buffer shared mem. This for is when the application
itself causes
the same cache churn phenomenon that vaccuum causes for the above
user.
One configures distinct LRU pools, then optionally associates key
objects
to non-default pools. So when a block from a given object gets
pulled in,
it goes to the LRU list the user chose for it (else default if user
didn't
choose to put it in a distinct LRU pool) THen when
some random query does a huge scan on a huge table, it wont flush
the default
cache pool if that table has been configured to another pool
In most applications it's pretty clear which big,historical tables
should be
segregated in this manner. but the default behavior (one big pool)
is the
same as pg works today.

Hopefully these above features are only incremental changes to the
existing pg LRU?

> > If we teach VACUUM to not read pages that don't contain any dead
> > tuples, this could be a significant improvement. I'm envisioning a
> > data structure (reclaimable space map, RSM) similar to the FSM.
> > Whenever a backend encounters a dead tuple it inserts a reference to
> > its page into the RSM.

>
> This assumes that backends will visit dead tuples with significant
> probability. I doubt that assumption is tenable; it's certainly not
> if you assume that no backend is doing seqscans. (And if they are,
> then VACUUM is not the only I/O culprit...)
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Karp 2003-08-27 00:04:10 postgres vs mysql
Previous Message Tom Lane 2003-08-26 22:55:06 Re: 7.4b1 vs 7.3.4 performance

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Browne 2003-08-27 00:04:13 Re: 2-phase commit
Previous Message Marie G. Tuite 2003-08-26 22:42:33 before trigger problem