Re: adding support for posix_fadvise()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding support for posix_fadvise()
Date: 2003-11-03 14:38:23
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().

It's a little premature to be inventing APIs when you have no evidence
that this will make any useful performance difference.  I'd recommend a
quick hack to get proof of concept before you bother with nice APIs.

> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation

"All"?  You cannot affect the FDs being used by other backends.  It's
fairly unclear to me what the posix_fadvise function is really going
to do for files that are being accessed by multiple processes.  For
instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL
file, given that every other backend is going to have that same file
open?  I would expect that rational kernel behavior would be to ignore
this advice unless it's set by the last backend to have the file open
--- but I'm not sure we can synchronize the closing of old WAL segments
well enough to know which backend is the last to close the file.
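
For reference, a minimal sketch of what one backend setting this advice on its own descriptor would look like. The helper names advise_dontneed() and drop_cache_hint() are hypothetical and not part of any PostgreSQL API. The call only acts through the calling process's FD, and what the kernel does with it while other processes hold the same file open is exactly the open question above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: tell the kernel that this process no longer expects
 * to need the cached pages of the file behind fd.  offset = 0 and len = 0
 * mean "the whole file".  posix_fadvise() returns 0 on success or an error
 * number directly; it does not set errno.
 */
int
advise_dontneed(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

/*
 * Hypothetical wrapper: the advice is only a hint, so a failure is
 * reported but is never treated as fatal.
 */
int
drop_cache_hint(int fd, const char *label)
{
    int rc;

    assert(label != NULL);
    rc = advise_dontneed(fd);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise on %s failed: error %d\n", label, rc);
    return rc;
}
```

Nothing in this call tells the kernel about, or coordinates with, the descriptors that other backends hold on the same segment.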

A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress.  Think about a
complex query that is doing both a seqscan and an indexscan on the same
relation (a self-join could easily do this).  You'd really need to
change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to
get set usefully.
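
To illustrate the point, here is a rough sketch in which each scan opens its own descriptor and attaches its own hint, rather than sharing one FD. The function open_with_advice() is hypothetical. Whether the kernel really tracks the hint per descriptor is an assumption; POSIX does not specify it, so this would need testing on each platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path read-only and attach an access-pattern
 * hint (e.g. POSIX_FADV_SEQUENTIAL or POSIX_FADV_RANDOM) to the new
 * descriptor.  Returns the descriptor, or -1 if the open or the advice
 * call fails.
 */
int
open_with_advice(const char *path, int advice)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0: the hint covers the whole file */
    if (posix_fadvise(fd, 0, 0, advice) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

A seqscan would then read through an FD opened with POSIX_FADV_SEQUENTIAL while a concurrent indexscan on the same relation uses a second FD opened with POSIX_FADV_RANDOM, at the cost of one extra descriptor per access pattern.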

In short I think you need to do some more thinking about what the scope
of the advice flags is going to be ...

> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Something Vadim had wanted to do for years is to decouple the smgr and
lower levels from the existing Relation cache, and have a low-level
notion of "open relation" that only requires having the "RelFileNode"
value to open it.  This would allow eliminating the concept of blind
write, which would be a Very Good Thing.  It would make sense to
associate the advice setting with such low-level relations.  One
possible way to handle the multiple-scan issue is to make the desired
advice part of the low-level open() call, so that you actually have
different low-level relations for seq and random access to a relation.
Not sure if this works cleanly when you take into account issues like
smgrunlink, but it's something to think about.
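
A toy sketch of that idea, to make the shape concrete: a table of low-level open relations keyed by the file identity plus the requested advice. Everything here (RelFileNodeSketch, LowLevelRel, lowlevel_open, the fixed-size table) is hypothetical and only stands in for whatever the real structures would be. No file is actually opened.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for RelFileNode; the field names are illustrative. */
typedef struct
{
    unsigned int tblNode;
    unsigned int relNode;
} RelFileNodeSketch;

/* Hypothetical access-pattern hints requested at open time. */
enum AdviceSketch { ADVICE_SEQUENTIAL, ADVICE_RANDOM };

/* One low-level "open relation": identified by (rnode, advice). */
typedef struct
{
    int               in_use;
    RelFileNodeSketch rnode;
    int               advice;
    int               fd;       /* -1 until a file is really opened */
} LowLevelRel;

#define MAX_LOWLEVEL_RELS 16
static LowLevelRel lowlevel_rels[MAX_LOWLEVEL_RELS];

/*
 * Return the entry for (rnode, advice), creating it if needed.
 * Returns NULL if the toy table is full.
 */
LowLevelRel *
lowlevel_open(RelFileNodeSketch rnode, int advice)
{
    LowLevelRel *free_slot = NULL;

    assert(advice == ADVICE_SEQUENTIAL || advice == ADVICE_RANDOM);

    for (int i = 0; i < MAX_LOWLEVEL_RELS; i++)
    {
        LowLevelRel *r = &lowlevel_rels[i];

        if (r->in_use)
        {
            if (r->rnode.tblNode == rnode.tblNode &&
                r->rnode.relNode == rnode.relNode &&
                r->advice == advice)
                return r;       /* same file, same advice: reuse */
        }
        else if (free_slot == NULL)
            free_slot = r;      /* remember the first empty slot */
    }

    if (free_slot == NULL)
        return NULL;

    free_slot->in_use = 1;
    free_slot->rnode = rnode;
    free_slot->advice = advice;
    free_slot->fd = -1;
    return free_slot;
}
```

With this keying, an unlink would have to find and close every entry that shares the same file identity regardless of advice, which is the kind of wrinkle mentioned above.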

			regards, tom lane
