Re: adding support for posix_fadvise()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding support for posix_fadvise()
Date: 2003-11-03 14:38:23
Message-ID: 13972.1067870303@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().

It's a little premature to be inventing APIs when you have no evidence
that this will make any useful performance difference. I'd recommend a
quick hack to get proof of concept before you bother with nice APIs.

> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation

"All"? You cannot affect the FDs being used by other backends. It's
fairly unclear to me what the posix_fadvise function is really going
to do for files that are being accessed by multiple processes. For
instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL
file, given that every other backend is going to have that same file
open? I would expect that rational kernel behavior would be to ignore
this advice unless it's set by the last backend to have the file open
--- but I'm not sure we can synchronize the closing of old WAL segments
well enough to know which backend is the last to close the file.

A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress. Think about a
complex query that is doing both a seqscan and an indexscan on the same
relation (a self-join could easily do this). You'd really need to
change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to
get set usefully.

In short I think you need to do some more thinking about what the scope
of the advice flags is going to be ...

> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Something Vadim had wanted to do for years is to decouple the smgr and
lower levels from the existing Relation cache, and have a low-level
notion of "open relation" that only requires having the "RelFileNode"
value to open it. This would allow eliminating the concept of blind
write, which would be a Very Good Thing. It would make sense to
associate the advice setting with such low-level relations. One
possible way to handle the multiple-scan issue is to make the desired
advice part of the low-level open() call, so that you actually have
different low-level relations for seq and random access to a relation.
Not sure if this works cleanly when you take into account issues like
smgrunlink, but it's something to think about.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Sullivan 2003-11-03 14:48:45 Re: Experimental patch for inter-page delay in VACUUM
Previous Message Jan Wieck 2003-11-03 14:35:57 Re: Experimental patch for inter-page delay in VACUUM