On Fri, 2005-02-25 at 12:54 -0500, Tom Lane wrote:
> Jeff Davis <jdavis-pgsql(at)empires(dot)org> writes:
> > (1) Do we care about reverse scans being done with synchronized
> > scanning? If so, is there a good way to know in advance whether it is
> > going to be a forward or reverse scan (i.e. before heap_getnext())?
> There are no reverse heapscans --- the only case where you'll see
> direction = backwards is while backing up a cursor with FETCH BACKWARD.
> I don't think you need to optimize that case.
Ok, I was wondering about that.
> What I'm more concerned about is your use of shared memory. I didn't
> have time to look at the patch, but how are you determining an upper
> bound on the amount of memory you need? What sort of locking and
> contention issues will there be?
Right now a scanning backend stores its current page number in shared
memory each time it advances to a new page (so the update happens once
per page, not once per tuple). I haven't determined whether this will be
a major point of lock contention.
However, one possible implementation seems to solve both problems at once:
Let's say we just had a static hash table of 100 entries, each holding a
relation's oid and the page number it's currently scanning, for a total
of 100*(sizeof(Oid)+sizeof(BlockNumber)) bytes. The relid would
predetermine the placement in the table. If there's a collision,
overwrite. I don't think much is lost in that case unless, for example,
two tables in an important join have oids that hash to the same value.
In that case the effectiveness of synchronized scanning is lost, but the
behavior is no worse than what we have now.
Let's say we didn't use any locks at all. Are there any real dangers
there? If there's a race, and one backend gets some garbage data, it can
just say "this is out of bounds, start the scan at 0". Since it's a
static hash table, we don't have to worry about following a bad pointer,
etc. If that looks like it will be a problem, I can test with locking
also to see what kind of contention there is.
The patch I sent was very much a proof of concept: all it had was a
shared memory segment of 8 bytes, enough to hold the position of only
one relid at a time. That would probably be somewhat effective in many
cases, but of course we want it larger than that (800 bytes? 8KB?).
In short, I tried to overcome these problems with simplicity. Where
simplicity doesn't work, I default to starting the scan at 0. Hopefully
those non-simple cases (like hash collisions and shared-memory races)
are rare enough that we don't lose all that we gain.
> Another point is that this will render the results from heapscans
> unstable, since different executions of the same query might start
> at different points. This would for example probably break many
> of the regression tests. We can deal with that if we have to, but
> it raises the bar of how much benefit I'd want to see from the patch.
I didn't consider that. Is there a reason the regression tests assume
the results will be returned in a certain order (or a consistent order)?
> One detail that might or might not be significant: different scans are
> very likely to have slightly different ideas about where the end of the
> table is, since they determine this with an lseek(SEEK_END) at the
> instant they start the scan. I don't think this invalidates your idea
> but you need to watch out for corner-case bugs in the coding.
I only see that as an issue in initscan(), where it sets the start page.
A simple bounds check would cure that, no? If the hint is out of bounds,
set the start page to zero, and we haven't lost much. I need a bounds
check there anyway, since the data we get from shared memory needs to be
validated. That bounds check would compare against the current backend's
scan->rs_nblocks, which should be the correct number for that backend's
scan.