From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequential scans
Date: 2007-05-02 22:37:01
Message-ID: 1178145421.28383.189.camel@dogma.v10.wvs
Lists: pgsql-hackers
On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > What should be the maximum size of this hash table?
>
> Good question. And also, how do you remove entries from it?
>
> I guess the size should somehow be related to the number of backends. Each
> backend will realistically be doing just one, or at most two, seq scans at a time.
> It also depends on the number of large tables in the databases, but we
> don't have that information easily available. How about using just
> NBackends? That should be plenty, but wasting a few hundred bytes of
> memory won't hurt anyone.
One entry per relation, not per backend, is my current design.
> I think you're going to need an LRU list and counter of used entries in
> addition to the hash table, and when all entries are in use, remove the
> least recently used one.
>
> The thing to keep an eye on is that it doesn't add too much overhead or
> lock contention in the typical case when there's no concurrent scans.
>
> For the locking, use a LWLock.
>
Ok. What would be the potential lock contention in the case of no
concurrent scans?
Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this will
be obvious.
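To make the design above concrete, here is a minimal sketch of a shared scan-position table with LRU eviction: a small fixed-size array keyed by relation OID, where a full table overwrites the least recently used entry. All identifiers here (ss_report_location, ScanEntry, SS_NENTRIES) are hypothetical, not the patch's actual names, and the real table would be a dynahash in shared memory accessed under an LWLock rather than a linear-scanned static array.

```c
#include <assert.h>
#include <stdint.h>

#define SS_NENTRIES 8          /* e.g. sized relative to max_connections */

typedef struct ScanEntry
{
    uint32_t relid;            /* which relation is being scanned */
    uint32_t blockno;          /* last page reported by any scanner */
    uint64_t lru;              /* logical timestamp for LRU eviction */
} ScanEntry;

static ScanEntry table[SS_NENTRIES];
static int       nused;
static uint64_t  clock_ticks;

/* Record that some backend just read 'blockno' of 'relid'.
 * In the real patch this would happen under an LWLock. */
static void
ss_report_location(uint32_t relid, uint32_t blockno)
{
    int victim = 0;

    for (int i = 0; i < nused; i++)
    {
        if (table[i].relid == relid)
        {
            table[i].blockno = blockno;
            table[i].lru = ++clock_ticks;
            return;
        }
        if (table[i].lru < table[victim].lru)
            victim = i;        /* remember least recently used slot */
    }
    if (nused < SS_NENTRIES)
        victim = nused++;      /* free slot still available */
    /* else overwrite the least recently used entry */
    table[victim].relid = relid;
    table[victim].blockno = blockno;
    table[victim].lru = ++clock_ticks;
}

/* Where should a new scan of 'relid' start?  0 if unknown. */
static uint32_t
ss_get_location(uint32_t relid)
{
    for (int i = 0; i < nused; i++)
        if (table[i].relid == relid)
            return table[i].blockno;
    return 0;
}
```

With one entry per relation, stale entries for finished scans simply age out via the LRU rule instead of needing explicit removal.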
> No, not the segment. RelFileNode consists of tablespace oid, database
> oid and relation oid. You can find it in scan->rs_rd->rd_node. The
> segmentation works at a lower level.
Ok, will do.
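For reference, keying on the RelFileNode triple might look like the sketch below. The field names spcNode, dbNode and relNode match the real RelFileNode struct; the FNV-1a-style mix is purely illustrative, since dynahash supplies its own hash and match functions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct RelFileNodeKey
{
    uint32_t spcNode;   /* tablespace OID */
    uint32_t dbNode;    /* database OID */
    uint32_t relNode;   /* relation OID */
} RelFileNodeKey;

/* Illustrative FNV-1a mix over the three OIDs. */
static uint32_t
relfilenode_hash(const RelFileNodeKey *k)
{
    const uint32_t parts[3] = { k->spcNode, k->dbNode, k->relNode };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 3; i++)
    {
        h ^= parts[i];
        h *= 16777619u;
    }
    return h;
}

/* Keys are equal only if all three OIDs match. */
static int
relfilenode_equal(const RelFileNodeKey *a, const RelFileNodeKey *b)
{
    return a->spcNode == b->spcNode &&
           a->dbNode == b->dbNode &&
           a->relNode == b->relNode;
}
```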
> Hmm. Should we care then? CFQ is the default on Linux, and an average
> sysadmin is unlikely to change it.
>
Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.
> - when ReadBuffer is called, let the caller know if the read did
> physical I/O.
> - when the previous ReadBuffer didn't result in physical I/O, assume
> that we're not the pack leader. If the next buffer isn't already in
> cache, wait a few milliseconds before initiating the read, giving the
> pack leader a chance to do it instead.
>
> Needs testing, of course..
>
An interesting idea. Of the proposals for maintaining a "pack leader",
that's the one I like the most. It's very similar to what the Linux
anticipatory scheduler does for us.
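The quoted heuristic can be sketched as a toy simulation: if our previous ReadBuffer was satisfied from cache, assume another backend is ahead of us doing the physical reads, and briefly yield before reading a block that is not yet cached. Everything here (the fake cache, fake_read_buffer, the delay counter) is hypothetical scaffolding, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 16

static bool cached[NBLOCKS];   /* stand-in for the shared buffer cache */
static int  delays;            /* how many times we chose to yield */

/* Pretend to read 'blockno'; returns true if physical I/O happened. */
static bool
fake_read_buffer(int blockno)
{
    bool did_io = !cached[blockno];

    cached[blockno] = true;
    return did_io;
}

/* Scan all blocks, applying the pack-leader heuristic. */
static void
scan_with_heuristic(void)
{
    bool prev_did_io = true;   /* assume we lead until proven otherwise */

    for (int b = 0; b < NBLOCKS; b++)
    {
        /* Previous read was a cache hit, and the next block isn't
         * cached: someone else is probably leading, so the real code
         * would sleep a few milliseconds here to let them read it. */
        if (!prev_did_io && !cached[b])
            delays++;
        prev_did_io = fake_read_buffer(b);
    }
}
```

In a run where the first half of the table is already cached, the follower yields exactly once, at the boundary where it would otherwise overtake the leader.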
> >> 4. It fails regression tests. You get an assertion failure on the portal
> >> test. I believe that changing the direction of a scan isn't handled
> >> properly; it's probably pretty easy to fix.
> >>
> >
> > I will examine the code more carefully. As a first guess, is it possible
> > that test is failing because of the non-deterministic order in which
> > tuples are returned?
>
> No, it's an assertion failure, not just different output than expected.
> But it's probably quite simple to fix..
>
Ok, I'll find and correct it then.
Regards,
Jeff Davis