Re: idea for concurrent seqscans

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: idea for concurrent seqscans
Date: 2005-04-25 01:45:17
Message-ID: 200504250145.j3P1jHS19168@candle.pha.pa.us
Lists: pgsql-hackers


TODO description added:

* Allow sequential scans to take advantage of other concurrent
sequential scans, also called "Synchronised Scanning"

One possible implementation is to start sequential scans from the lowest
numbered buffer in the shared cache, and when reaching the end wrap
around to the beginning, rather than always starting sequential scans
at the start of the table.
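
A minimal sketch of the wrap-around order described above, written in Python rather than the backend's C purely for illustration. The names scan_order, nblocks, and start_block are hypothetical, and how the start block is chosen (for example, from what is already in the shared cache) is deliberately left open.

```python
def scan_order(nblocks, start_block):
    """Yield every block number of a relation exactly once.

    The scan begins at start_block, runs to the last block, then wraps
    around to block 0 and stops just before start_block.
    """
    if nblocks <= 0:
        return  # empty relation: nothing to scan
    start = start_block % nblocks  # tolerate an out-of-range start
    for i in range(nblocks):
        yield (start + i) % nblocks


if __name__ == "__main__":
    # A 5-block relation scanned from block 3.
    print(list(scan_order(5, 3)))
```

In this sketch each block is visited exactly once regardless of the start block; only the physical order of the visit changes.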

---------------------------------------------------------------------------

Jeff Davis wrote:
> I had an idea that might improve parallel seqscans on the same relation.
>
> If you have lots of concurrent seqscans going on a large relation, the
> cache hit ratio is very low. But, if the seqscans are concurrent on the
> same relation, there may be something to gain by starting a seqscan near
> the page being accessed by an already-in-progress seqscan, and wrapping
> back around to that start location. That would make some use of the
> shared buffers, which would otherwise just be cache pollution.
>
> I made a proof-of-concept implementation, which is entirely in heapam.c,
> except for one addition to the HeapScanDesc struct in relscan.h. It is
> not at all up to production quality; there are things I know that need
> to be addressed. Basically, I just modified heapam.c to be able to start
> at any page in the relation. Then, every time it reads a new page, I
> have it mark the relation's oid and the page number in a shared mem
> segment. Every time a new scan is started, it reads the shared mem
> segment, and if the relation's oid matches, it starts the scan at the
> page number it found in the shared memory. Otherwise, it starts the scan
> at 0.
>
> There are a couple of obvious issues. One is that my whole implementation
> doesn't account for reverse scans at all (since initscan doesn't know
> what direction the scan will move in), but that shouldn't be a major
> problem since at worst it will be the current behavior (aside: can
> someone tell me how to force reverse scans so I can test that better?).
> Another is that there's a race condition with the shared mem, and that's
> out of pure laziness on my part.
>
> This method is really only effective at all if there is a significant
> amount of disk i/o. If it's pulling the data from O/S buffers the
> various scans will diverge too much and not be using each other's shared
> buffers.
>
> I tested with shared_buffers=500 and all stats on. I used 60 threads
> performing 30 seqscans each in my script ssf.rb (I refer to my
> modification as "sequential scan follower" or ssf).
>
> Here are some results with my modifications:
> $ time ./ssf.rb # my script
>
> real 4m22.476s
> user 0m0.389s
> sys 0m0.186s
>
> test=# select relpages from pg_class where relname='test_ssf';
> relpages
> ----------
> 1667
> (1 row)
>
> test=# select count(*) from test_ssf;
> count
> --------
> 200000
> (1 row)
>
> test=# select pg_stat_get_blocks_hit(17232) as hit,
> pg_stat_get_blocks_fetched(17232) as total;
> hit | total
> --------+---------
> 971503 | 3353963
> (1 row)
>
> Or, approx. 29% cache hit.
>
> Here are the results without my modifications:
>
> test=# select relpages from pg_class where relname='test_ssf';
> relpages
> ----------
> 1667
> (1 row)
>
> test=# select count(*) from test_ssf;
> count
> --------
> 200000
> (1 row)
>
> test=# select pg_stat_get_blocks_hit(17231) as hit,
> pg_stat_get_blocks_fetched(17231) as total;
> hit | total
> --------+---------
> 199999 | 3353963
> (1 row)
>
> Or, approx. 6% cache hit. Note: the oid is different, because I have two
> separately initdb'd data directories, one for the modified version, one
> for the unmodified 8.0.0.
>
> This is the first time I've really modified the PG source code to do
> anything that looked promising, so this is more of a question than
> anything else. Is it promising? Is this a potentially good approach? I'm
> happy to post more test data and more documentation, and I'd also be
> happy to bring the code to production quality. However, before I spend
> too much more time on that, I'd like to get a general response from a
> 3rd party to let me know if I'm off base.
>
> Regards,
> Jeff Davis
>
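
The mechanism Jeff describes (record the relation's oid and current page in shared memory, and have a new scan of the same relation start from that page) can be sketched roughly as follows. This is an illustration in Python, not the attached patch: SharedScanHint, report, start_block, and seqscan are hypothetical names, and the lock is an assumption added here to avoid the race condition mentioned above.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanHint:
    rel_oid: int  # relation most recently scanned
    block: int    # page that scan most recently read


class SharedScanHint:
    """Stand-in for the single shared-memory slot (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # assumption: guards the slot
        self._hint: Optional[ScanHint] = None

    def report(self, rel_oid: int, block: int) -> None:
        """Called each time a scan reads a new page."""
        with self._lock:
            self._hint = ScanHint(rel_oid, block)

    def start_block(self, rel_oid: int, nblocks: int) -> int:
        """Return the hinted page if it is for this relation, else 0."""
        with self._lock:
            hint = self._hint
        if hint is not None and hint.rel_oid == rel_oid and 0 <= hint.block < nblocks:
            return hint.block
        return 0


def seqscan(shared: SharedScanHint, rel_oid: int, nblocks: int) -> List[int]:
    """Visit every page once, starting at the hinted page and wrapping."""
    start = shared.start_block(rel_oid, nblocks)
    visited = []
    for i in range(nblocks):
        block = (start + i) % nblocks
        shared.report(rel_oid, block)  # let later scans follow this one
        visited.append(block)
    return visited
```

A single slot only tracks the most recently reported scan, which matches the description above; tracking several relations at once would need a larger structure.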

[ Attachment, skipping... ]

[ Attachment, skipping... ]
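
For reference, the two cache-hit percentages quoted above follow directly from the reported pg_stat_get_blocks_hit and pg_stat_get_blocks_fetched values. The helper name hit_ratio is hypothetical; the numbers are copied from the quoted output.

```python
def hit_ratio(blocks_hit, blocks_fetched):
    """Fraction of fetched blocks that were found in shared buffers."""
    if blocks_fetched == 0:
        return 0.0  # no fetches yet: avoid dividing by zero
    return blocks_hit / blocks_fetched


# Values from the quoted psql output.
with_patch = hit_ratio(971503, 3353963)
without_patch = hit_ratio(199999, 3353963)

if __name__ == "__main__":
    print(f"with modifications:    {with_patch:.1%}")
    print(f"without modifications: {without_patch:.1%}")
```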


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
