Re: old synchronized scan patch

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Eng" <eng(at)intranet(dot)greenplum(dot)com>
Subject: Re: old synchronized scan patch
Date: 2006-12-05 16:23:29
Message-ID: 45759D01.7050008@enterprisedb.com
Lists: pgsql-hackers

Florian G. Pflug wrote:
> Tom Lane wrote:
>> There are other issues for the "no lock" approach that Jeff proposes.
>> Suppose that we have three or four processes that are actually doing
>> synchronized scans of the same table. They will have current block
>> numbers that are similar but probably not identical. They will all be
>> scribbling on the same hashtable location. So if another process comes
>> along to join the "pack", it might get the highest active block number,
>> or the lowest, or something in between. Even discounting the possibility
>> that it gets random bits because it managed to read the value
>> non-atomically, how well is that really going to work?

It might in fact work quite well. Assuming that all the active blocks
are in memory, the process that joins the pack will quickly catch up
with the rest. I'd like to see some testing to be sure, though...
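To make Tom's scenario concrete, here is a minimal Python sketch. It assumes a single hint slot per table that every scanning backend overwrites without locking. The names ScanHints, report, start_block, simulate_pack and "big_table" are hypothetical and are not taken from the patch. The model only shows that the joiner starts at whichever position was reported last, which always lies between the lowest and highest members of the pack. It does not model torn (non-atomic) reads.

```python
# Hypothetical model of the shared "current block" hint: one slot per
# table, overwritten by whichever scanner reported last (no locking).
# Torn, non-atomic reads are deliberately not modelled.
from dataclasses import dataclass, field


@dataclass
class ScanHints:
    slots: dict[str, int] = field(default_factory=dict)

    def report(self, table: str, block: int) -> None:
        # No lock and no min/max merging: the last writer simply wins.
        self.slots[table] = block

    def start_block(self, table: str) -> int:
        # A joining scan starts wherever the last report left the slot,
        # or at block 0 if nobody has reported a position for the table.
        return self.slots.get(table, 0)


def simulate_pack(positions: list[int], table: str = "big_table") -> tuple[int, int, int]:
    """Pack members report their positions in order; return
    (lowest position, highest position, block the joiner starts at)."""
    hints = ScanHints()
    for block in positions:
        hints.report(table, block)
    return min(positions), max(positions), hints.start_block(table)
```

For example, with the pack at blocks 1040, 1037 and 1042 (reported in that order), simulate_pack returns (1037, 1042, 1042), so the joiner starts at the leading edge. If 1037 happens to be reported last, it starts at the trailing edge instead.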

>> Another thing that we have to consider is that the actual block read
>> requests will likely be distributed among the "pack leaders", rather
>> than all being issued by one process. AFAIK this will destroy the
>> kernel's ability to do read-ahead, because it will fail to recognize
>> that sequential reads are being issued --- any single process is *not*
>> reading sequentially, and I think that read-ahead scheduling is
>> generally driven off single-process behavior rather than the emergent
>> behavior of the whole system. (Feel free to contradict me if you've
>> actually read any kernel code that does this.) It might still be better
>> than unsynchronized reads, but it'd be leaving a lot on the table.
> I don't see why a single process wouldn't be reading sequentially. As far
> as I understood the original proposal, the current block number from the
> hashtable is only used as a starting point for sequential scans. After that,
> each backend reads sequentially until the end of the table, I believe. No?
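A rough Python sketch of the scan order described in the quoted paragraph, assuming the hint is used only as the starting block. The wrap-around back to block 0 is an assumption not stated in the quoted text: a scan that starts mid-table must still visit the earlier blocks to return every row. scan_order is a hypothetical helper, not code from the patch.

```python
def scan_order(nblocks: int, start: int) -> list[int]:
    """Block numbers a synchronized scan would visit, starting at `start`.

    Assumption: read sequentially to the end of the table, then wrap
    around to block 0 and stop just before `start`, so that every block
    is read exactly once.
    """
    if nblocks <= 0:
        return []
    if not 0 <= start < nblocks:
        # Fall back to block 0 if the hint is out of range
        # (for example, a stale hint after the table shrank).
        start = 0
    return list(range(start, nblocks)) + list(range(0, start))
```

For an 8-block table and a hint of 5, the sketch visits 5, 6, 7, 0, 1, 2, 3, 4: each backend's own request stream is in order, apart from the single wrap-around jump.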

When a read is satisfied from the shared memory cache, it never reaches
the kernel. From the kernel's point of view, the pattern looks something
like this:

A 1--4--7-9
B -2---6---
C --3-5--8-

where letters denote processes, numbers are the block numbers read, and
time goes from left to right. When you look at one process at a time,
the pattern looks random, though it's constantly moving forward. It's
only when you look at all the processes together that you see that it's
sequential.
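The diagram can be replayed with a toy model. This Python sketch assumes a deliberately simplified read-ahead rule: a request counts as sequential only if it asks for the block right after the same stream's previous request. Real kernel read-ahead heuristics are more elaborate. READS and sequential_hits are hypothetical names. Tracked per process, none of the nine reads qualifies; tracked as one global stream, eight of them do.

```python
# Kernel-visible reads from the diagram above, in time order (left to
# right), as (process, block) pairs.
READS = [("A", 1), ("B", 2), ("C", 3), ("A", 4), ("C", 5),
         ("B", 6), ("A", 7), ("C", 8), ("A", 9)]


def sequential_hits(reads: list[tuple[str, int]], per_process: bool) -> int:
    """Count reads whose block immediately follows the previous read of
    the same stream: one stream per process, or a single global stream."""
    last = {}  # stream key -> last block requested on that stream
    hits = 0
    for proc, block in reads:
        key = proc if per_process else "*"  # "*" = the whole system
        if last.get(key) == block - 1:
            hits += 1
        last[key] = block
    return hits
```

Each process's own sequence (A: 1, 4, 7, 9; B: 2, 6; C: 3, 5, 8) only ever moves forward, yet never asks for the next block, which is exactly the per-process picture a read-ahead heuristic driven by single-process behavior would see.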

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
