Re: ice-broker scan thread

From: "Pollard, Mike" <mpollard(at)cincom(dot)com>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ice-broker scan thread
Date: 2005-11-29 14:45:30
Message-ID: 6418CC03D0FB1943A464E1FEFB3ED46B01B220E7@im01.cincom.com
Lists: pgsql-hackers

First, we need a new term for a thread of execution that could be a
thread or could be a process; I don't care which. When discussing anything
that is to run in parallel, the first thing that pops out of someone's
mouth is "Don't you mean (thread/process)?" But that's an
implementation detail and should not be considered during a planning
phase, unless it is fundamental to the problem. Hence the term TOE
(thread of execution), to mean "I don't really care if it is in its own
address space or the same address space." However, I understand that
this is not in common usage, so in the following discussion I use the
term thread, as it is more correct than process. I am just not defining
whether that thread is the only thread running in its process or not.

I've implemented this on another database product, using buf reading
threads to pull the data all the way into the database cache. In
testing on Unix production systems (4 CPU machines, large RAID devices,
100GB+ databases), table scans performed 5 to 7 times faster; on MVS,
table scans were up to 10 times faster. But I never had much luck
getting the performance to change on Windows. Partially, I think, it's
because the machine I was using had IDE disks, not SCSI, so I was already
greatly bottlenecked. Maybe SATA would be better? I haven't tested
there, either.

Anyway, what I did was the following. When doing a sequential scan, we
were starting at the beginning of the table and scanning forward. If I
started some threads to read ahead in the same direction, then my user
thread and my read ahead threads would thrash trying to lock the same
buffer slots. So, I had the read ahead threads start at some distance
into the table and work toward the beginning. The user thread would do
its own I/O until it passed the read ahead threads. I also broke the
read ahead section into multiple contiguous sections, and had a
different thread read each section, so the user thread would only have a
problem with the first section; by the time it was finished with that,
the other sections would be read in. When the user thread got to about
80% of the nodes that got read ahead, it would schedule another section
to be read.

+----------------------------------------------------------------+
|                             table                              |
+----------------------------------------------------------------+
(user->)     (<-readahead)    (<-readahead)    (<-readahead)

So above, the user thread starts low in the table and works high; the
readahead threads start higher (but not at the end of the table) and
work low.
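
The layout above can be sketched in a few lines of Python. This is only
an illustration of the page ordering and the 80% trigger, not the code
from the product I mentioned; the names plan_readahead and
next_section_trigger, the page-number arithmetic, and the parameters are
all made up for the example.

```python
from typing import List


def plan_readahead(start_page: int, pages_per_chunk: int,
                   n_chunks: int, table_pages: int) -> List[List[int]]:
    """Split one read-ahead section into contiguous chunks, one per thread.

    Each chunk lists its pages from high to low, so a read-ahead thread
    works toward the beginning of the table while the user thread
    scans forward toward it.
    """
    chunks = []
    for i in range(n_chunks):
        lo = start_page + i * pages_per_chunk
        hi = min(lo + pages_per_chunk, table_pages)  # do not run past the table
        if lo >= hi:
            break
        chunks.append(list(range(hi - 1, lo - 1, -1)))
    return chunks


def next_section_trigger(start_page: int, section_pages: int,
                         percent: int = 80) -> int:
    """Page at which the user thread schedules the next section."""
    return start_page + section_pages * percent // 100


if __name__ == "__main__":
    # A 20-page table, read-ahead section starting at page 10, 3 threads.
    for n, pages in enumerate(plan_readahead(10, 4, 3, 20)):
        print("readahead thread", n, "reads", pages)
    print("schedule next section at page", next_section_trigger(10, 10))
```

The user thread would read pages below start_page itself, then find the
rest of the section already cached once it passes the first chunk.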

Like I said, this worked very well for me.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.
--------------------------------
Better to remain silent and be thought a fool than to speak out and
remove all doubt.
Abraham Lincoln

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Qingqing Zhou
Sent: Tuesday, November 29, 2005 12:56 AM
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] ice-broker scan thread

"David Boreham" <david_list(at)boreham(dot)org> wrote
>
> I don't think your NT overlapped I/O code is quite right. At least
> I think it will issue reads at a high rate without waiting for any of
> them to complete. Beyond some point that has to give the kernel gut-rot.
>

[also with reply to Gavin] look up dictionary for "gut-rot", got it ... Uh,
this behavior is intended - I try to push enough requests shortly to kernel
so that it understands that I am doing sequential scan, so it would pull the
data from disk to file system cache more efficiently. Some file systems may
have "free-behind" mechanism, but our main thread (who really process the
query) should be fast enough before the data vanished.

>
> You could re-write your program to have a single thread but use aio.
> In that case it should show the same read ahead benefit that you see
> with the thread.
>

I guess this is also Gavin's point - I understand that will be two different
methodologies to handle "read-ahead". If no other thread/process involved,
then the main thread will be responsible to grab a free buffer page from
bufferpool and ask the kernel to put the data there by sync IO (current
PostgreSQL does) or async IOs. And that's what I want to avoid. I'd like to
use a dedicated thread/process to "break the ice" only, i.e., pull data from
disk to file system cache, so that the main thread will only issue *logical*
read.
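
As a rough illustration of the "break the ice" idea, here is a minimal
Python sketch, assuming a plain file stands in for a table and that
touching each block is enough to get it into the file system cache. It
is not the test program discussed in this thread; break_ice,
scan_with_ice_breaker, and the 8192-byte block size are assumptions made
for the example.

```python
import threading


def break_ice(path: str, block_size: int = 8192) -> int:
    """Read the file sequentially and throw the data away.

    The only goal is to make the kernel pull the blocks from disk into
    the file system cache. Returns the number of bytes touched.
    """
    touched = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            touched += len(block)
    return touched


def scan_with_ice_breaker(path: str, block_size: int = 8192) -> int:
    """Run the helper thread alongside the main scan of the same file."""
    helper = threading.Thread(target=break_ice, args=(path, block_size),
                              daemon=True)
    helper.start()
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)  # stand-in for real query processing
    helper.join()
    return total
```

The main thread issues ordinary reads and never waits on the helper; any
speedup depends entirely on the helper staying ahead and on the cache
keeping the blocks until the main thread reaches them.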

Regards,
Qingqing

