Re: Large tables (was: RAID 0 not as fast as

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: bjordan(at)lumeta(dot)com
Subject: Re: Large tables (was: RAID 0 not as fast as
Date: 2006-09-21 19:54:06
Message-ID: 4512EDDE.3050403@logix-tt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi, Bucky,

Bucky Jordan wrote:

>> We can implement multiple scanners (already present in MPP), or we
> could
>> implement AIO and fire off a number of simultaneous I/O requests for
>> fulfillment.
>
> So this might be a dumb question, but the above statements apply to the
> cluster (e.g. postmaster) as a whole, not per postgres
> process/transaction correct? So each transaction is blocked waiting for
> the main postmaster to retrieve the data in the order it was requested
> (i.e. not multiple scanners/aio)?

No, that's a wrong assumption.

It applies per active backend. When connecting, the Postmaster forks a
new backend process. Each backend process has its own scanner and
executor. The main postmaster is only for coordination (forking, config
reload etc.), all the work is done in the forked per-connection backends.

Furthermore, the PostgreSQL MVCC system ensures that readers are neither
ever blocked nor blocking other backends. Writers can block each other
due to the ACID transaction semantics, however the MVCC limits that to a
minimum.

> In this case, the only way to take full advantage of larger hardware
> using normal postgres would be to run multiple instances? (Which might
> not be a bad idea since it would set your application up to be able to
> deal with databases distributed on multiple servers...)

Typical OLTP applications (Web UIs, Booking systems, etc.) have multiple
connections, and those run fully parallel.

So if your application is of this type, it will take full advantage of
larger hardware. In the list archive, you should find some links to
benchmarks that prove this statement, PostgreSQL scales linearly, up to
8 CPUs and 32 "hyperthreads" in this benchmarks.

Our discussion is about some different type of application, where you
have a single application issuing a single query at a time dealing with
a large amount (several gigs up to teras) of data.

Now, when such a query is generating sequential disk access, the I/O
scheduler of the underlying OS can easily recognize that pattern, and
prefetch the data, thus giving the full speed benefit of the underlying
RAID.

The discussed problem arises when such large queries generate random
(non-continous) disk access (e. G. index scans). Here, the underlying
RAID cannot effectively prefetch data as it does not know what the
application will need next. This effectively limits the speed to that of
a single disk, regardless of the details of the underlying RAID, as it
can only process a request at a time, and has to wait for the
application for the next one.

Now, Bizgres MPP goes the way of having multiple threads per backend,
each one processing a fraction of the data. So there are always several
outstanding read requests that can be scheduled to the disks.

My proposal was to use posix_fadvise() in the single-threaded scanner,
so it can tell the OS "I will need those blocks in the near future". So
the OS can pre-fetch those blocks into the cache, while PostgreSQL still
processes the previous block of data.

Another proposal would be to use so-called asynchroneous I/O. This is
definitely an interesting and promising idea, but needs much more
changes to the code, compared to posix_fadvise().

I hope that this lengthy mail is enlightening, if not, don't hesitate to
ask.

Thanks for your patience,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bucky Jordan 2006-09-21 21:16:35 Re: Large tables (was: RAID 0 not as fast as
Previous Message Mark Lewis 2006-09-21 19:41:54 Re: Large tables (was: RAID 0 not as fast as