Re: Four issues why "old elephants" lack performance: Explanation sought

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Four issues why "old elephants" lack performance: Explanation sought
Date: 2012-02-27 10:05:16
Message-ID: CAKt_ZfsVQ1AL3Ry6FtQXexmAoQP4ZQcOWfqQJR-PKd1CGt0jhQ@mail.gmail.com
Lists: pgsql-general

On Sun, Feb 26, 2012 at 12:11 PM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:

> Thanks to all who responded so far. I got some more insights from Mike
> Stonebraker himself in the USENIX talk Scott pointed to before.
> I'd like to revise the four points I enumerated in my initial
> question a little and sort out what PG already does or could do:
>
> 1. Buffering Pool
>
> To get rid of I/O bottlenecks, Mike proposes in-memory database
> structures. He argues that "old elephants" cannot implement this
> because it would require a huge code rewrite: memory-oriented
> structures would have to be stored instead of disk-oriented ones.
> I'm still wondering why PG couldn't achieve that, probably in
> combination with unlogged tables. I don't have an overview of the
> respective code, but I think it's worth discussing even if implementing
> memory-oriented structures would be too difficult.
>

The reason is that the existing data structures assume disk-based storage,
so they are laid out to be efficient to look up on disk but are not as
efficient in memory.
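
To make the distinction concrete, here is a minimal sketch in Python. It is
a toy, not PostgreSQL's actual page layout or buffer manager: the row
format, the BufferPool class, and the function names are all hypothetical.
It only shows the extra steps a disk-oriented lookup takes (find the page,
compute the slot offset, decode bytes) compared with following a reference
in a memory-oriented structure.

```python
import struct

PAGE_SIZE = 8192
# Hypothetical fixed-width row: 4-byte int id + 16-byte name.
ROW = struct.Struct("<i16s")


class BufferPool:
    """Toy buffer pool: caches in-memory copies of 'disk' pages."""

    def __init__(self, pages):
        self._disk = pages   # stand-in for the data file
        self._cache = {}

    def get_page(self, page_no):
        # A cache miss stands in for a real disk read.
        if page_no not in self._cache:
            self._cache[page_no] = self._disk[page_no]
        return self._cache[page_no]


def make_page(rows):
    """Serialize rows into one fixed-size page of bytes."""
    buf = bytearray(PAGE_SIZE)
    for slot, (row_id, name) in enumerate(rows):
        ROW.pack_into(buf, slot * ROW.size, row_id, name.encode())
    return bytes(buf)


def disk_style_lookup(pool, page_no, slot):
    """Page fetch, offset arithmetic, then decoding of raw bytes."""
    page = pool.get_page(page_no)
    row_id, raw = ROW.unpack_from(page, slot * ROW.size)
    return row_id, raw.rstrip(b"\x00").decode()


def memory_style_lookup(table, row_id):
    """Memory-oriented: the row is already a live object."""
    return table[row_id]
```

Even when every page is already cached, the disk-style path still pays for
the indirection and the decoding, which is the overhead an in-memory design
tries to avoid.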

Note that VoltDB is a niche product and Stonebraker makes this pretty
clear. However, the more interesting question is what the tradeoffs are
when looking at VoltDB vs Postgres-XC.

>
> 2. Locking
>
> This critique obviously doesn't hold for PG since we have MVCC here
> already.
>
> 3. WAL logging
>
> Here Mike proposes replication over several nodes as an alternative to
> WAL which fits nicely with High Availability. PG 9 has built-in
> replication but just not for unlogged tables :-<
>

I find it interesting that two of the four areas he identifies have to do
with durability.....

>
> 4. Latches
>
> This is an issue I had never heard of before. I found some mention of
> latches in the code, but it doesn't seem to be related to concurrent
> access to btree structures as Mike suggests.
> So if anyone could confirm that this problem exists and produces
> overhead, I'd be interested to hear.
> Mike proposes single threads running on many cores, where each core
> processes a non-overlapping shard.
> But he also calls for ideas for btrees that can be processed
> concurrently with as few memory locks as possible (instead of looking
> to make btrees faster).
>
> So to me the bottom line is that PG already has reduced overhead at
> least for issue #2 and perhaps for #4.
> That leaves in-memory optimization (#1) and replication (#3),
> together with High Availability, to be investigated in PG.
>

If he were looking at PostgreSQL for #4, I think that would be things like
waiting on semaphores. Since that work is probably minimal and each
PostgreSQL backend is a single-threaded process, I suspect the overhead in
this area is low.

The issue seems to be concurrent access to shared data structures, which
is a problem particularly when you start looking at multithreaded
backends...
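
A rough sketch of the two approaches, in Python for brevity. This is not
how PostgreSQL or VoltDB is implemented: the workload, the names, and the
key % N_WORKERS partitioning are assumptions made for illustration. The
first function has every worker take one lock (standing in for a latch)
around a shared structure; the second gives each worker a non-overlapping
shard so that no lock is needed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 4
# Hypothetical workload: (key, delta) increments over a small key space.
OPS = [(i % 16, 1) for i in range(1600)]


def run_shared(ops):
    """All workers update one shared dict guarded by a single latch."""
    store = {}
    latch = threading.Lock()

    def worker(chunk):
        for key, delta in chunk:
            with latch:  # every update serializes here
                store[key] = store.get(key, 0) + delta

    chunks = [ops[i::N_WORKERS] for i in range(N_WORKERS)]
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        list(pool.map(worker, chunks))
    return store


def run_sharded(ops):
    """Each worker owns the keys with key % N_WORKERS == shard_id."""

    def worker(shard_id):
        shard = {}  # private to this worker, so no latch is required
        for key, delta in ops:
            if key % N_WORKERS == shard_id:
                shard[key] = shard.get(key, 0) + delta
        return shard

    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        shards = list(pool.map(worker, range(N_WORKERS)))

    merged = {}
    for shard in shards:
        merged.update(shard)  # key sets are disjoint, nothing to reconcile
    return merged
```

Both functions produce the same totals. Because of CPython's global
interpreter lock this shows the structure of the two designs rather than
their relative speed; the point is only that the sharded version has no
shared mutable state for workers to contend on.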

Best Wishes,
Chris Travers
