Re: Four issues why "old elephants" lack performance: Explanation sought

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Four issues why "old elephants" lack performance: Explanation sought
Date: 2012-02-27 12:19:10
Message-ID: CAKt_ZfsJ6aOj7nkrGUXOMSBrFHExR=Eey6tbYdqSzVhYJ_dC2g@mail.gmail.com
Lists: pgsql-general

On Mon, Feb 27, 2012 at 3:46 AM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:

> Hi,
>
> 2012/2/27 Chris Travers <chris(dot)travers(at)gmail(dot)com> wrote:
> >> 1. Buffering Pool
> >>
> >> To get rid of I/O bounds Mike proposes in-memory database structures.
> ...
> >> Now I'm still wondering why PG couldn't realize that, probably in
> >> combination with unlogged tables? I don't have an overview of the
> >> respective code, but I think it's worthwhile to discuss even if
> >> implementation of memory-oriented structures would be too difficult.
> >
> >
> > The reason is that the data structures assume disk-based data
> > structures, so they are written to be efficient to look up on disk
> > but not as efficient in memory.
>
> That means that this could be enhanced in PG.
> Are there really no research or implementation projects going on in the
> direction where all table content can be held in memory? This could be
> enhanced in many ways (besides optimized in-memory structures),
> including index-only scans.
>

So if I want to read a row in the middle of a table and it is now on disk,
do I have to read the whole thing into memory first? I see so many
problems with trying to use in-memory structures on disk that I just don't
see that happening.
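
To make the contrast concrete, here is a minimal sketch of what a
disk-oriented layout buys you: one row is fetched by reading only the page
that holds it. The fixed-width row format and the helper names are
assumptions made up for illustration; this is not PostgreSQL's actual heap
format. Only the 8192-byte page size matches PostgreSQL's default block
size.

```python
import os
import struct
import tempfile

PAGE_SIZE = 8192                       # PostgreSQL's default block size
ROW = struct.Struct("<q56s")           # hypothetical row: 8-byte id + 56-byte payload
ROWS_PER_PAGE = PAGE_SIZE // ROW.size  # 64-byte rows, so 128 rows per page


def write_table(path, n_rows):
    """Write n_rows fixed-width rows back to back (hypothetical table file)."""
    with open(path, "wb") as f:
        for i in range(n_rows):
            f.write(ROW.pack(i, f"row-{i}".encode()))


def read_row(path, row_no):
    """Fetch one row by reading only the page that contains it."""
    page_no, slot = divmod(row_no, ROWS_PER_PAGE)
    with open(path, "rb") as f:
        f.seek(page_no * PAGE_SIZE)    # jump straight to the page
        page = f.read(PAGE_SIZE)       # one page of I/O, not the whole file
    row_id, payload = ROW.unpack_from(page, slot * ROW.size)
    return row_id, payload.rstrip(b"\x00").decode()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "table.dat")
    write_table(path, 10_000)
    print(read_row(path, 5_000))
```

A pointer-based in-memory structure has no such addressable unit once it is
dumped to disk, which is the problem I am describing.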

>
> > Note that VoltDB is a niche product and Stonebraker makes this pretty
> > clear. However, the more interesting question is what the tradeoffs are
> > when looking at VoltDB vs Postgres-XC.
>
> OK, that's interesting too. Postgres-XC (eXtensible Cluster) is a
> multi-master, write-scalable PostgreSQL cluster based on a
> shared-nothing architecture.
> But I'm thinking more about enhancing PostgreSQL core.
>
> VoltDB a niche product? Look at his USENIX conference speech
> (http://www.youtube.com/watch?v=uhDM4fcI2aI ) at minute 11:53: There he
> puts VoltDB into "high OLTP" and "old elephants" into the category "low",
> which only get the crevices and which have "The Innovator's Dilemma".
>

Strangely, he doesn't consider PostgreSQL to be an elephant... His
point is, as I understood it, that open source databases like PostgreSQL
will drive the proprietary equivalents out of the market because they
cannot meet these specific challenges. That's why the crevices exist
between open source and the niche corners of his triangle.

>
> My thesis is that PG doesn't have that problem necessarily because
> it's open source and can be (and has been) refactored since then.
>

If you watch the USENIX speech closely you will notice something.

Two of the major sources of overhead have to do with durability (you know,
writing to the hard disk), and the other two have to do with concurrency.
His solution is to get rid of intra-machine durability (durability only
exists distributed throughout the cluster in VoltDB), and also to get rid
of concurrency. As he puts it, there's no need to deschedule a process
that's going to run for only 50 microseconds. This is why this is a niche
product. It requires a specialized environment to run with real ACID
compliance. This isn't something like hydraulics in backhoes. It's a
fundamental attribute of their innovative architecture.
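
The "get rid of concurrency" idea can be sketched roughly as follows: each
partition is owned by a single thread, and short transactions run to
completion one after another, so no locks or latches are needed. This is
only an illustration of the idea under my own assumptions; the Partition
class and the transfer function are hypothetical names and have nothing to
do with VoltDB's real API.

```python
from collections import defaultdict


class Partition:
    """Hypothetical partition owned by one thread: transactions run serially."""

    def __init__(self):
        self.data = defaultdict(int)   # in-memory state, no disk, no locks
        self.queue = []                # pending transactions in arrival order

    def submit(self, txn, *args):
        self.queue.append((txn, args))

    def run(self):
        results = []
        for txn, args in self.queue:   # strictly one at a time, never preempted
            results.append(txn(self.data, *args))
        self.queue.clear()
        return results


def transfer(data, src, dst, amount):
    """A short transaction: it sees a consistent state because nothing interleaves."""
    if data[src] < amount:
        return False                   # reject; nothing was modified
    data[src] -= amount
    data[dst] += amount
    return True


if __name__ == "__main__":
    p = Partition()
    p.data["alice"] = 100
    p.submit(transfer, "alice", "bob", 30)
    p.submit(transfer, "alice", "bob", 100)
    print(p.run(), dict(p.data))
```

Note what is missing from the sketch: nothing is ever written to local
disk, and a long-running transaction would stall everything queued behind
it. Those are exactly the tradeoffs that make this a specialized
environment.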

There are a heck of a lot of environments where this sort of approach
really doesn't make sense. In fact I would say that the majority of
databases deployed in the future will be in areas where it doesn't make
sense. I'd say, realistically, that you have to be at least close to the
edge of what you can reasonably do on modern hardware with new versions of
PostgreSQL before it makes sense to consider that tradeoff. That means,
what, 100,000 transactions per second?

For the sorts of applications I write, the costs of going with something
like VoltDB would easily eclipse the benefits in every single deployment,
and moreover this is not due to questions of the maturity of the technology
but rather of fundamental design choices. There are a lot of really,
really cool things VoltDB would be good at doing (from state handling of
MMORPGs to handling huge numbers of inputs from a million sensors in real
time, and providing reporting data on these).

But those are niche markets.

Best Wishes,
Chris Travers
