Re: Postgres as In-Memory Database?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Stefan Keller <sfkeller(at)gmail(dot)com>, pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres as In-Memory Database?
Date: 2014-04-01 06:54:11
Message-ID: B45B0FA9-D76D-4B55-A837-00EDDE034969@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 01 Apr 2014, at 4:20, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Sunday, March 30, 2014, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
> There seem to be two main things which make in-memory dbs special:
> 1. Index: Having all data in memory there is no need for a default index. A full-table scan "suddenly" becomes the default.
>
> Surely not. I would say that full table scans are *already* the default, deviated from only if it thinks an index seems to be better. If you don't make an index, it can't seem to be better. And I don't know of any higher-level in-memory language which fails to provide a way to do efficient searching into an in-memory structure, usually in the form of hash tables or balanced trees. If "let's seq scan everything as long as it is already in memory" is a good idea, why would Perl, Java, Python, etc. (not to mention C libraries and the source code of PostgreSQL itself) provide ways to do efficient searches in memory?
>
> The caveat here is you can't make certain constraints without an index. In theory you *could* have an unique constraint without an index to support it, but if it were implemented you would probably rapidly learn that you don't actually want to do that.

That’s what I thought initially reading that line as well, but that’s not necessarily true for an in-memory database.

The keyword here is “default index”. The reasoning is probably along the lines that if all your data is in-memory, then keeping it all sorted on the primary column(s) is relatively cheap (compared to doing so on disk). You could even split off the primary column(s) from the rest of the records and link back to them using pointers, which makes it easier to keep them in a (somewhat) balanced btree.

When you do that, the data effectively becomes the primary key index and would theoretically also be usable as the primary key constraint. So you _do_ have an index, but it’s not a separate one; it’s part of your data. That frees up memory that you do not need to preserve for an index, which is probably rather a gain for an in-memory database.

Apparently this works for HANA, but for an MVCC database like Postgres that’s a little more involved.

Of course, with databases like that you want them replicated on different hardware and kernels to reduce data-loss risks. And that brings us closer to what ISTR Stonebreaker is doing these days.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message shetty65 2014-04-01 07:13:22 Unattended Installation
Previous Message Rajeev rastogi 2014-04-01 05:37:09 Re: Doubts on startup costs