Re: Postgres as In-Memory Database?

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres as In-Memory Database?
Date: 2014-04-01 10:59:49
Message-ID: CAFcOn2-=NueGYYKtWQVtPymXZYoFwBXKw=-7mg8XNMeSM04gGg@mail.gmail.com
Lists: pgsql-general

Hi Jeff

I agree with most of your statements.

2014-04-01 4:20 GMT+02:00 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:

> On Sunday, March 30, 2014, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
>
>> Hi Jeff
>>
>>
>> 2013/11/20 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
>>
>>>
>>> I don't know what you mean about enhancements in the buffer pool. For
>>> an in-memory database, there shouldn't be a buffer pool in the first place,
>>> as it is *all* in memory.
>>>
>>
>> You are right: In-memory DBs are making buffer-pooling obsolete - except
>> for making data persistent (see below).
>>
>
> I would be very reluctant to use any database engine which considered disk
> access obsolete. There will always be a risk where data grows to exceed
> RAM, and where it would be inconvenient to expand RAM fast enough to
> accommodate it. I've played those games enough with Perl and C in-memory
> systems. You fight and squeeze to fit the data into RAM, then the data
> size grows 3% and all of our work is for naught. You can buy more RAM, if
> you have the budget, and the RAM isn't back-ordered for 3 months because
> the factory that makes it had a fire, and if more RAM fits on your
> motherboard, and....
>

I think there are some use cases where you can plan ahead and foresee how the
data will grow.

>
>
>> >
>> > Do you know why it is slow? I'd give high odds that it would be a
>> specific implementation detail in
>> > the code that is suboptimal, or maybe a design decision of PostGIS,
>> rather than some high level
>> > architectural decision of PostgreSQL.
>>
>> Referring to the application is something you can always say - but it
>> shouldn't prevent us from enhancing Postgres.
>>
>>
>> Postgres has been enhanced. Now we need to change osm2pgsql to take
>> advantage of them. It defines indexes on the tables that are going to be
>> bulk loaded with COPY, which defeats some recent optimizations made to
>> COPY. The creation of the indexes should be delayed until after the bulk
>> load is done.
>>
>> A further enhancement to Postgres would be to automatically defer
>> creation of the indexes when a table is truncated or created within a
>> transaction, so that users get the benefit of the improvement.
>>
>> These enhancements to osm2pgsql seem reasonable to me. I hope
>> somebody has time to take care of them.
>>
>
> I have a fork of osm2pgsql on github which delays the index build until
> the COPY is done. I'm not really motivated to convince anyone to merge it
> (as my interest is postgresql not osm itself), but if someone wants to pick
> it up, that is fine with me. It helps somewhat, but it is not a
> game-changer because there are other bigger bottlenecks, at least for HDD
> based systems.
>
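
For reference, a minimal sketch of that "delay the index build" load order;
table and column names are just placeholders, not necessarily osm2pgsql's
real schema:

    BEGIN;
    CREATE TABLE nodes (id bigint, lat int, lon int);
    COPY nodes FROM '/tmp/nodes.tsv';          -- bulk load into the still-unindexed table
    CREATE INDEX nodes_id_idx ON nodes (id);   -- build the index only after the load
    COMMIT;
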
> One of the bigger bottlenecks is building the GIN indexes on the way table
> at the end. Setting maintenance_work_mem to huge values helps a lot, if
> you can find a safe setting for it considering multiple index builds it
> might be doing (at that point in the load, osm2pgsql's node cache has been
> released, so there is substantial RAM to re-purpose). It would be better
> for this use if PostgreSQL built the index by using an external sort,
> rather than iterating over the table building maintenance_work_mem sized
> chunks of red-black trees. The problem there is that osm uses the gin
> index in an odd way (the vast majority of nodes occur in exactly one way,
> with a minority occurring in more than one), and using a disk sort might
> not be ideal for the more common use cases where GIN is used, where a given
> token usually occurs in far more than one document. So an improvement that
> only improves osm2pgsql and degrades other uses is unlikely to be adopted.
>
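
In SQL terms that would be roughly the following; the memory value is only a
placeholder (it has to leave room for any index builds running in parallel),
and the table/column names are what I remember of the osm2pgsql schema:

    SET maintenance_work_mem = '2GB';   -- or whatever the freed node cache allows
    CREATE INDEX ON planet_osm_ways USING gin (nodes);
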
> Another bottleneck is just the raw COPY into the node table. When that is
> running against an unindexed table which was created in the same
> transaction, I see that osm2pgsql takes about 50% of a CPU to print a
> copy-stream, and postgresql uses about 50% of a CPU to parse that stream
> and insert into the table. So they add up to about 1 CPU despite the fact
> this is a multiple-CPU machine. So they seem to be playing ping-pong with
> the pipe buffer when in theory they should each be able to run at almost
> full speed. I don't know how to get them to stop playing ping-pong, but I have
> other use cases where this shows up, so a trade-off-free solution would be
> nifty. I suspect that that is more of a kernel issue than either
> postgresql or osm2pgsql.
>
> You could do the COPY in parallel in multiple threads, but the problem
> there is you can't use the "created in same transaction" optimization to
> avoid WAL overhead. There is no fix to this without changing PostgreSQL to
> accommodate it, but I have no clear idea how one would do that. Importing
> a snapshot doesn't seem like it would be enough, as you can only import
> snapshots for reading, not for writing.
>
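
If I understand the optimization correctly, it is the WAL skip for a table
created or truncated in the same transaction as the COPY, which also needs
wal_level = minimal; a rough sketch (file path is a placeholder):

    BEGIN;
    TRUNCATE nodes;                     -- or CREATE TABLE nodes (...) right here
    COPY nodes FROM '/tmp/nodes.tsv';   -- these rows skip WAL
    COMMIT;

A second session loading the same table in parallel cannot share that
transaction, so its rows are WAL-logged normally.
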
> Also, I'm not too sure how seriously to take the goal of optimizing
> osm2pgsql. Development on it seems to be less than vigorous. And its
> purpose is to create a database to be used, so wouldn't it make more sense
> to optimize the use, not the creation? And if you do want to optimize the
> creation, the obvious way to do it is to create the export in a way more
> closely aligned to that need, rather than a generic export.
>

As long as the planet file import takes about 5 days, I think it's an issue.

>
>> In the meantime I discussed with HANA users and thought about what makes
>> in-memory dbs special and how to configure Postgres to be an in-memory db.
>>
>> There seem to be two main things which make in-memory dbs special:
>> 1. Index: Having all data in memory, there is no need for a default index.
>> A full-table scan "suddenly" becomes the default.
>>
>
> Surely not. I would say that full table scans are *already* the default,
> deviated from only if it thinks an index seems to be better. If you don't
> make an index, it can't seem to be better. And I don't know of any
> higher-level in-memory language which fails to provide a way to do
> efficient searching into an in-memory structure, usually in the form of
> hash tables or balanced trees. If "let's seq scan everything as long as it
> is already in memory" is a good idea, why would Perl, Java, Python, etc.
> (not to mention C libraries and the source code of PostgreSQL itself)
> provide ways to do efficient searches in memory?
>
> The caveat here is you can't make certain constraints without an index.
> In theory you *could* have a unique constraint without an index to
> support it, but if it were implemented you would probably rapidly learn
> that you don't actually want to do that.
>

Ok. But at least it seems obvious to me that there is a need for different
index configurations, since the currently implemented indexes are designed
to minimize secondary-storage access.

>
>
>> 2. Persistence: All updates/changes to the db are streamed to disk.
>>
>
> That is already done through WAL files. But they are also written to
> memory in their proper location, and then that location is written to disk
> as well in a scattered manner once per checkpoint. If you suppress
> checkpoints entirely then every time the database is restarted it would
> have to read the entire history of the database since creation from the WAL
> to create the initial in-memory image. So, how should checkpoints be
> handled?
>

I have to look up how HANA solves this, since that's one of the USPs they
claim. At least they delay checkpoints and accept a "small data loss", that's
for sure.
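
In Postgres terms I imagine that would look roughly like this (placeholder
values; synchronous_commit = off is the knob that trades a small window of
data loss for speed):

    # postgresql.conf
    checkpoint_timeout  = 1h     # currently the maximum, as Jeff mentions below
    checkpoint_segments = 256    # let a lot of WAL accumulate between checkpoints
    synchronous_commit  = off    # accept losing the last few commits on a crash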

>
>
>> 3. An increase in data simply needs to be compensated for with more memory
>> (since memory has become cheap).
>>
>
> The size of the data seems to be increasing at the same rate as the RAM is
> getting cheaper, if not faster. And the new RAM might need a new
> motherboard, and then a new power supply, and a new dev environment and
> new test environment and then a new rack to mount them, and then a union
> electrician to wire up the new rack, and then a new data center to hold the
> next new rack.....
>
> The same could be said for new hard drives, too, but I usually have more
> head room in my hard drives than in my RAM.
>
>
>> AFAIK one can configure Postgres for all these properties: One can force
>> the use of full-table scan(?) and replicate to disk(?).
>>
>
> Yes, already can be done. Don't build an index, and delay checkpoints as
> long as you dare (1 hour is currently the max, but a simple hack can extend
> that. Whether you will like the consequences of that is another matter).
>
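
For completeness, the planner can also be discouraged from using whatever
indexes do exist, per session (these settings exist today; whether that is
ever a good idea is another matter):

    SET enable_indexscan     = off;
    SET enable_bitmapscan    = off;
    SET enable_indexonlyscan = off;   -- makes these plan types look prohibitively expensive
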
>
>>
>> What do you think about this?
>>
>
> I think there is no reason to think these changes will do much good. In
> memory databases usually run as libraries, not as client-server
> infrastructures. Without that, you already have one hand behind your back.
> Also, I think they usually map datatypes directly to hardware supported
> types, which is surely faster but would render PostgreSQL's flexible type
> system nonfunctional. I don't know how they deal with visibility, but I
> have to assume they either compromise on correctness, or limit the amount
> of updating which is possible on the in-memory portion. By the time you do
> that stuff, it isn't clear what would be left anymore of the things that
> make PostgreSQL be PostgreSQL.
>
>
>> Are there any show cases out there?
>>
>
> What did the HANA users have to say? Seems like they would be in the best
> position to provide the test cases.
>

Yes, HANA, Oracle TimesTen, or H-Store (VoltDB).

>
> Cheers,
>
> Jeff
>
>>

Yours, Stefan
