Re: Four issues why "old elephants" lack performance: Explanation sought Four issues why "old elephants" lack performance: Explanation sought

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Four issues why "old elephants" lack performance: Explanation sought Four issues why "old elephants" lack performance: Explanation sought
Date: 2012-02-26 14:37:54
Message-ID: 4F4A43C2.9050206@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 02/25/2012 06:54 PM, Stefan Keller wrote:
> Hi,
>
> Recently Mike Stonebraker identified four areas where "old elephants"
> lack performance [1]:
>
> 1. Buffering/paging
> 2. Locking/Multithreading
> 3. WAL logging
> 4. Latches (aka memory locks for concurrent access of btree structures
> in buffer pool?).
>
> He claims having solved these issues while retaining SQL and ACID.
> But the only issue I understood is #1 by loading all tuples in-memory.
> => Are there any ideas on how to tell Postgres to aggressively load
> all data into memory (issue #1)?
> All remaining issues make me wonder.
> I actually doubt that there are alternatives even theoretically.
> => Can anyone help explaining me issues 2,3 and 4, their solutions,
> and why Postgres would be unable to resolve them?
>

> 1. Buffering/paging

PG, and your operating system, already do this for reads. It also keeps things that are hit harder and lets things go that are not hit as much. On the writing side, you can configure it PG from "write it! write it NOW!", to "running with scissors" depending on how safe you want to feel.

> 2. Locking/Multithreading

PG does have some internal structures that it needs to lock (and anytime you read lock, think single user access, or one at a time, or slow). Any time you hear about lock contention, it's multiple processes waiting in a line for a lock. If you only had one client, then you really would not need locks. There is where multithreading comes from, but in PG we use multi-process instead of multi-thread, but its the same thing. Two (or more) people are needing to lock something so they can really screw with it. PG does not need as many locks as other db's however. It uses an MVCC architecture so under normal operations (insert, update, select, delete) people dont block eacth other. (ie readers dont block writers and visa versa).

I don't see locking going away, but there are not many loads that are lock bound. Most database loads are IO bound, and then you'd probably be CPU bound before you are lock bound. (although it can be hard to tell if its a spin lock that's making you cpu bound). I'm sure there are loads that hit lock contention, but there are probably ways to mitigate it. Say you have a process that alters the table and adds a new column every two seconds, thing updates a single row to add data to the new column just added. I can see that being lock bound. And a really stupid implementation.

> 3. WAL logging

PG writes a transaction twice. Once to WAL and once to the DB. WAL is a simple and quick write, and is only ever used if your computer crashes and PG has to re-play transactions to get the db into a good/known state. Its a safety measure that doesn't really take much time, and I don't think I've heard of anyone being WAL bound. Although it does increase IO ops, it's not the biggest usage of IO. This one falls under "lets be safe" which is something NoSQL did away with. Its not something I want to give up, personally. I like using a net.

> 4. Latches

I can only guess at this one. Its similar to locks I think. Data structures come in different types. In the old days we only had single user access to data structures, then when we wanted two users to access it we just locked it to serialize access (one at a time mode), but that does not scale well at all, so we invented two new types: lock free and wait free.

An index is stored as a btree. To insert a new record into the index you have to reorganize it (rotate it, sort it, add/delete nodes, etc), and while one client is doing that it can make it hard for another to try and search it. Lock free (and wait free) let multiple people work on a btree at the same time with much less contention. Wikipedia does a better job of explaining them than I could:

http://en.wikipedia.org/wiki/Non-blocking_algorithm

I have no idea if PG uses single user locks or some kind of lock free structure for its internals. I can see different parts of the internals needing different levels.

Maybe I'm old fashioned, but I don't see how you'll get rid of these. You have to insert a record. You have to have 12 people hitting the db at the same time. You have to organize that read/write access somehow so they dont blow each other up.

-Andy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Clodoaldo Neto 2012-02-26 15:45:16 Constant value for a partitioned table query inside a plpgsql function
Previous Message Jayashankar K B 2012-02-26 12:16:05 Re: Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8