Re: Optimizer internals

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Mark Lewis" <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)lusis(dot)org, "PGSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimizer internals
Date: 2006-06-16 13:21:01
Message-ID: 8764j18cea.fsf@stark.xeocode.com
Lists: pgsql-performance

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:

> Now, if we're considering UPDATES (the worst case for PostgreSQL's
> current MVCC architecture), then this is (IMHO) a true statement.
> There aren't many *successful* commercial databases that incur the
> additional overhead of creating another version of the record, marking
> the old one as having been updated, inserting N-number of new index
> entries to point to said record, and having to WAL-log all
> aforementioned changes.

Well, Oracle has to do almost all of that same work; it's just doing it in a
separate place called a rollback segment. There are pros and cons to each
approach, especially when it comes to indexes, but also when it comes to what
happens when the new record is larger than the old one.
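
For what it's worth, you can watch the new-version mechanics Jonah describes
from plain SQL; the table and names below are invented purely for
illustration:

    CREATE TABLE t (id int PRIMARY KEY, val text);
    INSERT INTO t VALUES (1, 'old');

    -- ctid is the tuple's physical location, xmin the inserting transaction
    SELECT ctid, xmin, val FROM t WHERE id = 1;

    UPDATE t SET val = 'new' WHERE id = 1;

    -- the row now shows a different ctid and xmin: a new tuple version was
    -- written (and, on releases without heap-only updates, new index
    -- entries point at it), all of it WAL-logged
    SELECT ctid, xmin, val FROM t WHERE id = 1;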

> I've done a good amount of research on enhancing PostgreSQL's MVCC in UPDATE
> conditions and believe there is a nice happy medium for us.

IMHO the biggest problem Postgres has is when you're updating a lot of records
in a table with little free space. Postgres has to keep jumping back and forth
between the old records it's reading in and the new records it's writing out.
That can in theory turn a simple linear update scan into an O(n^2) operation.
In practice read-ahead and caching should help, but I'm not sure to what
extent.
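
A rough way to see the free-space effect, assuming a release new enough to
have a table fillfactor and pg_relation_size() (the tables here are invented
for illustration): a whole-table UPDATE on a densely packed table roughly
doubles it, because every new version has to land somewhere else, while a
table with space reserved on each page can keep the new versions next to the
old ones.

    CREATE TABLE packed (id int, val text) WITH (fillfactor = 100);
    CREATE TABLE roomy  (id int, val text) WITH (fillfactor = 50);
    INSERT INTO packed SELECT i, 'x' FROM generate_series(1, 100000) i;
    INSERT INTO roomy  SELECT i, 'x' FROM generate_series(1, 100000) i;

    SELECT pg_relation_size('packed') AS packed,
           pg_relation_size('roomy')  AS roomy;

    -- every row gets a new version: in "packed" they all spill onto new
    -- pages at the end of the table, in "roomy" they fit into the space
    -- left free on each page
    UPDATE packed SET val = 'y';
    UPDATE roomy  SET val = 'y';

    SELECT pg_relation_size('packed') AS packed,
           pg_relation_size('roomy')  AS roomy;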

That, and of course the much-discussed visibility bitmap, which might let
vacuum skip visiting every page and let index scans skip checking visibility
info for some pages, would be major wins.
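
(That visibility bitmap did eventually land as the "visibility map"; on a
recent server, and assuming the pg_visibility contrib module is installed,
you can get a feel for it like this, with "t" again being an invented table:)

    CREATE EXTENSION pg_visibility;
    VACUUM t;
    -- counts of pages that vacuum can skip and whose tuples index scans
    -- could treat as visible without checking the heap
    SELECT * FROM pg_visibility_map_summary('t');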

> /me waits for the obligatory and predictable, "the benchmarks are
> flawed" response.

I wouldn't say the benchmarks are flawed, but I also don't think you can point
to any specific design feature and say it's essential just on the basis of
bottom-line results. You have to look at the actual benefit behind each
specific win.

Oracle and the others all implement tons of features intended to optimize
applications like the benchmarks (and the benchmarks specifically, of course :)
and those features have huge effects on the results. Partitioned tables,
materialized views, etc. allow algorithmic improvements that do much more than
any low-level optimizations can.
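
As a toy illustration of the kind of algorithmic win meant here (tables and
names invented), a hand-maintained summary table (the poor man's materialized
view) turns a report that has to aggregate the whole fact table into a lookup
of a few rows:

    CREATE TABLE sales (sale_date date, region text, amount numeric);

    -- without a summary, every report scans and aggregates all of sales
    SELECT region, sum(amount) FROM sales GROUP BY region;

    -- with a precomputed summary (refreshed however often is acceptable),
    -- the same answer comes from a handful of rows
    CREATE TABLE sales_by_region AS
        SELECT region, sum(amount) AS total
        FROM sales GROUP BY region;

    SELECT total FROM sales_by_region WHERE region = 'EMEA';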

--
greg
