Last night's meeting, next month's announcement

From: Selena Deckelmann <sdeckelmann(at)chrisking(dot)com>
To: Postgresql PDX_Users <pdxpug(at)postgresql(dot)org>
Subject: Last night's meeting, next month's announcement
Date: 2007-08-17 16:02:58
Message-ID: 801615EB-17D6-425D-B03D-BC89653E4CC2@chrisking.com
Lists: pdxpug

SUMMARY:

Our meeting was awesome. Jeff is smart. I learned something about
Linux I/O schedulers and rules that I DID NOT EXPECT. Three new
people attended!

Next month's meeting: Relational Algebra, with 3 PhD candidates
(James, Vassilis, Rafael). We may need to serve alcohol at this
meeting. Or perhaps we will see a return of the cupcakes, since
baking season will have started.

DETAILS:
The August 16th meeting began with a short discussion of Rules vs.
Triggers. I forgot to come up with an EXPLAIN operator.

When do you choose to create a rule or a trigger? Jeff explained that
for table partitioning, the recommendation was to use triggers. In
some cases, a rule can alter a query's predicate in a way that is
difficult to reason about, and cause the “window” of data the query
returns to come back NULL.

We all talked about that for a while, tried to come up with an
example case - which was hard. Then we tried to frame MySQL users for
something. And then we moved on.

I would like to revisit rules vs. triggers, and come up with the
example case to explain what we were talking about!

We had a few new faces - including the leader of the PHP group - Sam!
Also, Jerry was looking for someone to help him out with some SQL
questions. We hope he posts some questions to the list.

Jeff’s talk was largely about his patch, with a few bits about his
development environment, a patch from Simon Riggs that was related
but not dependent, and a little database theory thrown in.

The inspiration for Jeff’s Synchronized Scan patch was the idea that
a sequential scan can really start at any place between 0 and N, with
N being the number of records in a table. Before his patch to 8.3, it
was truly arbitrary that all sequential scans started at 0. In the
past, DBAs just had to plan for poor or unpredictable performance
when multiple sequential scans ran at the same time.

The patch implements a system where each process keeps track of where
its sequential scan currently is, in a tiny piece of shared memory.
Then, when a new sequential scan starts up on the same table, it is
given a hint as to where to start. The effect is that the second scan
now asks for data that is already in the cache. For any tables that
are larger than cache size, and whose queries are I/O-bound, this is a
big performance benefit, with no performance penalties. So awesome!
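
To make the idea concrete, here is a toy Python sketch of the "hint"
mechanism as described above. It is not Jeff's code and not how the
patch is actually implemented: the scan_hints dict, the function name
synchronized_scan, and the table name big_table are all made up for
illustration, and a plain dict stands in for the piece of shared
memory.

```python
from typing import Dict, Iterator

# Hypothetical stand-in for the tiny piece of shared memory:
# table name -> block most recently read by some scan of that table.
scan_hints: Dict[str, int] = {}


def synchronized_scan(table: str, n_blocks: int) -> Iterator[int]:
    """Yield every block of `table` exactly once, starting at the
    hinted position (or 0 if there is no hint) and wrapping around."""
    start = scan_hints.get(table, 0) % n_blocks if n_blocks else 0
    for offset in range(n_blocks):
        block = (start + offset) % n_blocks
        scan_hints[table] = block  # report progress for later scans
        yield block


# The first scan reads three blocks of an 8-block table, then pauses.
first = synchronized_scan("big_table", 8)
seen_by_first = [next(first) for _ in range(3)]

# A second scan starts now. It picks up the hint and begins where the
# first scan is, instead of going back to block 0.
second_order = list(synchronized_scan("big_table", 8))

print(seen_by_first)  # [0, 1, 2]
print(second_order)   # [2, 3, 4, 5, 6, 7, 0, 1]
```

The second scan still visits every block exactly once, just not
starting from 0, so it begins on a block the first scan has only just
read.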

(There's lots more detail that you should check out in Jeff's slides,
as well as a nice diagram that really explains it)

Now in 8.3, the order of rows returned by a query without an ORDER BY
is truly non-deterministic. The documentation for PostgreSQL has
always said this, but now, it is certain. Jeff's patch only kicks in
when tables are of a certain size, but still: use ORDER BY if you
want data returned in a certain order!

Jeff also discussed Simon Riggs’ patch, which implements a small ring
cache to service sequential scans. This is also a big performance
improvement because it prevents cache pollution by confining
sequential scan data to a small space that can’t push other cached
data out.
Also, it is supposedly sized to fit in L2 cache, improving
performance even more for certain hardware architectures. Jeff
mentioned that PostgreSQL already does a pretty good job with cache
management, but this patch makes the caching even more efficient.
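
A rough Python sketch of why a small ring helps with cache pollution.
This is a toy model under loud assumptions, not Simon's patch and not
PostgreSQL's real buffer manager: the LRUCache and ScanRing classes,
the cache size of 4, the ring size of 2, and the page names are all
invented for the example.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Toy shared cache: evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[str, bool]" = OrderedDict()

    def access(self, page: str) -> None:
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as recently used
            return
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the oldest page
        self.pages[page] = True


class ScanRing:
    """Toy ring of slots reused round-robin by a single scan."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[str]] = [None] * size
        self.next_slot = 0

    def access(self, page: str) -> None:
        self.slots[self.next_slot] = page  # overwrite the oldest slot
        self.next_slot = (self.next_slot + 1) % len(self.slots)


def warm_cache() -> LRUCache:
    """Build a shared cache already holding four hot pages."""
    cache = LRUCache(4)
    for i in range(4):
        cache.access(f"hot{i}")
    return cache


def hot_pages(cache: LRUCache) -> int:
    return sum(1 for page in cache.pages if page.startswith("hot"))


# Without a ring: a 100-page scan goes through the shared cache.
polluted = warm_cache()
for b in range(100):
    polluted.access(f"big:{b}")
hot_left_without_ring = hot_pages(polluted)

# With a ring: the same scan only ever touches its own two slots.
protected = warm_cache()
ring = ScanRing(2)
for b in range(100):
    ring.access(f"big:{b}")
hot_left_with_ring = hot_pages(protected)

print(hot_left_without_ring)  # 0
print(hot_left_with_ring)     # 4
```

In this model the big scan wipes out every hot page when it shares the
cache, and leaves all of them alone when it is confined to the ring.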

Another topic that came up was the Linux I/O scheduling algorithms.
Jeff had originally tested his patch using the Deadline, NOOP, and
Anticipatory schedulers. When he tried it with CFQ more recently, it
didn’t work so well. (A quick Google search tells me that RHEL uses
CFQ! AURGH.) He’d also tested ZFS, which seemed to work well but
needed more testing.

Someone (sorry I can't remember your name!) brought up that it would
be nice if the default scheduling algorithm picked by each
distribution and/or operating system were documented in one place. I
agree! That would be useful.

Mark spoke up and mentioned that Deadline worked very well in
general, non-deterministic cases with PostgreSQL.

There were tons of great questions, and even a few esoteric,
theoretical arguments. Very good meeting, everyone!

--

Afterward, the Lucky Lab was crazy busy! We drank a couple pitchers,
talked about the Linux kernel, and I think there was a long argument
about BCNF.

We did decide that someone was going to have to give a talk on
“Hypercubes and Dungeons and Dragons: what you never thought they had
in common”.

Next month’s meeting will be about relational algebra, with James,
Vassilis and Rafael tag-teaming. Rafael has been teaching the intro
to databases class this summer at PSU, so he is ready for some real
heckling. I can only hope that Randal will be able to make it.
