Relational Algebra Meeting Recap

From: "Selena Deckelmann" <selenamarie(at)gmail(dot)com>
To: pdxpug(at)postgresql(dot)org
Subject: Relational Algebra Meeting Recap
Date: 2007-09-26 18:52:01
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pdxpug

Hello 'PUGers,

Last week's meeting was all about Relational Algebra. And mixed
drinks. And our ONE YEAR ANNIVERSARY! That's right - PDXPUG has
been around for a full year.

Twenty-one people attended - with one intrepid traveller from Eugene,
and Alec, who is moving to Seattle. We hope Alec starts a PUG with the
Whitepages folks. We also had visitors from the PHP and Ruby
communities. And even Wil Cooley. Very good to see you all!

Several people brought mixers and raw material for the bar. In
particular, Jason from the Python Users Group, Gabrielle and Schwern.
Did I miss anone? Gabrielle served as bartender for over an hour.
Thank you all very much!

We started with a short discussion of HOT (which stands for Heap ONLY
tuple), courtesy of Jeff Davis. The bottom line here is that it is a
big performance enhancement for tables which are UPDATE'd frequently.
Rather than appending modified tuples to the end of an index, existing
tuples are updated in place - which preserves the order of an index
and greatly reduces the need for a table-wide VACUUM.

There is an extensive README available:

Rafael opened the Relational Algebra talk with a promise to explain
how all the theory he & James were about to dive into would ultimately
be useful for the practical work of designing databases.

Any discussion of relational theory is incomplete without a definition
from E.F. Codd: "Any instance of the relation is always a subset of
the cross product of its domains." Uh-huh.

Some definitions: every relation is a mathematical set. A relation is
an unordered set of tuples, AKA a table or the result of an SQL query.
A tuple is a single element of a relation - in database terms, it is
a row.

There was a short discussion about why mathematical definitions are
useful, and what the database community gained by leveraging proofs
developed by set theorists. I believe somewhere during the discussion
of bag theory, we had a Proof By Intimidation!

Rafael & James then discussed nine operators:

Project (the verb) - AKA SELECT in SQL
Select - AKA WHERE in SQL
Union (direct from set theory)
Intersection (direct from set theory)
Difference (direct from set theory)
Cross product

The symbols and explanations for each of these can be found at:

One operator that is typically confusing is the Divide operator -
James summarized the Divide's basic function as "Don't give me what I
don't want."

After explaining all the operators, James dove into the practical
application portion of the talk. Being able to find equivalences is
the key to simplifying and optimizing queries. In particular,
eliminating sub-queries can give significant performance improvements.

The basic strategy behind query optimization is: Translate the SQL
query into a query tree using relational algebra operators; Generate
other, equivalent query trees; For each query tree - select an
algorithm for calculating the result and estimate the cost; Choose the
lowest cost plan!

This is what the PostgreSQL planner does, and understanding which
query tree is being used (through EXPLAIN!) in combination with the
algorithms used to fetch tuples is a valuable skill when designing
databases and queries. James also dug out an old joke from a meeting
last year about growing one's own coffee from the bean. Thank you
James for remembering our old jokes.

The meeting concluded, and several of us walked to the Lucky Lab.


Special thanks to Gabrielle for her notes and her duct tape.

Selena Deckelmann
PDXPUG - Portland PostgreSQL Users Group

Browse pdxpug by date

  From Date Subject
Next Message Selena Deckelmann 2007-09-26 18:53:36 conference -- Lunch Options near PSU volunteer
Previous Message Rafael J. Fernández-Moctezuma 2007-09-21 06:46:15 Relational Algebra references