Re: Reducing tuple overhead

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, hlinnaka(at)iki(dot)fi, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Reducing tuple overhead
Date: 2015-04-23 17:00:45
Message-ID: 5539253D.8000506@commandprompt.com
Lists: pgsql-hackers


On 04/23/2015 09:42 AM, Jim Nasby wrote:
>
> On 4/23/15 11:24 AM, Andres Freund wrote:
>> I do wonder what, in realistic cases, is actually the bigger contributor
>> to the overhead. The tuple header or the padding we liberally add in
>> many cases...
>
> Assuming you're talking about padding between fields...
>
> Several years ago Enova paid Command Prompt to start work on logical
> column ordering, and part of the motivation for that was to allow
> re-ordering physical tuples into the most efficient on-disk format
> possible. I think I did some tests re-arranging some tables into the
> theoretically most efficient order and measuring heap size. I think
> there was some modest size improvement, maybe 10-15%? This was several
> years ago so it's all foggy. Maybe Josh can find some of this in CMD's
> ticketing system?
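
To illustrate the padding-between-fields effect Jim describes, here is a
minimal sketch that sums column sizes plus the alignment padding needed
in front of each field. The column names, sizes, and alignments are
assumptions chosen for illustration (they mimic common fixed-width
types), and the sketch ignores the tuple header, the null bitmap,
variable-length columns, and any padding at the end of the tuple.

```python
def row_data_size(columns):
    """Return bytes used by the column data, including padding between fields.

    columns is a list of hypothetical (name, size, alignment) triples.
    """
    offset = 0
    for _name, size, align in columns:
        # Round the offset up to the next multiple of this field's alignment.
        offset = (offset + align - 1) // align * align
        offset += size
    return offset


# Declared order mixes narrow and wide fields, which forces padding.
declared = [
    ("flag", 1, 1),
    ("big_id", 8, 8),
    ("small", 2, 2),
    ("ts", 8, 8),
    ("n", 4, 4),
]

# Placing the most strictly aligned fields first removes the padding here.
packed = sorted(declared, key=lambda col: col[2], reverse=True)

print(row_data_size(declared))  # 36
print(row_data_size(packed))    # 23
```

The saving in this made-up table says nothing about the 10-15% figure
recalled above; real tables will vary with their column mix.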

Yeah I dug around. I don't see anything about size improvement but here
are our notes:

Alvaro said:

I ended up not producing notes as regularly as I had initially hoped. To
try and make up for it, here's an update covering everything I've done
since I started working on this issue.

This patch turned out to be completely different than what we had
initially thought. We had thought it was going to be a matter of finding
the places that used "attnum" and replacing each one with attnum,
attlognum or attphysnum, depending on which order was needed at that
spot. That would not have been easy because there are several hundred
of those. So it was supposed to be amazingly time-consuming and rather
boring work.

That plan has nothing to do with reality: anywhere from parser down to
optimizer and executor, the way things work is that a list of attributes
is built, processed, and referenced. Some places assume that the list is
in one particular order, which today is the same order for all three
cases. So the way to develop this feature is to change those places so
that instead of receiving the list in one of these orders, they receive
it in a different order.
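
As a rough illustration of one attribute list being handed over in
different orders, the sketch below sorts the same attributes by each of
the three numbers named above. The Attribute class, the ordered()
helper, and the sample values are hypothetical; reading attnum as the
identity number, attlognum as the logical position, and attphysnum as
the physical position is an assumption based on the names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    attnum: int      # assumed: identity number of the column
    attlognum: int   # assumed: logical (display) position
    attphysnum: int  # assumed: physical (storage) position


def ordered(attrs, key):
    """Return the attribute names sorted by the given numbering."""
    return [a.name for a in sorted(attrs, key=lambda a: getattr(a, key))]


attrs = [
    Attribute("flag", attnum=1, attlognum=2, attphysnum=3),
    Attribute("big_id", attnum=2, attlognum=1, attphysnum=1),
    Attribute("n", attnum=3, attlognum=3, attphysnum=2),
]

print(ordered(attrs, "attnum"))      # ['flag', 'big_id', 'n']
print(ordered(attrs, "attlognum"))   # ['big_id', 'flag', 'n']
print(ordered(attrs, "attphysnum"))  # ['big_id', 'n', 'flag']
```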

So what I had to do early on was find a way to retrieve the sort order
from catalogs, preserve it when TupleDescriptors are built, and ensure
the attribute list is extracted from TupleDesc in the correct order. But
it turned out that this is not enough, because down in the parser guts,
a target list is constructed; and later, a TupleDescriptor is built from
the target list. So it's necessary to preserve the sorting info from the
original tuple descriptor into the target list (which means adding order
info to Var and TargetEntry nodes), so that the new TupleDesc can also
have it.

Today I'm finding that even more than that is necessary. It turns out
that the RangeTableEntries (i.e. the entries in the FROM clause of a
query) have an item dubbed "eref" which is a list of column names; due
to my changes in the earlier parser stages, this list is sorted in
logical column order; but the code to resolve things such as columns
used in JOIN/ON clauses walks the list (which is in logical order) and
then uses the number of times it had to walk the elements in the list to
construct a Var (column reference) in "attnum" order -- so it finds a
different column, and it all fails.
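
The failure can be sketched in a few lines. This is a toy model, not
the parser code: colnames_logical stands in for the eref column-name
list, and attnum_of and resolve_by_position() are hypothetical names.

```python
# Column names in logical order, as the earlier parser changes now produce.
colnames_logical = ["b", "a", "c"]

# Assumed identity numbers (attnum) of the same columns.
attnum_of = {"a": 1, "b": 2, "c": 3}


def resolve_by_position(name):
    """Walk the list and treat the 1-based position as if it were attnum."""
    for position, colname in enumerate(colnames_logical, start=1):
        if colname == name:
            return position
    raise KeyError(name)


wrong = resolve_by_position("a")
print(wrong, attnum_of["a"])  # 2 1
```

Position 2 is then read as attnum 2, which is column "b", so the
reference to "a" silently lands on a different column.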

So what I'm doing now is modifying the RangeTableEntry node to keep a
mapping list of logical to identity numbers. Then I'll have to search
for places using rte->eref->colnames and make sure that they correctly
use attlognum as the index into it.
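
A minimal sketch of the mapping idea, again as a toy model rather than
the actual RangeTableEntry change: lognum_to_attnum and resolve() are
hypothetical names, and the sample columns are made up.

```python
# Column names in logical order (stands in for rte->eref->colnames).
colnames_logical = ["b", "a", "c"]

# Assumed mapping list: entry i-1 holds the attnum of the column found
# at logical position i.
lognum_to_attnum = [2, 1, 3]


def resolve(name):
    """Find the logical position by name, then map it to the attnum."""
    lognum = colnames_logical.index(name) + 1
    return lognum_to_attnum[lognum - 1]


print(resolve("a"))  # 1
print(resolve("b"))  # 2
```

The list is only ever indexed by logical position; the identity number
comes out of the mapping rather than out of the walk count.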

And then later:

First of all I should note that I discussed the approach mentioned above
on pgsql-hackers and got a very interesting comment from Tom Lane that
adding sorting info to Var and TargetEntry nodes was not a very good
idea because it'd break stored rules whenever a table column changed. So
I went back and studied that code and noticed that it was really the
change in RangeTableEntry that's doing the good magic; those other
changes are fortunately not necessary. (Though they were a necessary
vehicle for me to understand how the other stuff works.)

I've been continuing to study the backend code looking for uses of
attribute lists that assume a single ordering. As I get more into it,
more complex cases appear. The number of cases is fortunately bounded,
though. Most of the uses of straight attribute lists are in places that
do not require modification, or require little work or thought to update
correctly.

However, some other places are not like that. I have "fixed" SQL
functions two times now. I believed the second fix (which I considered
"mostly correct") was going to be the final one, but I found out just
now that it's not, and the proper fix is going to require something a
bit more low-level (namely, a projection step that reorders columns
correctly after the fact). Fortunately, I believe that this extra
projection step is going to fix a lot of other cases too, which I
originally had no idea how to attack. Moreover, understanding that bit
means I also figured out what Tom Lane meant in the second half of his
response to my original pgsql-hackers comment. So I think we're good on
that front.
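
The kind of after-the-fact projection described above might look
roughly like this. It is only a sketch of the idea, not the executor's
projection machinery: project_to_logical() and physnum_to_lognum are
hypothetical names, and the sample row is made up.

```python
def project_to_logical(physical_row, physnum_to_lognum):
    """Reorder a row held in physical order into logical order.

    physnum_to_lognum[i] is the 1-based logical position of the column
    stored at physical index i (an assumed representation).
    """
    logical_row = [None] * len(physical_row)
    for phys_index, value in enumerate(physical_row):
        logical_row[physnum_to_lognum[phys_index] - 1] = value
    return logical_row


# Stored as (big_id, n, flag); presented as (big_id, flag, n).
physical_row = [42, 7, True]
physnum_to_lognum = [1, 3, 2]

print(project_to_logical(physical_row, physnum_to_lognum))  # [42, True, 7]
```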

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
