Re: PG Patch [openserver followup]

From: Larry Rosenman <ler(at)lerctr(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: PG Patch [openserver followup]
Date: 2003-07-20 00:01:43
Message-ID: 2330000.1058659303@lerlaptop.lerctr.org
Lists: pgsql-patches

------------ Forwarded Message ------------
Date: Saturday, July 19, 2003 16:10:36 -0700
From: Kean Johnston <jkj(at)sco(dot)com>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc:
Subject: Re: PG Patch

> I'm trying to get a discussion going, as Bruce wants to do it right for
> ALL platforms or none. It probably WONT happen for 7.3.4, but WILL (If I
> have my way) for 7.4.0.

Ok then let me explain the issue. You can forward this to Bruce since I
haven't heard from him yet.

As you know, the run-time link editor (RTLD) is responsible for loading an
ELF program, resolving its dependent libraries and symbols, setting things
up for _start, and calling it. There are a few ELF dynamic tags that come
into play. The ones we care about are:

DT_SONAME
    This is the name of the shared object that the RTLD will try to load.
DT_RPATH
    Specifies the list of paths to search for dependencies (old way)
DT_RUNPATH
    Specifies the list of paths to search for dependencies (new way)
DT_NEEDED
    Lists the dependencies for this object

There is also one environment variable that is used at load time to resolve
dependencies, viz. LD_LIBRARY_PATH.

The gABI defines how and where these are used, but this is a basic summary.
I refer to the current object and the dependent object. The current object
is the entity which is having its dynamic section interpreted. This is the
executable or shared library that has a dependency that needs to be loaded
by the RTLD. The dependent object is the name of the actual dependency, and
comes from the DT_NEEDED list.

1) If the dependent object name contains a /, use the name directly, with
   no path searching.
2) Search along DT_RPATH if the current object doesn't have DT_RUNPATH
   defined. DT_RPATH is a colon-separated list of paths.
3) Search along LD_LIBRARY_PATH, which is also a colon-separated list of
   paths. Only do this if the process does not have elevated (i.e. setuid
   or setgid) privileges(*).
4) If the dependency still hasn't been met, search along DT_RUNPATH (if
   defined for the current object). DT_RUNPATH is a colon-separated list
   of path names.
5) If we still haven't found it, look in the standard system places.
6) If we still haven't resolved the dependency, bail.
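
The six rules above can be sketched in a few lines of Python. This is only
an illustration of the ordering, not the code of any real RTLD: the function
name candidate_paths, its parameters, and the default system directories
(/usr/lib and /lib) are assumptions made for the example, and empty path
elements are simply skipped.

```python
import posixpath


def candidate_paths(needed, rpath=None, runpath=None, ld_library_path=None,
                    privileged=False, system_dirs=("/usr/lib", "/lib")):
    """Return, in search order, the paths tried for one DT_NEEDED entry.

    All names and defaults here are hypothetical; system_dirs in particular
    differs from system to system.
    """
    # Rule 1: a name containing '/' is used directly, with no searching.
    if "/" in needed:
        return [needed]

    dirs = []
    # Rule 2: DT_RPATH counts only if the object has no DT_RUNPATH.
    if rpath and not runpath:
        dirs += rpath.split(":")
    # Rule 3: LD_LIBRARY_PATH is skipped for setuid/setgid processes.
    if ld_library_path and not privileged:
        dirs += ld_library_path.split(":")
    # Rule 4: DT_RUNPATH, if defined for the current object.
    if runpath:
        dirs += runpath.split(":")
    # Rule 5: the standard system places.
    dirs += list(system_dirs)

    # Rule 6 is left to the caller: if none of these paths exists, bail.
    return [posixpath.join(d, needed) for d in dirs if d]
```

For example, candidate_paths("libX11.so.5", rpath="/foo:/bar",
ld_library_path="/home/me") lists /foo and /bar first, then /home/me, then
the system directories, while a name such as /usr/lib/libX11.so.5 comes
back unchanged.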

(*) This is the kicker. There are *MANY* older systems out there that have
RTLD bugs and do not obey this rule. Consider the following. Most systems
have xterm, and xterm is very frequently setuid root. All you need to do is
run dump -Lv on xterm to see whether it, or any of the libraries it depends
on, names a shared library without an absolute path. If so, you can get
root like this. Let's say, as is fairly common on older systems, that
libX11.so does not have a fully qualified path name in its DT_SONAME. When
xterm is linked, it will have a DT_NEEDED of libX11.so.5 or .6 or whatever,
without an absolute path. That means that it will use the searching
algorithms described above. All I need to do to get root is craft up
(fairly easily) a libX11.so.5 that has, in a call I know xterm will use
like XOpenDisplay, code that copies /bin/sh to somewhere and makes it
setuid root. Now I put that hacked libX11.so.5 in my home directory, set
LD_LIBRARY_PATH=$HOME, run xterm, and I've got a root shell.

This can all be so easily avoided by rule (1) above. Always hard-code your
library names. It's a pain sometimes, to be sure, as I will describe below,
but it is completely unambiguous, it's secure and it is quicker. Granted,
the RTLD isn't that slow searching paths but hey, every bit counts.
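
As a rough illustration of checking for rule (1), the sketch below flags
DT_NEEDED entries that contain no / and would therefore be resolved by path
searching. It assumes the DT_NEEDED lists have already been extracted by
some other tool (for instance by reading dump -Lv output by hand); the
function name and the input layout are hypothetical.

```python
def searched_needed(objects):
    """Return (object, dependency) pairs that rely on path searching.

    `objects` maps an object name to its DT_NEEDED list. This layout is an
    assumption for the example, not the output format of any real tool.
    """
    flagged = []
    for obj, needed in objects.items():
        for name in needed:
            # Rule (1): only names containing '/' bypass the search.
            if "/" not in name:
                flagged.append((obj, name))
    return flagged
```

An empty result means every dependency listed is hard-coded with a path.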

Before going in to detail on the problems of using absolute path names
(there is always a catch, isn't there?), just a quick refresher on how
these various dynamic tags get set in an ELF object. This varies from
system to system, but almost all systems use some subtle variation of the
following. AIX is a bit funky as I recall.

DT_RPATH is set if the link editor (ld) encounters LD_RUN_PATH in the
environment at link edit time. Thus doing something like:
LD_RUN_PATH=/foo:/bar ld -o libfnoz.so blah.o
would set DT_RPATH to /foo:/bar.

DT_RUNPATH is set by the -R option on System-V-ish link editors, and by
-rpath with GNU-ish ones (I think, I am no GNU ld expert, please correct me
if I am wrong).

DT_SONAME is set by -h on System-V-ish link editors and by -soname on
GNU-ish ones.

DT_NEEDED is set by any ELF link editor based on the -l options or explicit
linkage against another shared object. It uses the DT_SONAME from the
dependency to put in the object's DT_NEEDED list.

While absolute path names are the way to fly, in general, they have their
drawbacks too. First, it can be a right royal pain to bootstrap things.
Consider this. You are building a program. As part of its build, it builds
a shared library, and link edits it with an absolute DT_SONAME. Later in
the build you link a program against it, and you want to use that program
in the build (perhaps executing it to produce some intermediate file or
whatever). If this is the very first time you are compiling the program and
library, then the shared library won't exist in its specified location, and
execution of the program will fail. So you have to wait for the build to
fail, then copy the just built library into its install location and
continue the build, possibly repeating this several times.

Another, sometimes more frustrating problem is encountered if you DO have
an older version of the library installed. Let's say the library was
/usr/lib/libfoo.so.2. You are recompiling your stuff, building a new
version of libfoo. It contains bug fixes, but is not sufficiently different
to warrant moving to libfoo.so.3. Now during your build you link a program
against -lfoo, and when you execute it, lo and behold, it runs, because
/usr/lib/libfoo.so.2 is already in place from an earlier install. But the
libfoo that the program is referencing is the buggy one, and it may make it
impossible to build your program, or may produce incorrect results.

So what we need is the ability to always reference the freshly built
libraries while we are building the system, and to make sure that the final
installed ones have full path names and that executables have been
re-linked against them. This is possible, and fairly easy, but it does mean
that all programs and libraries need to be relinked at install time, and
they need to be done in the correct order. But it's pretty straightforward.

During build time, use LD_RUN_PATH or -R (or even -h with absolute path
names pointing into the build tree) and do the build. As you install, you
relink each shared library with -h and the final destination path name for
the library, making sure you relink all libraries in the correct order such
that all DT_NEEDED's have absolute path names. As you install each binary
that depends on these libraries, you also relink them before doing the
install.
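
One way to get "the correct order" for the install-time relink is a
topological sort over the DT_NEEDED relationships, so that every library is
relinked before anything that depends on it. The sketch below uses Python's
standard graphlib module (Python 3.9 or later). The function name and the
object names in the usage note are made up for illustration, and this is
one possible approach, not something the procedure above prescribes.

```python
from graphlib import TopologicalSorter


def relink_order(needed):
    """Return objects ordered so that dependencies come before dependents.

    `needed` maps each object built in this tree to the list of objects
    from the same tree that appear in its DT_NEEDED list (hypothetical
    input layout). A dependency cycle raises graphlib.CycleError.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields each dependency before the objects needing it.
    return list(TopologicalSorter(needed).static_order())
```

Given {"prog": ["libfoo.so.2"], "libfoo.so.2": ["libbar.so.1"],
"libbar.so.1": []}, libbar.so.1 is relinked and installed first, then
libfoo.so.2, then prog.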

libtool gets some, but not all, of this right. However, libtool has its own
drawbacks, not least of which is its completely nonsensical version
numbering scheme, which the docs go to great lengths to promote as an ideal
solution. It's not, it's crap.

An even easier solution is one I have been thinking about a lot of late. It
would simplify the build and install procedures dramatically. I am thinking
of writing an open source tool called "somod". This will allow you to
change the ELF headers on already-installed ELF programs, adjusting the
DT_SONAME, DT_NEEDED, DT_RUNPATH and DT_RPATH variables as you see fit.
This would then simplify the build procedure by simply adding a step that
after installing a shared library or binary, you run somod on it to set
things up the way you want. That would be the least invasive way, and also
allow you to take remedial action on old programs you may not have the
source for, or even on mis-compiled or mis-produced vendor files.

Kean

---------- End Forwarded Message ----------

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
