Re: Hostnames in pg_hba.conf

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Bart Samwel <bart(at)samwel(dot)tk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hostnames in pg_hba.conf
Date: 2010-02-11 15:36:35
Message-ID: 4B742403.9000009@mark.mielke.cc
Lists: pgsql-hackers
On 02/11/2010 08:13 AM, Bart Samwel wrote:
> ISSUE #1: Performance / caching
>
> At present, I've simply not added caching. The reasoning for this is 
> as follows:
> (a) getaddrinfo doesn't tell us about expiry, so when do you refresh?
> (b) If you put the cache in the postmaster, it will not work for 
> exec-based backends as opposed to fork-based backends, since those 
> read pg_hba.conf every time they are exec'ed.
> (c) If you put this in the postmaster, the postmaster will have to 
> update the cache every once in a while, which may be slow and which 
> may prevent new connections while the cache update takes place.
> (d) Outdated cache entries may inexplicably and without any logging 
> choose the wrong rule for some clients. Big aargh: people will start 
> using this to specify 'deny' rules based on host names.
>
> If you COULD get expiry info out of getaddrinfo you could potentially 
> store this info in a table or something like that, and have it updated 
> by the backends? But that's way over my head for now. ISTM that this 
> stuff may better be handled by a locally-running caching DNS server, 
> if people have performance issues with the lack of caching. These 
> local caching DNS servers can also handle expiry correctly, etcetera.
>
> We should of course still take care to look up a given hostname only 
> once for each connection request.

You should cache for some minimal amount of time or some minimal number
of records, even if it's just one minute, and even if it's just a
fixed-length LRU list. This would help with workloads where new
connections are opened several times a second. For connections opened
once a minute or less often, caching helps far less. But this can be
added later if necessary, and it doesn't need to gate the feature.
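For illustration, a minimal sketch of that kind of cache, in Python for brevity: a fixed-size LRU map from hostname to addresses, where every entry expires after a fixed TTL (one minute by default), because getaddrinfo() reports no expiry. `HostCache` and `resolve_host` are hypothetical names, not PostgreSQL code; the resolver and clock are injectable so the logic can be exercised without real DNS.

```python
import socket
import time
from collections import OrderedDict
from typing import Callable, List


def resolve_host(name: str) -> List[str]:
    """Forward lookup NAME -> IP addresses via getaddrinfo().

    Raises socket.gaierror if the name cannot be resolved.
    """
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the address string
    return sorted({info[4][0] for info in infos})


class HostCache:
    """Fixed-size LRU cache of hostname lookups with a fixed TTL.

    getaddrinfo() says nothing about record expiry, so every entry simply
    expires `ttl` seconds after it was resolved.
    """

    def __init__(self, max_entries: int = 64, ttl: float = 60.0,
                 resolver: Callable[[str], List[str]] = resolve_host,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.resolver = resolver
        self.clock = clock
        # name -> (time resolved, addresses); order is least to most recently used
        self._entries = OrderedDict()

    def lookup(self, name: str) -> List[str]:
        now = self.clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self.ttl:
            self._entries.move_to_end(name)  # fresh hit: mark most recently used
            return entry[1]
        # Miss or expired entry: resolve again (errors propagate, nothing cached)
        addrs = self.resolver(name)
        self._entries[name] = (now, addrs)
        self._entries.move_to_end(name)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return addrs
```

In this sketch failed lookups raise socket.gaierror and are not cached, so a name that stops resolving is retried on every connection; caching negative results would be a separate decision.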

Many UNIX/Linux boxes have some sort of built-in cache, sometimes 
persistent, sometimes shared. On my Linux box, I have nscd (the "name
service cache daemon"), which should be able to cache these sorts of
lookups. I believe it is used for things as common as mapping uid to
username in the output of "/bin/ls -l", so it does need to be pretty fast.

The difference between an in-process cache and something like nscd is
the inter-process communication required to query nscd.


> ISSUE #2: Reverse lookup?
>
> There was a suggestion on the TODO list on the wiki, which basically 
> said that maybe we could use reverse lookup to find "the" hostname and 
> then check for that hostname in the list. I think that won't work, 
> since IPs can go by many names and may not support reverse lookup for 
> some hostnames (/etc/hosts anybody?). Furthermore, due to the 
> top-to-bottom processing of pg_hba.conf, you CANNOT SKIP entries that 
> might possibly match. For instance, if the third line is for host 
> "foo.example.com" and the fifth line is for
> "bar.example.com", both lines may apply to
> the same IP, and you still HAVE to check the first one, even if 
> reverse lookup turns up the second host name. So it doesn't save you 
> any lookups, it just costs an extra one.

I don't see a need to do a reverse lookup. Reverse lookups are sometimes 
done as a verification check, in the sense that it's cheap to get a map 
from NAME -> IP, but sometimes it is much harder to get the reverse map 
from IP -> NAME. However, it's not a reliable check, as many legitimate
users have trouble getting a reverse map from IP -> NAME. It also
doesn't save anything, as IP -> NAME lookups go to a completely different
set of name servers, and these name servers are not always optimized for
speed, since IP -> NAME lookups are less common than NAME -> IP. Finally, if
one finds a map from IP -> NAME, that doesn't prove that a map from NAME
-> IP exists, so using *any* results from IP -> NAME is questionable.

I think reverse lookups are unnecessary and undesirable.
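A forward-only check needs nothing more than getaddrinfo() on the configured name, followed by a comparison of the client's address against every result. The Python sketch below uses hypothetical function names and assumes that a failed lookup simply means "this line does not match"; that policy is an assumption, not something settled in this thread.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def forward_lookup(name: str) -> List[str]:
    """NAME -> IP via getaddrinfo(); raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def host_matches(client_ip: str, hostname: str,
                 resolver: Callable[[str], Iterable[str]] = forward_lookup) -> bool:
    """Return True if client_ip is any address that hostname resolves to.

    Only a forward (NAME -> IP) lookup is made; there is no reverse lookup.
    A failed lookup is treated as "this pg_hba line does not match".
    """
    try:
        candidates = list(resolver(hostname))
    except socket.gaierror:
        return False
    client = ipaddress.ip_address(client_ip)
    for addr in candidates:
        # Drop any IPv6 zone suffix ("%eth0") before parsing
        if ipaddress.ip_address(addr.split("%", 1)[0]) == client:
            return True
    return False
```

Called as `host_matches("192.0.2.10", "db1.example.com")`, it performs a real lookup; passing a fake resolver makes the matching logic testable on its own.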

> ISSUE #3: Multiple hostnames?
>
> Currently, a pg_hba entry lists an IP / netmask combination. I would 
> suggest allowing lists of hostnames in the entries, so that you can at 
> least mimic the "match multiple hosts by a single rule". Any reason 
> not to do this?

I have mixed feelings. In some situations, I've wanted to put multiple
IP/netmask pairs on one line. I would say that if multiple names are
supported, then multiple IP/netmask pairs should be supported. But this
does make lines unwieldy beyond two or three entries. This direction
leans towards the capability to define "host classes", where a rule
allows a host class, and the host class can contain a list of hostnames.
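One way to picture such host classes, sketched in Python under assumed names (`Rule` and `first_matching_rule` are hypothetical): each rule carries a list of host specs, each either a hostname or a CIDR block. Rules are evaluated strictly top to bottom, so no line is skipped, and each hostname is resolved at most once per connection attempt, as the quoted proposal asks.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Rule:
    """Hypothetical, simplified pg_hba-style line."""
    hosts: List[str]  # each entry is a hostname or an IP block in CIDR form
    method: str       # e.g. "md5" or "reject"


def first_matching_rule(client_ip: str, rules: List[Rule],
                        resolver: Callable[[str], Iterable[str]]) -> Optional[Rule]:
    client = ipaddress.ip_address(client_ip)
    memo: Dict[str, List[str]] = {}  # per connection: resolve each name once

    def spec_matches(spec: str) -> bool:
        try:
            net = ipaddress.ip_network(spec, strict=False)
        except ValueError:
            net = None  # not an address or CIDR block, so treat it as a hostname
        if net is not None:
            return client in net  # False if the IP versions differ
        if spec not in memo:
            try:
                memo[spec] = list(resolver(spec))
            except OSError:  # socket.gaierror is a subclass of OSError
                memo[spec] = []  # failed lookup: this spec matches nothing
        return any(ipaddress.ip_address(a) == client for a in memo[spec])

    for rule in rules:  # top to bottom; a possibly matching line is never skipped
        if any(spec_matches(s) for s in rule.hosts):
            return rule
    return None
```

Keeping the memo local to one call mirrors "look up a given hostname only once for each connection request"; a longer-lived cache would sit inside the resolver instead.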

Two other aspects I don't see mentioned:

1) What will you do for hostnames that have multiple IP addresses? Will 
you accept all IP addresses as being valid?
2) What will you do if they specify a hostname and a netmask? This seems 
like a convenient way of saying "everybody on the same subnet as NAME."
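For question 2, one possible reading of "hostname plus netmask", sketched in Python with a hypothetical helper that takes the already-resolved addresses of NAME: widen each address to the given prefix length and accept the client if it falls in any of those subnets. Accepting every address of a multi-homed NAME is one assumed answer to question 1; addresses of a different IP family than the client's are skipped.

```python
import ipaddress
from typing import Iterable


def same_subnet_as(client_ip: str, resolved_addrs: Iterable[str], prefixlen: int) -> bool:
    """True if client_ip shares a /prefixlen subnet with any resolved address.

    Every address NAME resolved to is considered, not just the first one.
    """
    client = ipaddress.ip_address(client_ip)
    for addr in resolved_addrs:
        ip = ipaddress.ip_address(addr)
        # Only compare within one family, and skip prefixes too long for it
        if ip.version != client.version or prefixlen > ip.max_prefixlen:
            continue
        # strict=False masks off the host bits: 192.0.2.10/24 -> 192.0.2.0/24
        net = ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)
        if client in net:
            return True
    return False
```

For example, if NAME resolves to 192.0.2.10, a client at 192.0.2.77 with a /24 netmask would be accepted and one at 192.0.3.77 would not.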

Cheers,
mark

-- 
Mark Mielke <mark(at)mielke(dot)cc>
