Re: coalesce in plpgsql, and other style questions

From: Ross Boylan <ross(at)biostat(dot)ucsf(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: ross(at)biostat(dot)ucsf(dot)edu, pgsql-novice(at)postgresql(dot)org
Subject: Re: coalesce in plpgsql, and other style questions
Date: 2012-06-13 00:42:31
Message-ID: 1339548151.5384.157.camel@corn.betterworld.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Thanks for your responses. Some followup below.
On Tue, 2012-06-12 at 16:30 -0700, Jeff Davis wrote:
> On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
> > I just wrote my first pl/pgsql function, and would appreciate any
> > comments people have on it. I'll be writing a bunch of similar
> > functions, with semantics "give me the id of the object if exists,
> > otherwise create it and give me the id."
> >
> > My solution seems excessively procedural to me.
>
> I think what you're trying to do is procedural, and that's not
> necessarily bad. You are trying to get an existing ID if available, or
> assign a new ID if not; and I assume you are doing so to save the space
> associated with storing the full names (essentially like dictionary
> encoding).
>
> > I thought I could get
> > the right semantics with something like
> > select coalesce((select id from mytable where name='foo'),
> > (insert into mytable ('name') values('foo') returning id))
> > but I could not get that to work in plgsql.
>
> In pl/pgsql, you can't just use SELECT, you have to use something like
> SELECT INTO the_id ... or use PERFORM.
>
> Also, you have a potential race condition there, because someone might
> insert "foo" into mytable after the select (first arg to coalesce) and
> before the insert (second arg to coalesce).
Practically, it's just me so there shouldn't be any risk. But I'd like
to understand the general issue. I thought transaction would take care
of this, so that within a transaction the state of the database does not
change from actions in other sessions. Then if I commit and have
conflict, the commit fails.

I guess the sequence I'm using for did assures that did is unique across
all transactions, and so the 2 transactions would not be in conflict,
since they have different primary keys.

As you said later, a unique constraint on one/some of the other fields
will at least prevent bogus records that are "the same" from my point of
view. I was a bit on the fence about making it unique, since I thought
I might have different hosts that shared the same name. But those
scenarios are unlikely, and I really don't want to deal with it if it
does happen.

But is my whole model of how transactions are operating off? I'm
basically generalizing from Gemstone, an object database.

>
> > Also, I wonder if calling a column 'name' is asking for trouble.
>
> I recommend against it.
I'll change it.
>
> > The actual function is a little more complicated because the table has
> > the possibility of canonical entries with other entries pointing to
> > them, I use 'did' for database id to avoid confusion with various other
> > message ids the system uses; it's for tracking emails.
> >
> > Here's the code. As I said, I'd love comments on any aspect of it.
> >
> > /* gom = get or make functions retrieve did if available, otherwise create record
> > and return its did. */
> >
> > create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
> > OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$
>
> Sometimes it's good to declare your variables with names like
> in_hostname or out_canonicalid, to prevent confusion reading
> similarly-named variables coming from the tables.
I'm slowly discovering the problems of name clashes; thanks for the
suggestion.
>
> > DECLARE
BTW, is DECLARE necessary if there are no declarations?

> > BEGIN
> > select did, canonical into hostid, canonicalid from host
> > where name = hostname;
> > if FOUND then
> > return;
> > end if;
> > if canonicalname is not NULL then
> > select did into canonicalid from host where name = canonicalname;
> > if not FOUND then
> > insert into host (name) values(canonicalname) returning did into canonicalid;
> > end if;
> > end if;
> > if hostname != canonical then
>
> Is canonical a proper variable here? It's not in the argument list, and
> it's not DECLAREd. Did you mean canonicalname?
canonical is a column name in the table. Perhaps canonical_did would be
more appropriate for it (and rename the output parameter
out_canonical_did from canonicalid).
>
> > insert into host (name, canonical) values(hostname, canonicalid)
> > returning did into hostid;
> > else
> > hostid := canonicalid;
>
> I would recommend keeping more explicit track of NULL values and what
> should happen to them. It looks like a NULL canonicalname would fall
> into this else branch (I could be mistaken),
that's right.
> but it's a little hard to
> follow exactly what you want to happen. A few comments would help.
I trimmed a comment before the start of the function; it doesn't exactly
hit that issue, but it does discuss NULL's:
/* Note that if this function creates a canonical host, the canonical field
of that host will be NULL rather than pointing to itself.
This function may return canonicalid = NULL; it does not currently set canonicalid
to hostid in that case.

Also if caller supplies an existing hostname with a bogus canonicalname the
canonicalid on file of that hostname will be returned. */

>
> > return;
> > END
> > $$;
>
> Again, you have a potential race between checking for the canonicalname
> in host, and inserting it into host when it doesn't exist. If another
> transaction runs in between those two, you will get duplicates in the
> host table.
>
> > Table definition:
> > create table host (
> > did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
> > name text, --IP address legal, usually enclosed in []. More commonly an internet domain.
> > -- domain is a sql keyword and so I used host.
> > canonical bigint references host (did) --if not null, preferred name for host
> > --may refer to self
> > ) inherits (RBObject);
> >
> > I'm using Posgresql 8.4.
>
> I recommend a unique constraint on host.name. That will catch the race
> conditions I mentioned above, and turn them into errors. You can catch
> those errors in plpgsql, allowing you to do what you want in that case.
>
> Regards,
> Jeff Davis
>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Jeff Davis 2012-06-13 01:17:32 Re: coalesce in plpgsql, and other style questions
Previous Message Richard Terry 2012-06-13 00:33:21 Re: inserting a column into a view