| From: | Ross Boylan <ross(at)biostat(dot)ucsf(dot)edu> | 
|---|---|
| To: | Jeff Davis <pgsql(at)j-davis(dot)com> | 
| Cc: | ross(at)biostat(dot)ucsf(dot)edu, pgsql-novice(at)postgresql(dot)org | 
| Subject: | Re: coalesce in plpgsql, and other style questions | 
| Date: | 2012-06-13 00:42:31 | 
| Message-ID: | 1339548151.5384.157.camel@corn.betterworld.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-novice | 
Thanks for your responses.  Some followup below.
On Tue, 2012-06-12 at 16:30 -0700, Jeff Davis wrote:
> On Tue, 2012-06-12 at 10:46 -0700, Ross Boylan wrote:
> > I just wrote my first pl/pgsql function, and would appreciate any
> > comments people have on it.  I'll be writing a bunch of similar
> > functions, with semantics "give me the id of the object if exists,
> > otherwise create it and give me the id."
> > 
> > My solution seems excessively procedural to me.
> 
> I think what you're trying to do is procedural, and that's not
> necessarily bad. You are trying to get an existing ID if available, or
> assign a new ID if not; and I assume you are doing so to save the space
> associated with storing the full names (essentially like dictionary
> encoding).
> 
> >   I thought I could get
> > the right semantics with something like
> > select coalesce((select id from mytable where name='foo'),
> > 	(insert into mytable ('name') values('foo') returning id))
> > but I could not get that to work in plgsql.
> 
> In pl/pgsql, you can't just use SELECT, you have to use something like
> SELECT INTO the_id ... or use PERFORM.
> 
> Also, you have a potential race condition there, because someone might
> insert "foo" into mytable after the select (first arg to coalesce) and
> before the insert (second arg to coalesce).
Practically, it's just me so there shouldn't be any risk.  But I'd like
to understand the general issue.  I thought transaction would take care
of this, so that within a transaction the state of the database does not
change from actions in other sessions.  Then if I commit and have
conflict, the commit fails.
I guess the sequence I'm using for did assures that did is unique across
all transactions, and so the 2 transactions would not be in conflict,
since they have different primary keys.  
As you said later, a unique constraint on one/some of the other fields
will at least prevent bogus records that are "the same" from my point of
view.  I was a bit on the fence about making it unique, since I thought
I might have different hosts that shared the same name.  But those
scenarios are unlikely, and I really don't want to deal with it if it
does happen.
But is my whole model of how transactions are operating off?  I'm
basically generalizing from Gemstone, an object database.
> 
> > Also, I wonder if calling a column 'name' is asking for trouble.
> 
> I recommend against it.
I'll change it.
> 
> > The actual function is a little more complicated because the table has
> > the possibility of canonical entries with other entries pointing to
> > them,  I use 'did' for database id to avoid confusion with various other
> > message ids the system uses; it's for tracking emails.
> > 
> > Here's the code.  As I said, I'd love comments on any aspect of it.
> > 
> > /* gom = get or make functions retrieve did if available, otherwise create record
> >    and return its did. */
> > 
> > create or replace function gom_hostids(IN hostname text, IN canonicalname text = NULL,
> >        OUT hostid bigint, OUT canonicalid bigint) language plpgsql as $$
> 
> Sometimes it's good to declare your variables with names like
> in_hostname or out_canonicalid, to prevent confusion reading
> similarly-named variables coming from the tables.
I'm slowly discovering the problems of name clashes; thanks for the
suggestion.
> 
> > DECLARE
BTW, is DECLARE necessary if there are no declarations?
> > BEGIN
> >         select did, canonical into hostid, canonicalid from host
> >         where name = hostname;
> >         if FOUND then
> >            return;
> >            end if;
> >         if canonicalname is not NULL then
> >            select did  into canonicalid from host where name = canonicalname;
> >            if not FOUND then
> >               insert into host (name) values(canonicalname) returning did into canonicalid;
> >               end if;
> >            end if;
> >         if hostname != canonical then
> 
> Is canonical a proper variable here? It's not in the argument list, and
> it's not DECLAREd. Did you mean canonicalname?
canonical is a column name in the table.  Perhaps canonical_did would be
more appropriate for it (and rename the output parameter
out_canonical_did from canonicalid).
> 
> >                 insert into host (name, canonical) values(hostname, canonicalid)
> >                        returning did into hostid;
> >         else
> >                 hostid := canonicalid;
> 
> I would recommend keeping more explicit track of NULL values and what
> should happen to them. It looks like a NULL canonicalname would fall
> into this else branch (I could be mistaken), 
that's right.
> but it's a little hard to
> follow exactly what you want to happen. A few comments would help.
I trimmed a comment before the start of the function; it doesn't exactly
hit that issue, but it does discuss NULL's:
/* Note that if this function creates a canonical host, the canonical field
   of that host will be NULL rather than pointing to itself.
   This function may return canonicalid = NULL; it does not currently set canonicalid
   to hostid in that case.
   Also if caller supplies an existing hostname with a bogus canonicalname the
   canonicalid on file of that hostname will be returned. */
> 
> >         return;
> > END
> > $$;
> 
> Again, you have a potential race between checking for the canonicalname
> in host, and inserting it into host when it doesn't exist. If another
> transaction runs in between those two, you will get duplicates in the
> host table.
> 
> > Table definition:
> > create table host (
> >        did bigint primary key default (nextval('rb_id_seq')), ---database id, like oid
> >        name  text, --IP address legal, usually enclosed in [].  More commonly an internet domain.
> >                    -- domain is a sql keyword and so I used host.
> >        canonical bigint references host (did) --if not null, preferred name for host
> >                  --may refer to self
> > ) inherits (RBObject);
> > 
> > I'm using Posgresql 8.4.
> 
> I recommend a unique constraint on host.name. That will catch the race
> conditions I mentioned above, and turn them into errors. You can catch
> those errors in plpgsql, allowing you to do what you want in that case.
> 
> Regards,
> 	Jeff Davis
> 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Davis | 2012-06-13 01:17:32 | Re: coalesce in plpgsql, and other style questions | 
| Previous Message | Richard Terry | 2012-06-13 00:33:21 | Re: inserting a column into a view |