Re: Text <-> C string

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 16:34:43
Message-ID: 23752.1206462883@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

I've been working some more on Brendan Jurd's patch to simplify text <->
C string conversions. It seems we have consensus on the names for the
base operations:

extern text *cstring_to_text(const char *s);
extern char *text_to_cstring(const text *t);
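For readers outside the backend, the round trip these two functions perform can be modeled in standalone C. This is only a sketch under stated assumptions: the `text` struct below is a simplified stand-in for PostgreSQL's varlena (a length word that counts itself, followed by bytes with no terminating NUL), and malloc() stands in for palloc(). Neither matches the backend's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's text: NOT the real varlena layout. */
typedef struct
{
	uint32_t	vl_len;			/* total size in bytes, header included */
	char		vl_dat[];		/* string bytes, not null-terminated */
} text;

#define VARHDRSZ offsetof(text, vl_dat)

text *
cstring_to_text(const char *s)
{
	size_t		len = strlen(s);
	text	   *result = malloc(VARHDRSZ + len);	/* palloc() in the backend */

	assert(result != NULL);
	result->vl_len = (uint32_t) (VARHDRSZ + len);
	memcpy(result->vl_dat, s, len);	/* the trailing NUL is not stored */
	return result;
}

char *
text_to_cstring(const text *t)
{
	size_t		len = t->vl_len - VARHDRSZ;
	char	   *result = malloc(len + 1);

	assert(result != NULL);
	memcpy(result, t->vl_dat, len);
	result[len] = '\0';			/* restore the terminator */
	return result;
}
```

Both results are freshly allocated, so the caller owns each one and must free it.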

Brendan's patch also included "cstring_text_limit(const char *s, int len)"
which was defined as copying Min(len, strlen(s)) bytes. I didn't find
this to be particularly useful. In the first place, all potential
callers are passing the exact desired length, so the strlen() call is
just a waste of cycles. In the second place, at least some callers pass
text that is not embedded in a known-to-be-null-terminated string (it
could instead be a section of a text datum), which means there is a
nonzero chance of strlen() running off the end of memory and dumping
core. So I propose instead

extern text *cstring_to_text_with_len(const char *s, int len);

which just takes the given length as gospel. Brendan had also proposed
"text_to_cstring_limit(const text *t, int len)" with similar Min()
semantics, but what this was doing was replacing copies into
limited-size local buffers with a palloc. If we did that we might
as well just use text_to_cstring. What I think is more useful is
a strlcpy()-like function that copies into a caller-supplied buffer
of limited size. For lack of a better idea I propose defining it
*exactly* like strlcpy:

extern size_t textlcpy(char *dst, const text *src, size_t siz);
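The two proposed signatures can be illustrated with a standalone sketch. It assumes a simplified `text` layout (a length word that counts itself, then unterminated bytes) and uses malloc() in place of palloc(); neither matches the backend's real definitions. The first function is meant to be called on a slice of a buffer that contains no NUL at all, which is exactly where the Min(len, strlen(s)) variant would read past the end; the second follows the strlcpy() return-value convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's text: NOT the real varlena layout. */
typedef struct
{
	uint32_t	vl_len;			/* total size in bytes, header included */
	char		vl_dat[];		/* string bytes, not null-terminated */
} text;

#define VARHDRSZ offsetof(text, vl_dat)

/* Takes len as gospel: s need not be null-terminated, and strlen() is
 * never called on it. */
text *
cstring_to_text_with_len(const char *s, int len)
{
	text	   *result;

	assert(len >= 0);
	result = malloc(VARHDRSZ + (size_t) len);	/* palloc() in the backend */
	assert(result != NULL);
	result->vl_len = (uint32_t) (VARHDRSZ + (size_t) len);
	memcpy(result->vl_dat, s, (size_t) len);
	return result;
}

/* strlcpy() contract: copy at most siz-1 bytes, NUL-terminate whenever
 * siz > 0, and return the full source length.  This sketch truncates at a
 * byte boundary; the proposed textlcpy() also avoids splitting a
 * multibyte character. */
size_t
textlcpy(char *dst, const text *src, size_t siz)
{
	size_t		srclen = src->vl_len - VARHDRSZ;

	if (siz > 0)
	{
		size_t		n = siz - 1;	/* leave room for the terminator */

		if (n > srclen)
			n = srclen;
		memcpy(dst, src->vl_dat, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

A caller detects truncation exactly as with strlcpy(): a return value >= the buffer size means the copy was cut short.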

I've also found that there are lots and lots of places where the
text end of the conversion needs to be a Datum rather than a text *,
so it seems worthwhile to introduce a couple of macros to minimize
notation in that case:

#define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
#define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))
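To show how the two macros compose, here is a self-contained sketch that defines minimal stand-ins for the Datum machinery. The `Datum` typedef, `PointerGetDatum`, `DatumGetPointer`, the simplified `text` struct, and the malloc()-based conversion functions are all assumptions made for illustration, not PostgreSQL's own headers; only the last two macro definitions are copied from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins (assumptions): a Datum is a pointer-sized integer. */
typedef uintptr_t Datum;
#define PointerGetDatum(X) ((Datum) (X))
#define DatumGetPointer(X) ((void *) (X))

/* Simplified stand-in for PostgreSQL's text: NOT the real varlena layout. */
typedef struct
{
	uint32_t	vl_len;			/* total size in bytes, header included */
	char		vl_dat[];		/* string bytes, not null-terminated */
} text;

#define VARHDRSZ offsetof(text, vl_dat)

text *
cstring_to_text(const char *s)
{
	size_t		len = strlen(s);
	text	   *result = malloc(VARHDRSZ + len);

	assert(result != NULL);
	result->vl_len = (uint32_t) (VARHDRSZ + len);
	memcpy(result->vl_dat, s, len);
	return result;
}

char *
text_to_cstring(const text *t)
{
	size_t		len = t->vl_len - VARHDRSZ;
	char	   *result = malloc(len + 1);

	assert(result != NULL);
	memcpy(result, t->vl_dat, len);
	result[len] = '\0';
	return result;
}

/* The two convenience macros, as proposed. */
#define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
#define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))
```

Each macro hides exactly one pointer/Datum cast, so a call site that previously needed a conversion function wrapped in a cast becomes a single call.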

Lastly, the originally submitted text-to-something functions would
work correctly on plain and 1-byte-header datums, but not on
compressed or toasted-out-of-line datums. There are a whole lot of
places where that's not good enough. Rather than expecting the caller
to use the right detoasting macro everywhere, it seems best to make
these functions cope with any variant. That also avoids memory
leakage by allowing the intermediate copy to be pfree'd. (I had
suggested that the pfree might be pointless, but I reconsidered ---
if the text object is large enough to be compressed or toasted,
we're talking about at least several K, so it's worth not leaking.)
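The detoast-then-free pattern can be sketched without a real TOAST system. In this hypothetical model, a `toasted` flag marks a value that must be expanded into a fresh plain copy before its bytes are readable, and `mock_detoast()` plays the role of pg_detoast_datum_packed(): it returns its argument unchanged when no work is needed, otherwise a new allocation. An allocation counter (also hypothetical) makes the "no leak" property checkable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocation counter so the sketch can show that nothing leaks. */
static int	live_allocs = 0;

static void *
mock_alloc(size_t n)
{
	void	   *p = malloc(n);

	assert(p != NULL);
	live_allocs++;
	return p;
}

static void
mock_free(void *p)
{
	live_allocs--;
	free(p);
}

/* Hypothetical value type, not PostgreSQL's varlena. */
typedef struct
{
	int			toasted;		/* nonzero: must be expanded before use */
	size_t		len;
	char		data[];
} text;

/* Stand-in for pg_detoast_datum_packed(): returns t itself when it is
 * already usable, otherwise a newly allocated plain copy. */
static text *
mock_detoast(text *t)
{
	text	   *copy;

	if (!t->toasted)
		return t;
	copy = mock_alloc(sizeof(text) + t->len);
	copy->toasted = 0;
	copy->len = t->len;
	memcpy(copy->data, t->data, t->len);
	return copy;
}

char *
text_to_cstring(const text *t)
{
	text	   *plain = mock_detoast((text *) t);
	char	   *result = mock_alloc(plain->len + 1);

	memcpy(result, plain->data, plain->len);
	result[plain->len] = '\0';
	if (plain != t)				/* release the intermediate copy, if any */
		mock_free(plain);
	return result;
}
```

The pointer comparison is what makes the pfree safe: the function frees only memory it allocated itself, never the caller's value.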

In short, the infrastructure I'm currently testing is the above
definitions with the attached implementation. Last call for
objections ...

regards, tom lane

/*
 * cstring_to_text
 *
 * Create a text value from a null-terminated C string.
 *
 * The new text value is freshly palloc'd with a full-size VARHDR.
 */
text *
cstring_to_text(const char *s)
{
	return cstring_to_text_with_len(s, strlen(s));
}

/*
 * cstring_to_text_with_len
 *
 * Same as cstring_to_text except the caller specifies the string length;
 * the string need not be null-terminated.
 */
text *
cstring_to_text_with_len(const char *s, int len)
{
	text	   *result = (text *) palloc(len + VARHDRSZ);

	SET_VARSIZE(result, len + VARHDRSZ);
	memcpy(VARDATA(result), s, len);

	return result;
}

/*
 * text_to_cstring
 *
 * Create a palloc'd, null-terminated C string from a text value.
 *
 * We support being passed a compressed or toasted text value.
 * This is a bit bogus since such values shouldn't really be referred to as
 * "text *", but it seems useful for robustness. If we didn't handle that
 * case here, we'd need another routine that did, anyway.
 */
char *
text_to_cstring(const text *t)
{
	char	   *result;
	text	   *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
	int			len = VARSIZE_ANY_EXHDR(tunpacked);

	result = (char *) palloc(len + 1);
	memcpy(result, VARDATA_ANY(tunpacked), len);
	result[len] = '\0';

	if (tunpacked != t)
		pfree(tunpacked);

	return result;
}

/*
 * textlcpy --- exactly like strlcpy(), except source is a text value.
 *
 * Copy src to string dst of size siz. At most siz-1 characters
 * will be copied. Always NUL terminates (unless siz == 0).
 * Returns strlen(src); if retval >= siz, truncation occurred.
 *
 * We support being passed a compressed or toasted text value.
 * This is a bit bogus since such values shouldn't really be referred to as
 * "text *", but it seems useful for robustness. If we didn't handle that
 * case here, we'd need another routine that did, anyway.
 */
size_t
textlcpy(char *dst, const text *src, size_t siz)
{
	text	   *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
	size_t		srclen = VARSIZE_ANY_EXHDR(srcunpacked);

	if (siz > 0)
	{
		siz--;
		if (siz >= srclen)
			siz = srclen;
		else					/* ensure truncation is encoding-safe */
			siz = pg_mbcliplen(VARDATA_ANY(srcunpacked), srclen, siz);
		memcpy(dst, VARDATA_ANY(srcunpacked), siz);
		dst[siz] = '\0';
	}

	if (srcunpacked != src)
		pfree(srcunpacked);

	return srclen;
}
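The pg_mbcliplen() call in textlcpy() matters because a byte-level cut can split a multibyte character; copying "héllo" into a 3-byte buffer byte-by-byte would keep only the first byte of "é". The sketch below handles UTF-8 only, assumes valid input, and uses hypothetical helper names (`utf8_cliplen`, `utf8_lcpy`); the real pg_mbcliplen() works for every server encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical UTF-8-only analogue of pg_mbcliplen(): the longest prefix of
 * s[0..len) that is at most limit bytes and does not split a character.
 * Assumes s holds valid UTF-8. */
static size_t
utf8_cliplen(const char *s, size_t len, size_t limit)
{
	size_t		n;

	if (limit >= len)
		return len;
	n = limit;
	/* s[n] is the first byte we would drop; while it is a continuation
	 * byte (10xxxxxx), the cut lands inside a character, so back up. */
	while (n > 0 && ((unsigned char) s[n] & 0xC0) == 0x80)
		n--;
	return n;
}

/* strlcpy()-style copy of srclen source bytes that truncates only on a
 * character boundary, in the manner of textlcpy(). */
static size_t
utf8_lcpy(char *dst, const char *src, size_t srclen, size_t siz)
{
	if (siz > 0)
	{
		size_t		n = utf8_cliplen(src, srclen, siz - 1);

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```

As with textlcpy(), the return value is still the full source length, so a caller's truncation check is unaffected by the clipping.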
