GIST and TOAST

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: GIST and TOAST
Date: 2007-03-02 16:23:20
Message-ID: 87irdjr4w7.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I have a problem getting packed varlenas to work with GIST indexes. Namely,
the GIST code doesn't call pg_detoast_datum() enough. Instead of using the
DatumGetFoo() macros it uses DatumGetPointer() and calls PG_DETOAST_DATUM only
when it thinks it'll be necessary.

I've temporarily made the packed varlena code ignore user defined data types.
This isn't very satisfactory though since it means domains over things like
text don't get packed.

And in any case even with this in place it doesn't fix all the problems. The
Postgres regression tests pass but I can still trigger problems because the
GIST code doesn't reliably call detoast even on user data types they're given.

I think these are actual bugs. If you happened to provide a large enough datum
to the gist code it would cause the same problem I'm seeing. The packed
varlena patch just makes it easier to trigger.

I've fixed the problems I'm seeing with the following patch to _int_gist.c.
But I'm not sure it's correct. What I'm afraid of is that I'm not sure where
these functions are being called from and whether they expect to be leaking
memory. If they're expected to not leak memory then they're now leaking the
detoasted datum and that's a problem.

I'm also wondering if there aren't similar problems in the dozens of other
gist indexing modules...

I wouldn't mind a second pair of eyes on the _int_gist.c changes if there's
someone who can tell me whether any of these functions require a
PG_FREE_IF_COPY or equivalent.

Index: _int_gist.c
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/contrib/intarray/_int_gist.c,v
retrieving revision 1.16
diff -u -r1.16 _int_gist.c
--- _int_gist.c 4 Oct 2006 00:29:45 -0000 1.16
+++ _int_gist.c 2 Mar 2007 16:09:26 -0000
@@ -1,6 +1,6 @@
#include "_int.h"

-#define GETENTRY(vec,pos) ((ArrayType *) DatumGetPointer((vec)->vector[(pos)].key))
+#define GETENTRY(vec,pos) (DatumGetArrayTypeP((vec)->vector[(pos)].key))

/*
** GiST support methods
@@ -39,7 +39,7 @@
if (strategy == BooleanSearchStrategy)
{
retval = execconsistent((QUERYTYPE *) query,
- (ArrayType *) DatumGetPointer(entry->key),
+ DatumGetArrayTypeP(entry->key),
GIST_LEAF(entry));

pfree(query);
@@ -58,7 +58,7 @@
switch (strategy)
{
case RTOverlapStrategyNumber:
- retval = inner_int_overlap((ArrayType *) DatumGetPointer(entry->key),
+ retval = inner_int_overlap(DatumGetArrayTypeP(entry->key),
query);
break;
case RTSameStrategyNumber:
@@ -70,21 +70,21 @@
PointerGetDatum(&retval)
);
else
- retval = inner_int_contains((ArrayType *) DatumGetPointer(entry->key),
+ retval = inner_int_contains(DatumGetArrayTypeP(entry->key),
query);
break;
case RTContainsStrategyNumber:
case RTOldContainsStrategyNumber:
- retval = inner_int_contains((ArrayType *) DatumGetPointer(entry->key),
+ retval = inner_int_contains(DatumGetArrayTypeP(entry->key),
query);
break;
case RTContainedByStrategyNumber:
case RTOldContainedByStrategyNumber:
if (GIST_LEAF(entry))
retval = inner_int_contains(query,
- (ArrayType *) DatumGetPointer(entry->key));
+ DatumGetArrayTypeP(entry->key));
else
- retval = inner_int_overlap((ArrayType *) DatumGetPointer(entry->key),
+ retval = inner_int_overlap(DatumGetArrayTypeP(entry->key),
query);
break;
default:
@@ -282,10 +282,10 @@
float tmp1,
tmp2;

- ud = inner_int_union((ArrayType *) DatumGetPointer(origentry->key),
- (ArrayType *) DatumGetPointer(newentry->key));
+ ud = inner_int_union(DatumGetArrayTypeP(origentry->key),
+ DatumGetArrayTypeP(newentry->key));
rt__int_size(ud, &tmp1);
- rt__int_size((ArrayType *) DatumGetPointer(origentry->key), &tmp2);
+ rt__int_size(DatumGetArrayTypeP(origentry->key), &tmp2);
*result = tmp1 - tmp2;
pfree(ud);

@@ -297,8 +297,8 @@
Datum
g_int_same(PG_FUNCTION_ARGS)
{
- ArrayType *a = (ArrayType *) PointerGetDatum(PG_GETARG_POINTER(0));
- ArrayType *b = (ArrayType *) PointerGetDatum(PG_GETARG_POINTER(1));
+ ArrayType *a = PG_GETARG_ARRAYTYPE_P(0);
+ ArrayType *b = PG_GETARG_ARRAYTYPE_P(1);
bool *result = (bool *) PG_GETARG_POINTER(2);
int4 n = ARRNELEMS(a);
int4 *da,

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2007-03-02 17:04:50 Re: HOT - preliminary results
Previous Message Florian G. Pflug 2007-03-02 16:02:28 Re: UPSERT