Re: How to implement a SP-GiST index as a extension module?

From: Connor Wolf <connorw(at)imaginaryindustries(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to implement a SP-GiST index as a extension module?
Date: 2017-11-05 08:09:51
Message-ID: CAAVqP=q59_KFe9eL2gxLvh581mNQh=D98Thr9+=utGCK7Rg=7Q@mail.gmail.com
Lists: pgsql-hackers

OK, I've got everything compiling and installing properly, but I'm running
into a crash that I think is either a side effect of implementing
picksplit incorrectly (likely) or a bug in SP-GiST (?).

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
#1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>, __dest=0x13c9dd8)
    at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
    datum=datum@entry=18445692987396472528) at spgutils.c:587
#3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0,
    hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
    nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
    state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
    parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00,
    level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
#5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
    state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
    datum=12598555199787281, isnull=0 '\000') at spgdoinsert.c:2053
#6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98,
    htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
    isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001',
    state=state@entry=0x7fff327325e0) at spginsert.c:56
#7 0x0000000000534e8d in IndexBuildHeapRangeScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
    indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8,
    allow_sync=allow_sync@entry=1 '\001', anyvisible=anyvisible@entry=0 '\000',
    start_blockno=start_blockno@entry=0, numblocks=4294967295,
    callback=0x4ee573 <spgistBuildCallback>, callback_state=0x7fff327325e0) at index.c:2609
#8 0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
    indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8,
    allow_sync=allow_sync@entry=1 '\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
    callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8, index=0x7f2cf9de7f98,
    indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
    indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8,
    isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
    indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
    indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
    indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40,
    accessMethodObjectId=4000, tableSpaceId=0, collationObjectId=0x12e6b18,
    classObjectId=0x12e6b38, coloptions=0x12e6b58, reloptions=0, isprimary=0 '\000',
    isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
    allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
    is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
    stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
    is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001',
    check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
    quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450,
    pstmt=pstmt@entry=0x12db108,
    queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
    queryEnv=queryEnv@entry=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "")
    at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
    queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
    queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
    params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200,
    completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
    pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
    setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
    completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
    isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
    dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
    completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
    run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
    altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at pquery.c:799
#19 0x000000000077bc12 in exec_simple_query (
    query_string=query_string@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
    at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>, argv=argv@entry=0x12e9948,
    dbname=0x12bca10 "contrib_regression", username=<optimized out>) at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x12ba7c0)
    at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228

It's segfaulting in spgFormInnerTuple(), while building the inner tuple
after the picksplit operation.

Adding debugging output around the memcpy() call, I see:

NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16

The first item in my input data file is zero, and if I change it to 1:

NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16

So, pretty clearly, the literal value of the datum is being used as the
source address for the copy. Tracing the data back, this is the value I'm
assigning to out->prefixDatum in my picksplit function. I can confirm this
by hard-coding out->prefixDatum to a known value in picksplit: that value
then shows up as the source address in the memcpy call.
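
To make the failure mode concrete, here is a standalone model (not PostgreSQL source) of what memcpyDatum() in spgutils.c appears to do with a prefix datum: if the type descriptor says the type is pass-by-value, the bytes of the Datum itself are copied; otherwise the Datum is reinterpreted as a pointer and attlen bytes are copied from that address. The Datum typedef, the TypeDesc struct and the helpers copy_source(), copy_length() and copy_byval() are simplified, hypothetical stand-ins. A descriptor with attlen 16 and attbyval false, which is what the len=16 pointer copy above suggests, turns a Datum holding 0 or 1 into a source address of 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's Datum: a pointer-sized integer. */
typedef uintptr_t Datum;

/* Simplified, hypothetical stand-in for SP-GiST's per-type descriptor. */
typedef struct
{
    int  attlen;   /* fixed length in bytes (this model ignores varlena) */
    bool attbyval; /* true if the value is stored inside the Datum itself */
} TypeDesc;

/*
 * Address a memcpyDatum-style copy reads FROM.
 * By value: the bytes of the Datum variable itself.
 * By reference: the Datum's integer value is treated as a pointer.
 */
static const void *copy_source(const TypeDesc *att, const Datum *datum)
{
    if (att->attbyval)
        return datum;
    return (const void *) *datum;
}

/* Number of bytes a memcpyDatum-style copy reads. */
static size_t copy_length(const TypeDesc *att)
{
    assert(att->attlen > 0);  /* fixed-length types only in this model */
    return att->attbyval ? sizeof(Datum) : (size_t) att->attlen;
}

/* Perform the copy, restricted to the by-value case so it is always safe. */
static Datum copy_byval(const TypeDesc *att, Datum datum)
{
    Datum out = 0;

    assert(att->attbyval);
    memcpy(&out, copy_source(att, &datum), copy_length(att));
    return out;
}
```

With a by-reference descriptor, copy_source() for a Datum of 1 is the address 0x1, the same kind of bogus source address shown in the NOTICE output; with a by-value descriptor the value survives the copy intact.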

However, as far as I can tell, I'm assigning it correctly:

    out->prefixDatum = Int64GetDatum(val);

This is similar to how the other SP-GiST implementations work; for example,
spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);

I think this may be the SP-GiST core failing to handle some pass-by-value
types, but I'm not totally certain.

As I understand it, whether these "maybe-pass-by-reference" types are passed
by value is controlled by a compile-time flag (USE_FLOAT8_BYVAL), and I'd
like to keep that enabled. What's the proper approach for adding support
for this in the SP-GiST core?

My (somewhat messy) extension module is here
<https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
it's relevant.

Connor

On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <connorw(at)imaginaryindustries(dot)com> wrote:
>
>> EDIT: That's actually exactly how the example I'm working off of works.
>> DERP. The SQL is
>>
>> CREATE TYPE vptree_area AS
>> (
>> center _int4,
>> distance float8
>> );
>>
>> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
>> boolean AS
>> 'MODULE_PATHNAME','vptree_area_match'
>> LANGUAGE C IMMUTABLE STRICT;
>>
>> CREATE OPERATOR <@ (
>> LEFTARG = _int4,
>> RIGHTARG = vptree_area,
>> PROCEDURE = vptree_area_match,
>> RESTRICT = contsel,
>> JOIN = contjoinsel);
>>
>> so I just need to understand how to parse out the custom type in my index
>> operator.
>>
>
> You can see the implementation of the vptree_area_match function in
> vptree.c. It just calls the GetAttributeByNum() function.
>
> There is also an alternative approach, implemented in the pg_trgm contrib
> module. It has a "text % text" operator which checks whether two strings
> are similar enough. The similarity threshold is defined by the
> pg_trgm.similarity_threshold GUC. Thus, you could also define a GUC with a
> threshold distance value. However, that would impose some limitations. For
> instance, you wouldn't be able to use different distance thresholds in the
> same query.
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
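
Alexander's first suggestion is to read the fields of the composite vptree_area argument inside the operator function, which vptree_area_match does with GetAttributeByNum(). Since that needs a running backend, the sketch below mirrors the composite as a plain C struct and shows only the containment test the <@ operator would perform once attribute 1 (center) and attribute 2 (distance) have been fetched. VptreeArea, area_match() and squared_distance() are hypothetical names, and the Euclidean metric is an assumption; the actual vptree.c may use a different distance function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Plain-C mirror of
 *   CREATE TYPE vptree_area AS (center _int4, distance float8);
 * In the extension these fields would be read from the composite argument
 * with GetAttributeByNum(tuple, 1, &isnull) and GetAttributeByNum(tuple, 2, &isnull).
 */
typedef struct
{
    const int *center;   /* attribute 1: center of the ball */
    size_t     ndims;    /* number of elements in center */
    double     distance; /* attribute 2: radius of the ball */
} VptreeArea;

/* Squared Euclidean distance (assumed metric), computed in double. */
static double squared_distance(const int *a, const int *b, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return sum;
}

/* point <@ area: true when point lies within area->distance of area->center. */
static bool area_match(const int *point, size_t n, const VptreeArea *area)
{
    assert(n == area->ndims);      /* this model requires equal dimensions */
    if (area->distance < 0.0)
        return false;              /* a negative radius matches nothing */
    return squared_distance(point, area->center, n)
           <= area->distance * area->distance;
}
```

Comparing squared distances avoids calling sqrt() (and linking libm) without changing the result for a non-negative radius.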

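The GUC alternative can be modelled the same way: the operator takes no radius argument and instead reads a session-wide setting, much as pg_trgm's % operator depends on pg_trgm.similarity_threshold. In a real extension the variable would be registered from _PG_init() with DefineCustomRealVariable(); in this sketch it is an ordinary global, and both the name vptree_distance_threshold and the Euclidean metric are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for a hypothetical GUC such as "vptree.distance_threshold".
 * A real extension would register it with DefineCustomRealVariable();
 * here it is a plain global.
 */
static double vptree_distance_threshold = 5.0;

/* Squared Euclidean distance (assumed metric), computed in double. */
static double squared_distance(const int *a, const int *b, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return sum;
}

/*
 * Operator body with no radius argument: the threshold comes from the
 * session-wide setting, so every call within a query sees the same value.
 */
static bool within_threshold(const int *a, const int *b, size_t n)
{
    double t = vptree_distance_threshold;

    assert(a != NULL && b != NULL);
    return t >= 0.0 && squared_distance(a, b, n) <= t * t;
}
```

Because every evaluation reads the same global, a single query cannot compare against two different radii, which is the limitation Alexander points out.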