Re: How to implement a SP-GiST index as a extension module?

From: Connor Wolf <connorw(at)imaginaryindustries(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to implement a SP-GiST index as a extension module?
Date: 2017-11-06 04:10:38
Message-ID: CAAVqP=rs-y_FnnA-Mh9m_-25U2eDYKLKyLQ94H=YKe6FvbLM-w@mail.gmail.com
Lists: pgsql-hackers

Never mind, it turns out the issue boiled down to me declaring the
wrong prefixType in my config function.

TL;DR - PEBKAC
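
For anyone who finds this thread in the archives, here is a minimal sketch of
why a wrong prefixType produces exactly this kind of crash. It is a standalone
model, not the SP-GiST code: Datum, TypeDesc, copy_source, copy_byval, report
and the two descriptors are hypothetical stand-ins, and the 16-byte
by-reference descriptor is only a guess chosen to match the len=16 in the
backtrace. The idea it illustrates is that the core decides whether a prefix
datum is a value or a pointer from the type declared in the config function,
so a by-value int8 prefix declared as a by-reference type gets used as an
address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PostgreSQL's Datum: one machine word that holds either
 * a value or a pointer, depending on the type it is declared as. */
typedef uintptr_t Datum;

/* Hypothetical, simplified type descriptor (assumption, not the real struct). */
typedef struct
{
    int attlen;     /* declared size in bytes */
    int attbyval;   /* nonzero: the Datum holds the value itself */
} TypeDesc;

/* What picksplit actually stores: a by-value integer (assumed 64-bit build). */
static const TypeDesc int8_desc = { 8, 1 };
/* A mismatched declaration: some 16-byte pass-by-reference type (a guess). */
static const TypeDesc wide_byref_desc = { 16, 0 };

/* Return the address a copy would read from, without dereferencing it. */
static const void *
copy_source(const TypeDesc *att, const Datum *datum)
{
    if (att->attbyval)
        return datum;               /* read the bytes of the Datum itself */
    return (const void *) *datum;   /* treat the Datum's bits as a pointer */
}

/* Perform the copy only when it is safe, i.e. for by-value types.
 * Returns -1 instead of dereferencing a bogus pointer. */
static int64_t
copy_byval(const TypeDesc *att, Datum datum)
{
    Datum target = 0;

    assert(att != NULL);
    if (!att->attbyval)
        return -1;
    memcpy(&target, copy_source(att, &datum), sizeof(Datum));
    return (int64_t) target;
}

/* Print a line shaped like the debugging NOTICE quoted below. */
static void
report(const char *label, const TypeDesc *att, Datum datum)
{
    int len = att->attbyval ? (int) sizeof(Datum) : att->attlen;

    printf("%s: would copy from %p with len %d\n",
           label, (void *) copy_source(att, &datum), len);
}
```

Calling report("declared by-reference", &wide_byref_desc, (Datum) 1) shows the
copy reading from address 1, which is the pattern in the NOTICE output quoted
below, while the same datum with int8_desc is copied from the Datum itself.
The fix is therefore on the extension side rather than in the core: the config
function has to declare a prefixType that matches what picksplit stores. For a
prefix stored with Int64GetDatum(val) that would presumably be
cfg->prefixType = INT8OID (an assumption; the exact value is not given here).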

On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw(at)imaginaryindustries(dot)com> wrote:

> Ok, I've got everything compiling and it installs properly, but I'm
> running into problems that I think are either a side-effect of implementing
> picksplit incorrectly (likely), or a bug in SP-GiST(?).
>
> Program received signal SIGSEGV, Segmentation fault.
> __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
> 159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
> (gdb) bt
> #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
> #1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>,
> __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
> #2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
> datum=datum@entry=18445692987396472528) at spgutils.c:587
> #3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0,
> hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
> nodes=nodes@entry=0x13bd340) at spgutils.c:741
> #4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
> state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
> parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00,
> level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
> #5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
> state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
> datum=12598555199787281,
> isnull=0 '\000') at spgdoinsert.c:2053
> #6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98,
> htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
> isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001',
> state=state@entry=0x7fff327325e0) at spginsert.c:56
> #7 0x0000000000534e8d in IndexBuildHeapRangeScan
> (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
> indexRelation=indexRelation@entry=0x7f2cf9de7f98,
> indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001',
> anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0,
> numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>,
> callback_state=0x7fff327325e0) at index.c:2609
> #8 0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
> indexRelation=indexRelation@entry=0x7f2cf9de7f98,
> indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001',
> callback=callback@entry=0x4ee573 <spgistBuildCallback>,
> callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
> #9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8,
> index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
> #10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
> indexRelation=indexRelation@entry=0x7f2cf9de7f98,
> indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 '\000',
> isreindex=isreindex@entry=0 '\000') at index.c:2043
> #11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
> indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
> indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
> indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40,
> accessMethodObjectId=4000, tableSpaceId=0,
> collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58,
> reloptions=0, isprimary=0 '\000',
> isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
> allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
> is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
> #12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
> stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
> is_alter_table=is_alter_table@entry=0 '\000',
> check_rights=check_rights@entry=1 '\001',
> check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
> quiet=0 '\000') at indexcmds.c:667
> #13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450,
> pstmt=pstmt@entry=0x12db108,
> queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON
> int8tmp_2 USING spgist ( a vptree_ops );",
> context=context@entry=PROCESS_UTILITY_TOPLEVEL,
> params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200,
> completionTag=0x7fff32732ed0 "") at utility.c:1326
> #14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
> queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a
> vptree_ops );",
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
> #15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
> queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
> params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200,
> completionTag=0x7fff32732ed0 "") at utility.c:357
> #16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
> pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
> setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
> completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
> #17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
> isTopLevel=isTopLevel@entry=1 '\001',
> setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
> dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
> completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
> #18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
> run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
> altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at
> pquery.c:799
> #19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0
> "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
> at postgres.c:1120
> #20 0x000000000077d95c in PostgresMain (argc=<optimized out>,
> argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression",
> username=<optimized out>)
> at postgres.c:4139
> #21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at
> postmaster.c:4364
> #22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at
> postmaster.c:4036
> #23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
> #24 0x00000000007023af in PostmasterMain (argc=argc@entry=8,
> argv=argv@entry=0x12ba7c0) at postmaster.c:1363
> #25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228
>
>
>
> It's segfaulting when trying to build the inner tuple after the picksplit
> operation.
>
> Adding debugging output to the print function, I see:
>
> NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16
>
> The first item in my input data file is zero, and if I change it to 1:
>
> NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16
>
> So, pretty clearly, the literal value of the datum is being treated as an
> address and copied from.
> Following the data back, this is the value I'm assigning to out->prefixDatum
> in my picksplit call. I can confirm this by hard-coding out->prefixDatum in
> my picksplit call to a known value; that value then shows up as the source
> address in the memcpy call.
>
> However, as far as I can tell, I'm assigning it correctly: out->prefixDatum
> = Int64GetDatum(val);
>
> This is similar to how the other spgist implementations work.
> spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
> for example.
>
> I think this is the SP-GiST core failing to handle certain types being
> pass-by-value? I'm not totally certain.
>
> As I understand it, the "maybe-pass-by-reference" parameter is a global
> flag (USE_FLOAT8_BYVAL), but I'd like to
> keep that enabled. What's the proper approach for adding support for this
> in the SP-GiST core?
>
> My (somewhat messy) extension module is here
> <https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
> it's relevant.
>
> Connor
>
>
>
>
> On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
>> connorw(at)imaginaryindustries(dot)com> wrote:
>>
>>> EDIT: That's actually exactly how the example I'm working off of works.
>>> DERP. The SQL is
>>>
>>> CREATE TYPE vptree_area AS
>>> (
>>> center _int4,
>>> distance float8
>>> );
>>>
>>> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
>>> boolean AS
>>> 'MODULE_PATHNAME','vptree_area_match'
>>> LANGUAGE C IMMUTABLE STRICT;
>>>
>>> CREATE OPERATOR <@ (
>>> LEFTARG = _int4,
>>> RIGHTARG = vptree_area,
>>> PROCEDURE = vptree_area_match,
>>> RESTRICT = contsel,
>>> JOIN = contjoinsel);
>>>
>>> so I just need to understand how to parse out the custom type in my
>>> index operator.
>>>
>>
>> You can see the implementation of the vptree_area_match function in
>> vptree.c. It just calls the GetAttributeByNum() function.
>>
>> There is also an alternative approach, implemented in the pg_trgm
>> contrib module. It has a "text % text" operator which checks whether two
>> strings are similar enough. The similarity threshold is defined by the
>> pg_trgm.similarity_threshold GUC. Thus, you could also define a GUC holding
>> the threshold distance value. However, that would impose some limitations.
>> For instance, you wouldn't be able to use different distance thresholds in
>> the same query.
>>
>> ------
>> Alexander Korotkov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>
>
>
