Re: How to implement a SP-GiST index as a extension module?

From: Connor Wolf <connorw(at)imaginaryindustries(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to implement a SP-GiST index as a extension module?
Date: 2017-11-13 03:47:24
Message-ID: CAAVqP=qMg4bVU9f-EaShcwsMMpHYeQmP4LBzB86HsfOQJ9Xxpw@mail.gmail.com
Lists: pgsql-hackers

Ok, I've managed to get my custom index working.

It's all on GitHub here: https://github.com/fake-name/pg-spgist_hamming, if
anyone else needs a fuzzy image-search system
that can integrate into PostgreSQL.

It should be a pretty good basis for anyone else who wants to
implement an SP-GiST index too.

Thanks!

On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <connorw(at)imaginaryindustries(dot)com> wrote:

> Never mind, it turns out the issue boiled down to me declaring the
> wrong prefixType in my config function.
>
> TL;DR - PEBKAC
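
For anyone hitting the same thing, here is a minimal standalone sketch of why the declared prefixType matters. It does not use the PostgreSQL headers: `Oid`, `ConfigModel`, `TypeDesc`, `describe_type`, and `int64_prefix_is_safe` are hypothetical stand-ins, and only the OID values (20 for int8, 600 for point) come from the PostgreSQL catalog. It assumes a 64-bit build where int8 is passed by value (USE_FLOAT8_BYVAL). The thread does not say which wrong type was declared; point is used here only as an example of a 16-byte pass-by-reference type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned int Oid;        /* stand-in for PostgreSQL's Oid */

#define INT8OID  20              /* bigint: 8 bytes */
#define POINTOID 600             /* point: 16 bytes, pass-by-reference */

/* Hypothetical stand-in for the prefix-related part of the config output. */
typedef struct ConfigModel {
    Oid prefixType;
} ConfigModel;

/* Hypothetical stand-in for the per-type info looked up from the catalog. */
typedef struct TypeDesc {
    int16_t attlen;     /* fixed length in bytes */
    bool    attbyval;   /* true: the datum holds the value itself */
} TypeDesc;

/* Model of a catalog lookup for the two types used in this sketch. */
static TypeDesc describe_type(Oid type)
{
    TypeDesc desc = {0, false};

    if (type == INT8OID) {
        desc.attlen = 8;
        desc.attbyval = true;
    } else if (type == POINTOID) {
        desc.attlen = 16;
        desc.attbyval = false;
    }
    return desc;
}

/*
 * A prefix built as a plain 64-bit value is only handled correctly when
 * the declared prefix type is itself pass-by-value.
 */
static bool int64_prefix_is_safe(const ConfigModel *cfg)
{
    assert(cfg != NULL);
    return describe_type(cfg->prefixType).attbyval;
}
```

With prefixType set to INT8OID the check passes; with a by-reference type such as POINTOID it fails, which is the kind of mismatch that makes the core dereference the prefix value as a pointer.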
>
> On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw(at)imaginaryindustries(dot)com> wrote:
>
>> Ok, I've got everything compiling and it installs properly, but I'm
>> running into problems that I think are either a side-effect of implementing
>> picksplit incorrectly (likely), or a bug in SP-GiST(?).
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
>> 159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
>> (gdb) bt
>> #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
>> #1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>,
>>     __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
>> #2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4,
>>     datum=datum@entry=18445692987396472528) at spgutils.c:587
>> #3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0,
>>     hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
>>     nodes=nodes@entry=0x13bd340) at spgutils.c:741
>> #4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98,
>>     state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
>>     parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00,
>>     level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
>> #5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98,
>>     state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c,
>>     datum=12598555199787281, isnull=0 '\000') at spgdoinsert.c:2053
>> #6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98,
>>     htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
>>     isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001',
>>     state=state@entry=0x7fff327325e0) at spginsert.c:56
>> #7 0x0000000000534e8d in IndexBuildHeapRangeScan
>>     (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
>>     indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001',
>>     anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0,
>>     numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>,
>>     callback_state=0x7fff327325e0) at index.c:2609
>> #8 0x0000000000534f52 in IndexBuildHeapScan
>>     (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
>>     indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001',
>>     callback=callback@entry=0x4ee573 <spgistBuildCallback>,
>>     callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
>> #9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8,
>>     index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
>> #10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
>>     indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 '\000',
>>     isreindex=isreindex@entry=0 '\000') at index.c:2043
>> #11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8,
>>     indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
>>     indexRelationId=16416, indexRelationId@entry=0, relFileNode=0,
>>     indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40,
>>     accessMethodObjectId=4000, tableSpaceId=0,
>>     collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58,
>>     reloptions=0, isprimary=0 '\000',
>>     isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
>>     allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
>>     is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
>> #12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413,
>>     stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
>>     is_alter_table=is_alter_table@entry=0 '\000',
>>     check_rights=check_rights@entry=1 '\001',
>>     check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
>>     quiet=0 '\000') at indexcmds.c:667
>> #13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450,
>>     pstmt=pstmt@entry=0x12db108,
>>     queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
>>     context=context@entry=PROCESS_UTILITY_TOPLEVEL,
>>     params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
>>     dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326
>> #14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108,
>>     queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
>>     context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
>>     dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
>> #15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108,
>>     queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
>>     params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200,
>>     completionTag=0x7fff32732ed0 "") at utility.c:357
>> #16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80,
>>     pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
>>     setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200,
>>     completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
>> #17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80,
>>     isTopLevel=isTopLevel@entry=1 '\001',
>>     setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
>>     dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200,
>>     completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
>> #18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80,
>>     count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
>>     run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200,
>>     altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at pquery.c:799
>> #19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0
>>     "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
>>     at postgres.c:1120
>> #20 0x000000000077d95c in PostgresMain (argc=<optimized out>,
>>     argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression",
>>     username=<optimized out>) at postgres.c:4139
>> #21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at postmaster.c:4364
>> #22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at postmaster.c:4036
>> #23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
>> #24 0x00000000007023af in PostmasterMain (argc=argc@entry=8,
>>     argv=argv@entry=0x12ba7c0) at postmaster.c:1363
>> #25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228
>>
>>
>>
>> It's segfaulting when trying to build the inner tuple after the picksplit
>> operation.
>>
>> Adding debugging output to the print function, I see:
>>
>> NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16
>>
>> The first item in my input data file is zero, and if I change it to 1:
>>
>> NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16
>>
>> So pretty clearly, the copy is treating the literal value of the datum
>> as an address.
>> Following the data back, this is the value I'm assigning to out->prefixDatum in
>> my picksplit call. I can confirm this by hard-coding out->prefixDatum in my
>> picksplit call to a known value: it then shows up as the source address in
>> the memcopy call.
>>
>> However, as far as I can tell, I'm assigning it correctly: out->prefixDatum
>> = Int64GetDatum(val);
>>
>> This is similar to how the other spgist implementations work.
>> spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
>> for example.
>>
>> I think this is the SP-GiST core failing to handle certain types being
>> pass-by-value? I'm not totally certain.
>>
>> As I understand it, the "maybe-pass-by-reference" parameter is a global
>> flag (USE_FLOAT8_BYVAL), but I'd like to
>> keep that enabled. What's the proper approach for adding support for this
>> in the SP-GiST core?
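
To make the symptom above concrete, here is a minimal standalone model of a datum copy that branches on a pass-by-value flag. It is a sketch, not the code in spgutils.c: `Datum` is redefined as a plain integer type, `TypeDesc`, `Pair16`, and `copy_datum` are hypothetical names, and variable-length types are left out. The assumption is that the core copies a prefix roughly this way, using length and by-value information taken from the declared prefix type rather than from how the datum was built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;    /* stand-in for PostgreSQL's Datum */

/* Hypothetical, simplified type descriptor. */
typedef struct TypeDesc {
    int16_t attlen;     /* fixed length in bytes */
    bool    attbyval;   /* true: the Datum holds the value itself */
} TypeDesc;

/* Example of a 16-byte value that would be passed by reference. */
typedef struct Pair16 {
    double x;
    double y;
} Pair16;

/*
 * Copy a datum into target. By-value datums are stored directly;
 * by-reference datums are treated as an address and attlen bytes are
 * copied from it.
 */
static void copy_datum(void *target, const TypeDesc *att, Datum datum)
{
    assert(target != NULL && att != NULL);

    if (att->attbyval)
        memcpy(target, &datum, sizeof(Datum));
    else
        memcpy(target, (const void *) datum, (size_t) att->attlen);
}
```

With a by-value descriptor, a datum of 1 is stored as the value 1. With a 16-byte by-reference descriptor, the same datum would be read as address 0x1, which is consistent with the "Memcopying from 0000000000000001 ... with len 16" notice.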
>>
>> My (somewhat messy) extension module is here
>> <https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
>> it's relevant.
>>
>> Connor
>>
>>
>>
>>
>> On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <
>> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>>
>>> On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
>>> connorw(at)imaginaryindustries(dot)com> wrote:
>>>
>>>> EDIT: That's actually exactly how the example I'm working off of works.
>>>> DERP. The SQL is
>>>>
>>>> CREATE TYPE vptree_area AS
>>>> (
>>>>     center _int4,
>>>>     distance float8
>>>> );
>>>>
>>>> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area)
>>>> RETURNS boolean AS
>>>>     'MODULE_PATHNAME','vptree_area_match'
>>>> LANGUAGE C IMMUTABLE STRICT;
>>>>
>>>> CREATE OPERATOR <@ (
>>>>     LEFTARG = _int4,
>>>>     RIGHTARG = vptree_area,
>>>>     PROCEDURE = vptree_area_match,
>>>>     RESTRICT = contsel,
>>>>     JOIN = contjoinsel);
>>>>
>>>> so I just need to understand how to parse out the custom type in my
>>>> index operator.
>>>>
>>>
>>> You can see the implementation of the vptree_area_match function in
>>> vptree.c. It just calls the GetAttributeByNum() function.
>>>
>>> There is also an alternative approach, implemented in the pg_trgm
>>> contrib module. It has a "text % text" operator which checks whether two
>>> strings are similar enough. The similarity threshold is defined by the
>>> pg_trgm.similarity_threshold GUC. Thus, you could also define a GUC holding
>>> the threshold distance value. However, that would impose some limitations.
>>> For instance, you wouldn't be able to use different distance thresholds in
>>> the same query.
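
The trade-off between the two approaches can be sketched in standalone C. Everything here is an assumption made for illustration: `global_threshold` plays the role of a GUC, `Area` is a simplified stand-in for the vptree_area type (a single 64-bit center and an integer distance instead of `_int4` and `float8`), and Hamming distance is used as the metric only because it is easy to show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global setting, playing the role of a GUC. */
static int global_threshold = 4;

/* Hypothetical search area: a center value plus its own distance limit. */
typedef struct Area {
    uint64_t center;
    int      distance;
} Area;

/* Number of differing bits between two 64-bit values. */
static int hamming(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;
    int bits = 0;

    while (x != 0) {
        x &= x - 1;     /* clear the lowest set bit */
        bits++;
    }
    return bits;
}

/* GUC-style match: every comparison shares one threshold. */
static bool match_global(uint64_t value, uint64_t center)
{
    return hamming(value, center) <= global_threshold;
}

/* Area-style match: each right-hand operand carries its own threshold. */
static bool match_area(uint64_t value, const Area *area)
{
    assert(area != NULL);
    return hamming(value, area->center) <= area->distance;
}
```

With `match_area`, two conditions in the same query can use different distances because each operand carries its own limit. With `match_global`, changing the threshold affects every comparison at once.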
>>>
>>> ------
>>> Alexander Korotkov
>>> Postgres Professional: http://www.postgrespro.com
>>> The Russian Postgres Company
>>>
>>>
>>
>>
>
