CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+
Date: 2019-10-13 01:26:10
Message-ID: 20191013012610.2p2fp3zzpoav7jzf@development
Lists: pgsql-hackers

Hi,

over in pgsql-bugs [1] we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it has in fact been present since 9.6.

On 9.5 the example cannot be run as is, because that version (a) does
not include the hunspell dictionaries used in the example, and (b) does
not support long flags. Even after copying the dictionaries over and
tweaking them a bit, the command completes without a crash there.

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

A complete backtrace looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x00000000008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f <error: Cannot access memory at address 0x7f7f7f7f7f7f7f7f>) at spell.c:1126
1126 while (*flagcur)
(gdb) bt
#0 0x00000000008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f <error: Cannot access memory at address 0x7f7f7f7f7f7f7f7f>) at spell.c:1126
#1 0x00000000008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
#2 0x00000000008fe04e in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=3) at spell.c:1680
#3 0x00000000008fe113 in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=2) at spell.c:1692
#4 0x00000000008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=4, level=1) at spell.c:1652
#5 0x00000000008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=9, level=0) at spell.c:1652
#6 0x00000000008fe50b in NISortDictionary (Conf=0x20dd3b8) at spell.c:1785
#7 0x00000000008f9e14 in dispell_init (fcinfo=0x7ffdda6abc90) at dict_ispell.c:89
#8 0x0000000000a6210a in FunctionCall1Coll (flinfo=0x7ffdda6abcf0, collation=0, arg1=34478896) at fmgr.c:1140
#9 0x0000000000a62c72 in OidFunctionCall1Coll (functionId=3731, collation=0, arg1=34478896) at fmgr.c:1418
#10 0x00000000006c2dcb in verify_dictoptions (tmplId=3733, dictoptions=0x20e1b30) at tsearchcmds.c:402
#11 0x00000000006c2f4c in DefineTSDictionary (names=0x20ba278, parameters=0x20ba458) at tsearchcmds.c:463
#12 0x00000000008eb274 in ProcessUtilitySlow (pstate=0x20db518, pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:1272
#13 0x00000000008ea7e5 in standard_ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:927
#14 0x00000000008e991a in ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at utility.c:360
#15 0x00000000008e88e1 in PortalRunUtility (portal=0x2121368, pstmt=0x20bab88, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1175
#16 0x00000000008e8afe in PortalRunMulti (portal=0x2121368, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1321
#17 0x00000000008e8032 in PortalRun (portal=0x2121368, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:796
#18 0x00000000008e1f51 in exec_simple_query (query_string=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);") at postgres.c:1215
#19 0x00000000008e6243 in PostgresMain (argc=1, argv=0x20e54f8, dbname=0x20e5340 "test", username=0x20b53e8 "user") at postgres.c:4236
#20 0x000000000083c5e2 in BackendRun (port=0x20dd980) at postmaster.c:4437
#21 0x000000000083bdb3 in BackendStartup (port=0x20dd980) at postmaster.c:4128
#22 0x00000000008381d7 in ServerLoop () at postmaster.c:1704
#23 0x0000000000837a83 in PostmasterMain (argc=3, argv=0x20b3350) at postmaster.c:1377
#24 0x0000000000759507 in main (argc=3, argv=0x20b3350) at main.c:228
(gdb) up
#1 0x00000000008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
1608 return (getCompoundAffixFlagValue(Conf, str) & FF_COMPOUNDFLAGMASK);
(gdb) p *Conf
$1 = {maffixes = 16, naffixes = 10, Affix = 0x2181fd0, Suffix = 0x0, Prefix = 0x0, Dictionary = 0x0, AffixData = 0x20e1fa8, lenAffixData = 12, nAffixData = 12, useFlagAliases = true, CompoundAffix = 0x0, usecompound = true, flagMode = FM_LONG, CompoundAffixFlags = 0x217d328, nCompoundAffixFlag = 6, mCompoundAffixFlag = 10, buildCxt = 0x217cf20, Spell = 0x7bd99b4f6050, nspell = 9, mspell = 20480, firstfree = 0x217f1b8 "", avail = 7608}
(gdb) p affix
$2 = 303

So the affix value is rather strange, because it's clearly outside the
set of flags in Conf (nAffixData is only 12, so 303 is way too high).
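
To make the problem concrete, here is a minimal sketch of the kind of
range check that would reject such an index before it is dereferenced.
This is not the code in spell.c and not a proposed patch: MiniConf and
lookup_affix_flags are hypothetical names, and the struct only mirrors
the AffixData and nAffixData fields visible in the gdb output above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, stripped-down stand-in for the two fields of Conf that
 * matter here. The real struct in spell.c has many more members.
 */
typedef struct
{
	const char **AffixData;		/* flag-set strings, indexed by affix number */
	int			nAffixData;		/* number of valid entries in AffixData */
} MiniConf;

/*
 * Return the flag string for the given affix index, or NULL when the
 * index falls outside the array (e.g. 303 with only 12 entries).
 */
static const char *
lookup_affix_flags(const MiniConf *conf, int affix)
{
	assert(conf != NULL);

	if (affix < 0 || affix >= conf->nAffixData)
		return NULL;			/* out of range: caller must report an error */

	return conf->AffixData[affix];
}
```

With nAffixData = 12, a call such as lookup_affix_flags(&conf, 303)
returns NULL instead of reading whatever happens to sit past the end of
the array, which is presumably where the 0x7f7f7f7f7f7f7f7f pointer in
frame #0 came from.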

I don't have time to investigate this further and I'm getting lost in
spell.c, so I'm adding Teodor who committed f4ceed6ceba3 in 2016. One
interesting fact is that this is likely due to some discrepancy between
the dictfile and afffile - the segfaulting command appears to mix
hunspell_sample_num and hunspell_sample_long:

CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

But when using the "same" group for both dictfile and afffile, it seems
to work just fine.

[1] https://www.postgresql.org/message-id/flat/16050-024ae722464ab604%40postgresql.org

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
