Re: I s this a bug of spgist index in a heavy write condition?

From: 李海龙 <hailong(dot)li(at)qunar(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: 何伟平 <weiping(dot)he(at)qunar(dot)com>, 王冬 <dong(dot)wang(at)qunar(dot)com>, 于超超 <chaochao(dot)yu(at)qunar(dot)com>
Subject: Re: I s this a bug of spgist index in a heavy write condition?
Date: 2013-01-08 10:48:50
Message-ID: 50EBF992.2000704@qunar.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,dear tom lane && pgsql-hackers

I am very excited to say that I may have created a test case!

OS Version:
CentOS release 5.5 (Final)

PostgreSQL Version:
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
(1 row)

I create a database named test_spgist,and there is only one table named route_raw which has a spgist index column in the database.

Some details is shown as following:

test_spgist=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | route_raw | table | postgres
(1 row)

test_spgist=# \d route_raw
Table "public.route_raw"
Column | Type | Modifiers
----------------------+-----------------------------+---------------------------------------------
sourceurl | character varying(1000) | not null default ''::character varying
title | character varying(200) | default NULL::character varying
price | numeric(10,0) | default NULL::numeric
routesnapshot | character varying(4024) | default NULL::character varying
routesnapshot_small | character varying(800) | default NULL::character varying
routesnapshot_mid | character varying(800) | default NULL::character varying
feature | text |
departure | character varying(50) | default NULL::character varying
arrive | character varying(500) | default NULL::character varying
function | character varying(50) | default NULL::character varying
type | character varying(50) | default NULL::character varying
subject | text |
dateofdeparture | timestamp without time zone |
dateofexpire | timestamp without time zone |
dateofbookingexpire | timestamp without time zone |
itineraryday | integer |
sightspot | text |
traffic | character varying(100) | default NULL::character varying
stargrade | double precision |
region | character varying(50) | default NULL::character varying
subregion | character varying(50) | default NULL::character varying
currency | character varying(45) | default NULL::character varying
listinfo | character varying(1024) | default NULL::character varying
dateofload | timestamp without time zone |
urlkey | character varying(64) | not null default ''::character varying
sightspot_b | text |
sightspot_c | text |
wrapperid | character varying(128) | default NULL::character varying
wrappername | character varying(50) | default NULL::character varying
toflight | character varying(45) | default NULL::character varying
backflight | character varying(45) | default NULL::character varying
raw_date | character varying(2048) | default NULL::character varying
feeinfo | text |
others | text |
feeinclude | character varying(1000) | default NULL::character varying
feeexclude | character varying(1000) | default NULL::character varying
ownexpense | character varying(1000) | default NULL::character varying
bookingterms | character varying(1000) | default NULL::character varying
visainfos | character varying(1000) | default NULL::character varying
shopinfos | character varying(500) | default NULL::character varying
flightinfos | character varying(500) | default NULL::character varying
spotofdeparture | character varying(20) | default NULL::character varying
spotofback | character varying(20) | default NULL::character varying
tips | character varying(1000) | default NULL::character varying
contact | character varying(100) | default NULL::character varying
raw_price | character varying(100) | default NULL::character varying
raw_bookexpiredate | character varying(100) | default NULL::character varying
transfer | character varying(20) | default NULL::character varying
contractstyle | character varying(500) | default NULL::character varying
citiesofbook | character varying(1000) | default NULL::character varying
ishotelselected | character varying(20) | not null default 'false'::character varying
freetripitineraryday | integer | not null default (-1)
suitcrowdsubject | character varying(50) | default NULL::character varying
newsubject | character varying(50) | default NULL::character varying
isflightselected | character varying(20) | not null default 'false'::character varying
festival | character varying(512) | default NULL::character varying
promotion | character varying(1024) |
Indexes:
"route_raw_1_idx" btree (price, departure, dateofdeparture, itineraryday)
"route_raw_sourceurl_idx1" spgist (sourceurl)

test_spgist=# select sourceurl from route_raw limit 5;
sourceurl
---------------------------------------------------------------------------------------------------------------
http://suz.bacts.com/group/group_33_128972.aspx
http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623
http://www.ctszj.com.cn/route_group_1_147707.html
http://jx.bacts.com/group/group_33_118381.aspx
http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096
(5 rows)

--Before the test, this table has 997736 records.
test_spgist=# select count(1) from route_raw ;
count
--------
997736
(1 row)

My some deteails the test procedure is as follows.

1. edit a shell script that function is insert data to table route_raw

[hailong(dot)li(at)pgdb2 ~]$ cat /tmp/insert_spgist.sh
#/bin/bash

export PGPASSWORD="6b4df779-dd13-4031-a00b-126b49287995"
PSQL="/opt/pg92/bin/psql"
DBHOST=l-pgdb2.s.cn6
DB=test_spgist
USR=postgres

while [ 10 -ne 9 ]
do
SQL="insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/route_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_134782.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')"
sleep 1
$PSQL -h $DBHOST -U $USR -d $DB -c "$SQL"
done

2. I create 15 clients and execute this shell script in each client

[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ /tmp/insert_spgist.sh >/dev/null
...
...
...
[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ /tmp/insert_spgist.sh >/dev/null

3. I execute the statement "select * from pg_stat_activity; and select count(1) from route_raw; " to monitor the connections

test_spgist=# select * from pg_stat_activity ;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
query_start | state_change | waiting | state |
;
query
;
;
-------+-------------+-------+----------+----------+------------------+-----------------+-----------------+-------------+-------------------------------+-------------------------------+------
-------------------------+-------------------------------+---------+--------+------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16384 | test_spgist | 14959 | 10 | postgres | psql | | | -1 | 2013-01-08 16:43:53.175203+08 | 2013-01-08 17:16:01.997493+08 | 2013-
01-08 17:16:01.997493+08 | 2013-01-08 17:16:01.997495+08 | f | active | select * from pg_stat_activity ;

16384 | test_spgist | 16858 | 10 | postgres | psql | | | -1 | 2013-01-08 17:04:15.000472+08 | | 2013-
01-08 17:15:01.303197+08 | 2013-01-08 17:15:02.372496+08 | f | idle | select count(1) from route_raw ;

16384 | test_spgist | 26097 | 10 | postgres | psql | 192.168.232.122 | | 54637 | 2013-01-08 17:12:16.228789+08 | 2013-01-08 17:12:16.231496+08 | 2013-
01-08 17:12:16.231496+08 | 2013-01-08 17:12:16.231504+08 | f | active | insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn
/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, ht
tp://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/
route_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_1
34782.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')
......
......
(17 rows)

test_spgist=# select count(1) from route_raw ;
count
--------
997799
(1 row)

test_spgist=# select count(1) from route_raw ;
count
--------
997801
(1 row)

test_spgist=# select count(1) from route_raw ;
count
--------
998687
(1 row)

test_spgist=# select count(1) from route_raw ;
count
--------
999647
(1 row)

test_spgist=# select count(1) from route_raw ;
count
--------
999647
(1 row)

......
......

test_spgist=# select count(1) from route_raw ;
count
--------
999647
(1 row)

When I found that the counts of the records of the table route_raw is is not increased, I immediately went to displays the process information using command gdb at that time.

4. display process information using gdb

[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ sudo gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html><http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/><http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 26097
Attaching to process 26097
Reading symbols from /opt/pg92/bin/postgres...done.
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000039e4ed5337 in semop () from /lib64/libc.so.6
(gdb) bt
#0 0x00000039e4ed5337 in semop () from /lib64/libc.so.6
#1 0x00000000006985f3 in PGSemaphoreLock (sema=0x2b4cb1e2bb30, interruptOK=0 '\000') at pg_sema.c:415
#2 0x00000000006ea043 in LWLockAcquire (lockid=355054, mode=LW_EXCLUSIVE) at lwlock.c:474
#3 0x00000000006d0739 in LockBuffer (buffer=177443, mode=2) at bufmgr.c:2533
#4 0x00000000004d9ab0 in spgdoinsert (index=0x2b4cb26edcc0, state=0x7fffb8ce3f30, heapPtr=0xb574124, datum=190267736, isnull=0 '\000') at spgdoinsert.c:1940
#5 0x00000000004d1880 in spginsert (fcinfo=0x7fffb8ce3fe0) at spginsert.c:220
#6 0x000000000080c921 in FunctionCall6Coll (flinfo=0xb5702a0, collation=0, arg1=47608411118784, arg2=140736293913696, arg3=140736293913664, arg4=190267684, arg5=47608411039440, arg6=0)
at fmgr.c:1439
#7 0x0000000000489526 in index_insert (indexRelation=0x2b4cb26edcc0, values=0x7fffb8ce4460, isnull=0x7fffb8ce4440 "", heap_t_ctid=0xb574124, heapRelation=0x2b4cb26da6d0,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:216
#8 0x00000000005eaf3c in ExecInsertIndexTuples (slot=0xb56e520, tupleid=0xb574124, estate=0xb56e290) at execUtils.c:1087
#9 0x00000000005fcfb5 in ExecInsert (slot=0xb56e520, planSlot=0xb56e520, estate=0xb56e290, canSetTag=1 '\001') at nodeModifyTable.c:248
#10 0x00000000005fde2e in ExecModifyTable (node=0xb56e5b0) at nodeModifyTable.c:848
#11 0x00000000005ddde3 in ExecProcNode (node=0xb56e5b0) at execProcnode.c:376
#12 0x00000000005dc24e in ExecutePlan (estate=0xb56e290, planstate=0xb56e5b0, operation=CMD_INSERT, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0xb556160)
at execMain.c:1393
#13 0x00000000005daa2a in standard_ExecutorRun (queryDesc=0xb486830, direction=ForwardScanDirection, count=0) at execMain.c:301
#14 0x00000000005da922 in ExecutorRun (queryDesc=0xb486830, direction=ForwardScanDirection, count=0) at execMain.c:249
#15 0x00000000006fb344 in ProcessQuery (plan=0xb50ce80,
sourceText=0xb50b500 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"..., params=0x0, dest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:185
#16 0x00000000006fcaa9 in PortalRunMulti (portal=0xb484820, isTopLevel=1 '\001', dest=0xb556160, altdest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:1277
#17 0x00000000006fc183 in PortalRun (portal=0xb484820, count=9223372036854775807, isTopLevel=1 '\001', dest=0xb556160, altdest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:814
#18 0x00000000006f66da in exec_simple_query (
query_string=0xb50b500 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"...) at postgres.c:1046
#19 0x00000000006fa5bd in PostgresMain (argc=2, argv=0xb4664e8, username=0xb4663f0 "postgres") at postgres.c:3958
#20 0x00000000006a9305 in BackendRun (port=0xb489290) at postmaster.c:3619
#21 0x00000000006a8a02 in BackendStartup (port=0xb489290) at postmaster.c:3304
#22 0x00000000006a591a in ServerLoop () at postmaster.c:1367
#23 0x00000000006a52a8 in PostmasterMain (argc=3, argv=0xb464660) at postmaster.c:1127
#24 0x000000000061cfcb in main (argc=3, argv=0xb464660) at main.c:199

I say my doubt once one "May be the spgist index has a bug in a heavy write condition?"

My colleagues and I are looking forward very much to hearing from you.

Thanks again

Best Regards!

at 2012-12-14 12:00, hailong.li wrote:

hi,

The problem is not always appear on our system, we can't find a way to
reproduce it.
After rebuild the index with btree, the problem is disappear

at 2012-12-14 00:16, Tom Lane wrote:

=?utf-8?B?5p2O5rW36b6Z?= <hailong(dot)li(at)qunar(dot)com><mailto:hailong(dot)li(at)qunar(dot)com> writes:

We have lots data to insert in that table which have the spgist index,
may be the spgist index have a bug on a heavy write condition?

Perhaps, but you certainly haven't provided any information that would
help anyone to fix the bug. Can you create a self-contained test case?

regards, tom lane

at 2012-12-13 23:46, hailong.li wrote:

Hi,pgsql-hackers,

I'm not sure whether it is a bug of using spgist index or not .

OS Version:
CentOS release 6.2 (Final)

PostgreSQL Version:
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit
(1 row)

The lock infomation, pid 17225 was granted
> # select * from pg_locks where transactionid = 45463704;

> locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | f
> astpath
>
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+--
> --------
> transactionid | | | | | |
45463704 | | | | 131/103 | 17225 |
ExclusiveLock | t | f
> transactionid | | | | | |
45463704 | | | | 185/44 | 24592 |
ShareLock | f | f
> (2 rows)

The query is come from 192.168.232.43,but we already killed the process
at that server, there is no tcp connection from that server, the
following command output nothing.
> # netstat -anp | grep 192.168.232.43

Postgres process stat which running the query, it's doing a insert.
> postgres 17225 0.0 0.0 1763812 19056 ? Ss 21:48 0:00
postgres: searcher vacation 192.168.232.43(64757) INSERT
>

Strace info for pid 17225
> # strace -p 17225
> Process 17225 attached - interrupt to quit
> semop(877035566, {{5, -1, 0}}, 1

backtrace for pid 17225, we found there is running some function named
like spgxxxx
> Missing separate debuginfos, use: debuginfo-install
pg92-9.2.2-2.el6.x86_64
> (gdb) bt
> #0 0x0000003d6baea747 in semop () from /lib64/libc.so.6
> #1 0x00000000005fef57 in PGSemaphoreLock ()
> #2 0x000000000063f1a4 in LWLockAcquire ()
> #3 0x00000000004b7002 in spgdoinsert ()
> #4 0x00000000004b25e1 in spginsert ()
> #5 0x00000000007155a4 in FunctionCall6Coll ()
> #6 0x000000000047b4e0 in index_insert ()
> #7 0x00000000005871d2 in ExecInsertIndexTuples ()
> #8 0x0000000000594747 in ExecModifyTable ()
> #9 0x000000000057de18 in ExecProcNode ()
> #10 0x000000000057ccb1 in standard_ExecutorRun ()
> #11 0x000000000064e93f in ProcessQuery ()
> #12 0x000000000064eb6f in PortalRunMulti ()
> #13 0x000000000064f32e in PortalRun ()
> #14 0x000000000064cba2 in PostgresMain ()
> #15 0x000000000060a547 in ServerLoop ()
> #16 0x000000000060d077 in PostmasterMain ()
> #17 0x00000000005ac3d0 in main ()

We have lots data to insert in that table which have the spgist index,
may be the spgist index have a bug on a heavy write condition?

Thanks

Best Regards!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2013-01-08 11:24:32 Re: Improve compression speeds in pg_lzcompress.c
Previous Message Magnus Hagander 2013-01-08 09:32:40 Re: PL/Python result object str handler