Skip site navigation (1) Skip section navigation (2)

Re: I s this a bug of spgist index in a heavy write condition?

From: 李海龙 <hailong(dot)li(at)qunar(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org"<pgsql-hackers(at)postgresql(dot)org>
Cc: 何伟平 <weiping(dot)he(at)qunar(dot)com>, 王冬 <chaochao(dot)yu(at)qunar(dot)com>
Subject: Re: I s this a bug of spgist index in a heavy write condition?
Date: 2013-01-08 10:48:50
Message-ID: 50EBF992.2000704@qunar.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi,dear tom lane && pgsql-hackers

I am very excited to say that I may have created a test case!


OS Version:
CentOS release 5.5 (Final)

PostgreSQL Version:
postgres=# select version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
(1 row)

I create a database named test_spgist,and there is only one table named route_raw which has a spgist index column in the database.

Some details is shown as following:


test_spgist=# \dt
           List of relations
 Schema |   Name    | Type  |  Owner
--------+-----------+-------+----------
 public | route_raw | table | postgres
(1 row)

test_spgist=# \d route_raw
                                     Table "public.route_raw"
        Column        |            Type             |                  Modifiers
----------------------+-----------------------------+---------------------------------------------
 sourceurl            | character varying(1000)     | not null default ''::character varying
 title                | character varying(200)      | default NULL::character varying
 price                | numeric(10,0)               | default NULL::numeric
 routesnapshot        | character varying(4024)     | default NULL::character varying
 routesnapshot_small  | character varying(800)      | default NULL::character varying
 routesnapshot_mid    | character varying(800)      | default NULL::character varying
 feature              | text                        |
 departure            | character varying(50)       | default NULL::character varying
 arrive               | character varying(500)      | default NULL::character varying
 function             | character varying(50)       | default NULL::character varying
 type                 | character varying(50)       | default NULL::character varying
 subject              | text                        |
 dateofdeparture      | timestamp without time zone |
 dateofexpire         | timestamp without time zone |
 dateofbookingexpire  | timestamp without time zone |
 itineraryday         | integer                     |
 sightspot            | text                        |
 traffic              | character varying(100)      | default NULL::character varying
 stargrade            | double precision            |
 region               | character varying(50)       | default NULL::character varying
 subregion            | character varying(50)       | default NULL::character varying
 currency             | character varying(45)       | default NULL::character varying
 listinfo             | character varying(1024)     | default NULL::character varying
 dateofload           | timestamp without time zone |
 urlkey               | character varying(64)       | not null default ''::character varying
 sightspot_b          | text                        |
 sightspot_c          | text                        |
 wrapperid            | character varying(128)      | default NULL::character varying
 wrappername          | character varying(50)       | default NULL::character varying
 toflight             | character varying(45)       | default NULL::character varying
 backflight           | character varying(45)       | default NULL::character varying
 raw_date             | character varying(2048)     | default NULL::character varying
 feeinfo              | text                        |
 others               | text                        |
 feeinclude           | character varying(1000)     | default NULL::character varying
 feeexclude           | character varying(1000)     | default NULL::character varying
 ownexpense           | character varying(1000)     | default NULL::character varying
 bookingterms         | character varying(1000)     | default NULL::character varying
 visainfos            | character varying(1000)     | default NULL::character varying
 shopinfos            | character varying(500)      | default NULL::character varying
 flightinfos          | character varying(500)      | default NULL::character varying
 spotofdeparture      | character varying(20)       | default NULL::character varying
 spotofback           | character varying(20)       | default NULL::character varying
 tips                 | character varying(1000)     | default NULL::character varying
 contact              | character varying(100)      | default NULL::character varying
 raw_price            | character varying(100)      | default NULL::character varying
 raw_bookexpiredate   | character varying(100)      | default NULL::character varying
 transfer             | character varying(20)       | default NULL::character varying
 contractstyle        | character varying(500)      | default NULL::character varying
 citiesofbook         | character varying(1000)     | default NULL::character varying
 ishotelselected      | character varying(20)       | not null default 'false'::character varying
 freetripitineraryday | integer                     | not null default (-1)
 suitcrowdsubject     | character varying(50)       | default NULL::character varying
 newsubject           | character varying(50)       | default NULL::character varying
 isflightselected     | character varying(20)       | not null default 'false'::character varying
 festival             | character varying(512)      | default NULL::character varying
 promotion            | character varying(1024)     |
Indexes:
    "route_raw_1_idx" btree (price, departure, dateofdeparture, itineraryday)
    "route_raw_sourceurl_idx1" spgist (sourceurl)

test_spgist=# select sourceurl from route_raw limit 5;
                                                   sourceurl
---------------------------------------------------------------------------------------------------------------
 http://suz.bacts.com/group/group_33_128972.aspx
 http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623
 http://www.ctszj.com.cn/route_group_1_147707.html
 http://jx.bacts.com/group/group_33_118381.aspx
 http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096
(5 rows)

--Before the test, this table has 997736 records.
test_spgist=# select count(1) from route_raw ;
 count
--------
 997736
(1 row)

My some deteails the test procedure is as follows.

1. edit a shell script that function is insert data to table route_raw

[hailong(dot)li(at)pgdb2 ~]$ cat /tmp/insert_spgist.sh
#/bin/bash

export PGPASSWORD="6b4df779-dd13-4031-a00b-126b49287995"
PSQL="/opt/pg92/bin/psql"
DBHOST=l-pgdb2.s.cn6
DB=test_spgist
USR=postgres

while [ 10 -ne 9 ]
do
        SQL="insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/route_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp ,  http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_134782.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')"
        sleep 1
        $PSQL -h $DBHOST -U $USR -d $DB -c "$SQL"
done

2. I create 15 clients and execute this shell script in each client

[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ /tmp/insert_spgist.sh >/dev/null
...
...
...
[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ /tmp/insert_spgist.sh >/dev/null

3. I execute the statement "select * from pg_stat_activity; and select count(1) from route_raw; " to monitor the connections

test_spgist=# select * from pg_stat_activity ;
 datid |   datname   |  pid  | usesysid | usename  | application_name |   client_addr   | client_hostname | client_port |         backend_start         |          xact_start           |
    query_start          |         state_change          | waiting | state  |
                                                                                                                                                                       ;
                                                                                                                             query
                                                                                                                                                                       ;
                                                                                                                                                                       ;
-------+-------------+-------+----------+----------+------------------+-----------------+-----------------+-------------+-------------------------------+-------------------------------+------
-------------------------+-------------------------------+---------+--------+------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 16384 | test_spgist | 14959 |       10 | postgres | psql             |                 |                 |          -1 | 2013-01-08 16:43:53.175203+08 | 2013-01-08 17:16:01.997493+08 | 2013-
01-08 17:16:01.997493+08 | 2013-01-08 17:16:01.997495+08 | f       | active | select * from pg_stat_activity ;

 16384 | test_spgist | 16858 |       10 | postgres | psql             |                 |                 |          -1 | 2013-01-08 17:04:15.000472+08 |                               | 2013-
01-08 17:15:01.303197+08 | 2013-01-08 17:15:02.372496+08 | f       | idle   | select count(1) from route_raw ;

16384 | test_spgist | 26097 |       10 | postgres | psql             | 192.168.232.122 |                 |       54637 | 2013-01-08 17:12:16.228789+08 | 2013-01-08 17:12:16.231496+08 | 2013-
01-08 17:12:16.231496+08 | 2013-01-08 17:12:16.231504+08 | f       | active | insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn
/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, ht
tp://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/
route_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp ,  http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_1
34782.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')
......
......
(17 rows)

test_spgist=# select count(1) from route_raw ;
 count
--------
 997799
(1 row)

test_spgist=# select count(1) from route_raw ;
 count
--------
 997801
(1 row)

test_spgist=# select count(1) from route_raw ;
 count
--------
 998687
(1 row)

test_spgist=# select count(1) from route_raw ;
 count
--------
 999647
(1 row)

test_spgist=# select count(1) from route_raw ;
 count
--------
 999647
(1 row)

......
......

test_spgist=# select count(1) from route_raw ;
 count
--------
 999647
(1 row)

When I found that the counts of the records of the table route_raw is is not increased, I immediately went to  displays the  process information  using command gdb  at that time.

4. display process information using gdb

[hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6<mailto:hailong(dot)li(at)l-pgdb2(dot)s(dot)cn6> ~]$ sudo gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html><http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/><http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 26097
Attaching to process 26097
Reading symbols from /opt/pg92/bin/postgres...done.
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000039e4ed5337 in semop () from /lib64/libc.so.6
(gdb) bt
#0  0x00000039e4ed5337 in semop () from /lib64/libc.so.6
#1  0x00000000006985f3 in PGSemaphoreLock (sema=0x2b4cb1e2bb30, interruptOK=0 '\000') at pg_sema.c:415
#2  0x00000000006ea043 in LWLockAcquire (lockid=355054, mode=LW_EXCLUSIVE) at lwlock.c:474
#3  0x00000000006d0739 in LockBuffer (buffer=177443, mode=2) at bufmgr.c:2533
#4  0x00000000004d9ab0 in spgdoinsert (index=0x2b4cb26edcc0, state=0x7fffb8ce3f30, heapPtr=0xb574124, datum=190267736, isnull=0 '\000') at spgdoinsert.c:1940
#5  0x00000000004d1880 in spginsert (fcinfo=0x7fffb8ce3fe0) at spginsert.c:220
#6  0x000000000080c921 in FunctionCall6Coll (flinfo=0xb5702a0, collation=0, arg1=47608411118784, arg2=140736293913696, arg3=140736293913664, arg4=190267684, arg5=47608411039440, arg6=0)
    at fmgr.c:1439
#7  0x0000000000489526 in index_insert (indexRelation=0x2b4cb26edcc0, values=0x7fffb8ce4460, isnull=0x7fffb8ce4440 "", heap_t_ctid=0xb574124, heapRelation=0x2b4cb26da6d0,
    checkUnique=UNIQUE_CHECK_NO) at indexam.c:216
#8  0x00000000005eaf3c in ExecInsertIndexTuples (slot=0xb56e520, tupleid=0xb574124, estate=0xb56e290) at execUtils.c:1087
#9  0x00000000005fcfb5 in ExecInsert (slot=0xb56e520, planSlot=0xb56e520, estate=0xb56e290, canSetTag=1 '\001') at nodeModifyTable.c:248
#10 0x00000000005fde2e in ExecModifyTable (node=0xb56e5b0) at nodeModifyTable.c:848
#11 0x00000000005ddde3 in ExecProcNode (node=0xb56e5b0) at execProcnode.c:376
#12 0x00000000005dc24e in ExecutePlan (estate=0xb56e290, planstate=0xb56e5b0, operation=CMD_INSERT, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0xb556160)
    at execMain.c:1393
#13 0x00000000005daa2a in standard_ExecutorRun (queryDesc=0xb486830, direction=ForwardScanDirection, count=0) at execMain.c:301
#14 0x00000000005da922 in ExecutorRun (queryDesc=0xb486830, direction=ForwardScanDirection, count=0) at execMain.c:249
#15 0x00000000006fb344 in ProcessQuery (plan=0xb50ce80,
    sourceText=0xb50b500 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"..., params=0x0, dest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:185
#16 0x00000000006fcaa9 in PortalRunMulti (portal=0xb484820, isTopLevel=1 '\001', dest=0xb556160, altdest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:1277
#17 0x00000000006fc183 in PortalRun (portal=0xb484820, count=9223372036854775807, isTopLevel=1 '\001', dest=0xb556160, altdest=0xb556160, completionTag=0x7fffb8ce4ae0 "") at pquery.c:814
#18 0x00000000006f66da in exec_simple_query (
    query_string=0xb50b500 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"...) at postgres.c:1046
#19 0x00000000006fa5bd in PostgresMain (argc=2, argv=0xb4664e8, username=0xb4663f0 "postgres") at postgres.c:3958
#20 0x00000000006a9305 in BackendRun (port=0xb489290) at postmaster.c:3619
#21 0x00000000006a8a02 in BackendStartup (port=0xb489290) at postmaster.c:3304
#22 0x00000000006a591a in ServerLoop () at postmaster.c:1367
#23 0x00000000006a52a8 in PostmasterMain (argc=3, argv=0xb464660) at postmaster.c:1127
#24 0x000000000061cfcb in main (argc=3, argv=0xb464660) at main.c:199



I say my doubt once one "May be the spgist index has a bug in a heavy write condition?"

My colleagues and I are looking forward very much to hearing from you.


Thanks again

Best Regards!


at 2012-12-14 12:00, hailong.li wrote:

hi,

The problem is not always appear on our system, we can't find a way to
reproduce it.
After rebuild the index with btree, the problem is disappear




at 2012-12-14 00:16, Tom Lane wrote:


=?utf-8?B?5p2O5rW36b6Z?= <hailong(dot)li(at)qunar(dot)com><mailto:hailong(dot)li(at)qunar(dot)com> writes:


We have lots data to insert in that table which have the spgist index,
may be the spgist index have a bug on a heavy write condition?


Perhaps, but you certainly haven't provided any information that would
help anyone to fix the bug.  Can you create a self-contained test case?

                        regards, tom lane




at 2012-12-13 23:46, hailong.li wrote:


Hi,pgsql-hackers,

I'm not sure whether it is a bug of using spgist index or not .

OS Version:
CentOS release 6.2 (Final)

PostgreSQL Version:
postgres=# select version();
version
--------------------------------------------------------------------------------------------------------------
   PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.6 20110731 (Red Hat 4.4.6-3), 64-bit
(1 row)

The lock infomation, pid 17225 was granted
  > # select * from pg_locks where transactionid = 45463704;

  >    locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |  pid  |     mode      | granted | f
  > astpath
  >
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------+---------+--
  > --------
  >  transactionid |          |          |      | |            |
45463704 |         |       |          | 131/103            | 17225 |
ExclusiveLock | t       | f
  >  transactionid |          |          |      | |            |
45463704 |         |       |          | 185/44             | 24592 |
ShareLock     | f       | f
  > (2 rows)


The query is come from 192.168.232.43,but we already killed the process
at that server, there is no tcp connection from that server, the
following command output nothing.
  > # netstat -anp | grep 192.168.232.43

Postgres process stat which running the query, it's doing a insert.
  > postgres 17225  0.0  0.0 1763812 19056 ?       Ss   21:48 0:00
postgres: searcher vacation 192.168.232.43(64757) INSERT
  >


Strace info for pid 17225
  > # strace -p 17225
  > Process 17225 attached - interrupt to quit
  > semop(877035566, {{5, -1, 0}}, 1


backtrace for pid 17225, we found there is running some function named
like spgxxxx
  > Missing separate debuginfos, use: debuginfo-install
pg92-9.2.2-2.el6.x86_64
  > (gdb) bt
  > #0  0x0000003d6baea747 in semop () from /lib64/libc.so.6
  > #1  0x00000000005fef57 in PGSemaphoreLock ()
  > #2  0x000000000063f1a4 in LWLockAcquire ()
  > #3  0x00000000004b7002 in spgdoinsert ()
  > #4  0x00000000004b25e1 in spginsert ()
  > #5  0x00000000007155a4 in FunctionCall6Coll ()
  > #6  0x000000000047b4e0 in index_insert ()
  > #7  0x00000000005871d2 in ExecInsertIndexTuples ()
  > #8  0x0000000000594747 in ExecModifyTable ()
  > #9  0x000000000057de18 in ExecProcNode ()
  > #10 0x000000000057ccb1 in standard_ExecutorRun ()
  > #11 0x000000000064e93f in ProcessQuery ()
  > #12 0x000000000064eb6f in PortalRunMulti ()
  > #13 0x000000000064f32e in PortalRun ()
  > #14 0x000000000064cba2 in PostgresMain ()
  > #15 0x000000000060a547 in ServerLoop ()
  > #16 0x000000000060d077 in PostmasterMain ()
  > #17 0x00000000005ac3d0 in main ()

We have lots data to insert in that table which have the spgist index,
may be the spgist index have a bug on a heavy write condition?


Thanks

Best Regards!


In response to

Responses

pgsql-hackers by date

Next:From: Hannu KrosingDate: 2013-01-08 11:24:32
Subject: Re: Improve compression speeds in pg_lzcompress.c
Previous:From: Magnus HaganderDate: 2013-01-08 09:32:40
Subject: Re: PL/Python result object str handler

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group