Re: I s this a bug of spgist index in a heavy write condition?

From: 李海龙 <hailong(dot)li(at)qunar(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, 何伟平 <weiping(dot)he(at)qunar(dot)com>, 王冬 <dong(dot)wang(at)qunar(dot)com>, 于超超 <chaochao(dot)yu(at)qunar(dot)com>
Subject: Re: I s this a bug of spgist index in a heavy write condition?
Date: 2013-01-11 04:08:30
Message-ID: 50EF9043.2060009@qunar.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,dear tom lane && pgsql-hackers

In your last e-mail, you want me to think anbout the only spgist-indexed column and give you some more further information.

This time I will give you the contents of the table route_raw, the download link is https://www.box.com/s/yxa4yxo6rcb3dzeaefmz or http://dl.dropbox.com/u/203288/route_raw_spgist.sql.tar.gz .

OS Version:
CentOS release 5.5 (Final)

PostgreSQL Version:
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
(1 row)

I create a database named test_spgist,and there is only one table named route_raw which only has a id (primary key ) column and a spgist index column in the database.

Some details is shown as following:

test_spgist=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | route_raw | table | postgres
(1 row)

test_spgist=# \d route_raw
Table "public.route_raw"
Column | Type | Modifiers
-----------+-------------------------+--------------------------------------------------------
sourceurl | character varying(1000) | not null default ''::character varying
id | integer | not null default nextval('route_raw_id_seq'::regclass)
Indexes:
"route_raw_pkey" PRIMARY KEY, btree (id)
"route_raw_sourceurl_idx" spgist (sourceurl)

--Before the test, this table has 997736 records.
test_spgist=# select count(1) from route_raw ;
count
--------
997736
(1 row)

1. edit a shell script that function is insert data to table route_raw

$ cat /tmp/insert_spgist.sh
#/bin/bash

PSQL="/opt/pg92/bin/psql"
DB=test_spgist
USR=postgres

while [ 10 -ne 9 ]
do
SQL="insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/route_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_134782.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')"
sleep 1
$PSQL -U $USR -d $DB -c "$SQL"
done

2. I create 15 clients and execute this shell script in each client

$ /tmp/insert_spgist.sh >/dev/null
...
...
...
]$ /tmp/insert_spgist.sh >/dev/null

3. only 1 connection to update the table

$ psql -U postgres
psql (9.2.2)
Type "help" for help.

postgres=# \c test_spgist
You are now connected to database "test_spgist" as user "postgres".

test_spgist=# update route_raw set sourceurl = 'http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=CGZL120701MDG05LA&xl=13935'where<http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=CGZL120701MDG05LA&xl=13935%27where> sourceurl like '%=CGZL120701MDG0%';

4. I execute the statement "select * from pg_stat_activity; and select count(1) from route_raw; " to monitor the connections

test_spgist=# select count(1) from route_raw ;
count
--------
997799
(1 row)

......
......

test_spgist=# select count(1) from route_raw;
count
---------
1004066
(1 row)

......
......

test_spgist=# select count(1) from route_raw;
count
---------
1004066
(1 row)

test_spgist=# select * from pg_stat_activity ;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
query_start | state_change | waiting | state |
&nb sp;
query
-------+-------------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+----------
---------------------+-------------------------------+---------+--------+----------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16384 | test_spgist | 18797 | 10 | postgres | psql | | | -1 | 2013-01-10 20:27:05.171787+08 | | 2013-01-1
0 20:29:05.393515+08 | 2013-01-10 20:29:06.126108+08 | f | idle | select * from route_raw where sourceurl like '%CGZL120701MDG05LA&xl%';
16384 | test_spgist | 22272 | 10 | postgres | psql | | | -1 | 2013-01-10 18:23:20.035532+08 | 2013-01-10 20:31:57.898985+08 | 2013-01-1
0 20:31:57.898985+08 | 2013-01-10 20:31:57.898986+08 | f | active | select * from pg_stat_activity ;
16384 | test_spgist | 24391 | 10 | postgres | psql | | | -1 | 2013-01-10 20:29:07.113574+08 | 2013-01-10 20:29:07.1161+08 | 2013-01-1
0 20:29:07.1161+08 | 2013-01-10 20:29:07.116107+08 | f | active | insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Tra
vel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=12623 , http://www.ctszj.com.cn/route_group_1_147707.html,http://jx.bacts.com/group/group_33_118381.aspx, http:/
/www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%C9%F1%C5%A9%CE%D420120723&xl=7096 , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957 , http://www.ctszj.com.cn/rout
e_group_1_134782.html , http://fj.bacts.com/group/group_33_113288.asp , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=TUCCZ120625A&xl=7957, http://www.ctszj.com.cn/route_group_1_13478
2.html,http://fj.bacts.com/group/group_33_113288.aspx ,http://www.sztravel.com.cn/travel/detail.aspx?xlid=19883&ctype=1 , http://www.ctszj.com.cn/route_group_1_134695.html ')

......
......

16384 | test_spgist | 24437 | 10 | postgres | psql | | | -1 | 2013-01-10 20:29:38.498181+08 | 2013-01-10 20:31:17.963092+08 | 2013-01-1
0 20:31:17.963092+08 | 2013-01-10 20:31:17.963095+08 | f | active | update route_raw set sourceurl = 'http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=CGZL120701MDG05LA&xl=13935'whe<http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=CGZL120701MDG05LA&xl=13935%27whe>
re sourceurl like '%=CGZL120701MDG0%';
(18 rows)

5. display process information using gdb

[root(at)XXX ~]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html><http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/><http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 24391
Attaching to process 24391
Reading symbols from /opt/pg92/bin/postgres...done.
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000039e4ed5337 in semop () from /lib64/libc.so.6
(gdb) bt
#0 0x00000039e4ed5337 in semop () from /lib64/libc.so.6
#1 0x00000000006985f3 in PGSemaphoreLock (sema=0x2ac6df58b850, interruptOK=0 '\000') at pg_sema.c:415
#2 0x00000000006ea043 in LWLockAcquire (lockid=375058, mode=LW_EXCLUSIVE) at lwlock.c:474
#3 0x00000000006d0739 in LockBuffer (buffer=187445, mode=2) at bufmgr.c:2533
#4 0x00000000004d9ab0 in spgdoinsert (index=0x2ac6dfe42730, state=0x7fff015dad90, heapPtr=0x2a491c4, datum=44339704, isnull=0 '\000') at spgdoinsert.c:1940
#5 0x00000000004d1880 in spginsert (fcinfo=0x7fff015dae40) at spginsert.c:220
#6 0x000000000080c921 in FunctionCall6Coll (flinfo=0x296b360, collation=0, arg1=47033648162608, arg2=140733216305856, arg3=140733216305824, arg4=44339652, arg5=47033648129744, arg6=0)
at fmgr.c:1439
#7 0x0000000000489526 in index_insert (indexRelation=0x2ac6dfe42730, values=0x7fff015db2c0, isnull=0x7fff015db2a0 "", heap_t_ctid=0x2a491c4, heapRelation=0x2ac6dfe3a6d0,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:216
#8 0x00000000005eaf3c in ExecInsertIndexTuples (slot=0x2a42420, tupleid=0x2a491c4, estate=0x2a42190) at execUtils.c:1087
#9 0x00000000005fcfb5 in ExecInsert (slot=0x2a42420, planSlot=0x2a42420, estate=0x2a42190, canSetTag=1 '\001') at nodeModifyTable.c:248
#10 0x00000000005fde2e in ExecModifyTable (node=0x2a424b0) at nodeModifyTable.c:848
#11 0x00000000005ddde3 in ExecProcNode (node=0x2a424b0) at execProcnode.c:376
#12 0x00000000005dc24e in ExecutePlan (estate=0x2a42190, planstate=0x2a424b0, operation=CMD_INSERT, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2a38f98)
at execMain.c:1393
#13 0x00000000005daa2a in standard_ExecutorRun (queryDesc=0x296a740, direction=ForwardScanDirection, count=0) at execMain.c:301
#14 0x00000000005da922 in ExecutorRun (queryDesc=0x296a740, direction=ForwardScanDirection, count=0) at execMain.c:249
#15 0x00000000006fb344 in ProcessQuery (plan=0x29f0d90,
sourceText=0x29ef410 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"..., params=0x0, dest=0x2a38f98, completionTag=0x7fff015db940 "") at pquery.c:185
#16 0x00000000006fcaa9 in PortalRunMulti (portal=0x2968730, isTopLevel=1 '\001', dest=0x2a38f98, altdest=0x2a38f98, completionTag=0x7fff015db940 "") at pquery.c:1277
#17 0x00000000006fc183 in PortalRun (portal=0x2968730, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2a38f98, altdest=0x2a38f98, completionTag=0x7fff015db940 "") at pquery.c:814
#18 0x00000000006f66da in exec_simple_query (
query_string=0x29ef410 "insert into route_raw(sourceurl) values ('http://suz.bacts.com/group/group_33_128972.aspx , http://www.gzl.com.cn/Travel/TeamDetail.aspx?tuanhao=GHX%B0%B2%D0%C4%C0%F6%C0%A520120613%B4%BF%CD%E6&xl=1262"...) at postgres.c:1046
#19 0x00000000006fa5bd in PostgresMain (argc=2, argv=0x2949640, username=0x29494d0 "postgres") at postgres.c:3958
#20 0x00000000006a9305 in BackendRun (port=0x296cd70) at postmaster.c:3619
#21 0x00000000006a8a02 in BackendStartup (port=0x296cd70) at postmaster.c:3304
#22 0x00000000006a591a in ServerLoop () at postmaster.c:1367
#23 0x00000000006a52a8 in PostmasterMain (argc=3, argv=0x2947820) at postmaster.c:1127
#24 0x000000000061cfcb in main (argc=3, argv=0x2947820) at main.c:199

(gdb) frame 4
#4 0x00000000004d9ab0 in spgdoinsert (index=0x2ac6dfe42730, state=0x7fff015dad90, heapPtr=0x2a491c4, datum=44339704, isnull=0 '\000') at spgdoinsert.c:1940
1940 spgdoinsert.c: No such file or directory.
in spgdoinsert.c

(gdb) info locals
isNew = 0 '\000'
level = 0
leafDatum = 44339704
leafSize = 844
current = {blkno = 1, buffer = 187445, page = 0x0, offnum = 1, node = -1}
parent = {blkno = 4294967295, buffer = 0, page = 0x0, offnum = 0, node = -1}
__func__ = "spgdoinsert"
(gdb) p *(SpGistCache *) index->rd_amcache
$1 = {config = {prefixType = 25, labelType = 18, canReturnData = 1 '\001', longValuesOK = 1 '\001'}, attType = {type = 1043, attbyval = 0 '\000', attlen = -1}, attPrefixType = {type = 25,
attbyval = 0 '\000', attlen = -1}, attLabelType = {type = 18, attbyval = 1 '\001', attlen = 1}, lastUsedPages = {cachedPage = {{blkno = 6174, freeSpace = 4308}, {blkno = 6172,
freeSpace = 2024}, {blkno = 6173, freeSpace = 5244}, {blkno = 6217, freeSpace = 6228}, {blkno = 4294967295, freeSpace = 0}, {blkno = 4294967295, freeSpace = 0}, {blkno = 4294967295,
freeSpace = 0}, {blkno = 4294967295, freeSpace = 0}}}}

Thanks again

Best Regards!

于 2013年01月09日 12:28, Tom Lane 写道:

I wrote:

The control flow in spgdoinsert.c is flat enough that the stack trace
alone isn't much help in understanding the bug, I'm afraid.

BTW, something that possibly *would* help, since you seem to be able to
reproduce the bug easily, is to do that and then capture the values of
the local variables in spgdoinsert() -- especially the contents of the
"current" and "parent" structs --- from each of the processes that are
stuck. Also interesting would be to print the SpGistCache structs.
It'd go something like
frame 4
info locals
p *(SpGistCache *) index->rd_amcache

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message pgbuildfarm 2013-01-11 05:30:40 Re: PL/perl should fail on configure, not make
Previous Message Aaron W. Swenson 2013-01-11 03:59:28 Re: PL/perl should fail on configure, not make