Re: Multicolumn index corruption on 8.4 beta 2

From: Floris Bos / Maxnet <bos(at)je-eigen-domein(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Multicolumn index corruption on 8.4 beta 2
Date: 2009-06-09 11:40:28
Message-ID: 4A2E4A2C.3090207@je-eigen-domein.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Richard Huxton wrote:
> Not a hacker myself, but I can tell you that the first question you'll
> be asked is "can you produce a test case"? If you can generate the
> problem from a test table+generated data that will let people figure out
> the problem for you.

Unfortunately, I have not been able to produce a test case (yet) on a
small data set.
While the data in the database is public information, the whole database
is about 100 GB, and therefore kinda hard to share.

> If not, details of the table schema will be needed, and is there any
> pattern to the missed rows? Also - compile settings, character set and
> locale details might be relevant too.

==
Compile settings
==

No fancy settings.
- Clean Opensolaris 2009.06 installation
- Installed gcc and gmake packages.
- Downloaded source and did a ./configure --disable-readline ; gmake ;
gmake install

==
Postgresql settings
==

The following settings differ from the defaults:

--
shared_buffers=3500MB
maintenance_work_mem = 128MB
fsync = off
synchronous_commit = off
checkpoint_segments = 25
--

The locale used when creating the database is SQL_ASCII

==
Hardware
==

Tyan barebone
2x Opteron 2376 quadcore
32 GB reg ecc memory
1x Intel X25-E 32 GB SSD for OS and pg_xlog directory
2x Intel X25-E 64 GB SSD (ZFS striping) for the database

==
Table layout
==

--
Table "public.posts_index"
Column | Type |
Modifiers
------------+------------------------+-----------------------------------------------------------
cid | integer | not null default
nextval('posts_index_cid
_seq'::regclass)
groupid | integer | not null
startdate | integer | not null
poster | character varying(64) | not null
basefile | character varying(64) | not null
subject | character varying(255) | not null
size | real |
nfo | boolean |
c | boolean |
parts | integer |
totalparts | integer |
imdb | integer |
ng1 | boolean | default false
g2 | integer | default 0
g3 | integer | default 0
data | bytea |
Indexes:
"posts_index5_pkey" PRIMARY KEY, btree (cid) CLUSTER
"gr_idx" btree (groupid, (- cid))
"pgb_idx" btree (poster, groupid, basefile)
--

Only noticed problems with the pgb_idx index so far.

The problem only occurs on a subset of the rows, at a time.
After adding/updating rows and doing a reindex, the rows that were
missing before sometimes suddenly do work, but then different ones do not.

> And can you post an explain plan for the incorrect scan? In particular is it using a bitmap index scan or a regular index scan? Or does it happen with either?

Happens with both.

Index scan:

===
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc(at)power-post(dot)org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.25..11.26 rows=1 width=0)
-> Index Scan using pgb_idx on posts_index (cost=0.00..11.25
rows=1 width=0)
Index Cond: (((poster)::text = 'Yenc(at)power-post(dot)org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))

=> SELECT count(*) FROM posts_index WHERE poster='Yenc(at)power-post(dot)org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
0
===

When I disable index scan, it uses bitmap without luck:

==
=> set enable_indexscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc(at)power-post(dot)org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.26..11.27 rows=1 width=0)
-> Bitmap Heap Scan on posts_index (cost=7.24..11.26 rows=1 width=0)
Recheck Cond: (((poster)::text = 'Yenc(at)power-post(dot)org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))
-> Bitmap Index Scan on pgb_idx (cost=0.00..7.24 rows=1 width=0)
Index Cond: (((poster)::text = 'Yenc(at)power-post(dot)org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))

=> SELECT count(*) FROM posts_index WHERE poster='Yenc(at)power-post(dot)org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
0
==

Sequential scan does find the row:

==
=> set enable_indexscan=false;
SET
=> set enable_bitmapscan=false;
SET
=> explain SELECT count(*) FROM posts_index WHERE
poster='Yenc(at)power-post(dot)org (Yenc-PP-A&A)' AND groupid=300 AND
basefile='NIB8124849';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=288153.28..288153.29 rows=1 width=0)
-> Seq Scan on posts_index (cost=0.00..288153.28 rows=1 width=0)
Filter: (((poster)::text = 'Yenc(at)power-post(dot)org
(Yenc-PP-A&A)'::text) AND (groupid = 300) AND ((basefile)::text =
'NIB8124849'::text))
(3 rows)

=> SELECT count(*) FROM posts_index WHERE poster='Yenc(at)power-post(dot)org
(Yenc-PP-A&A)' AND groupid=300 AND basefile='NIB8124849';
count
-------
1
==

Yours sincerely,

Floris Bos

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kedar Potdar 2009-06-09 11:51:30 Re: Patch for automating partitions in PostgreSQL 8.4 Beta 2
Previous Message gj 2009-06-09 10:44:45 Re: Patch for automating partitions in PostgreSQL 8.4 Beta 2