Re: index corruption on composite primary key indexes

From: "Ng, Stan" <sng(at)automotive(dot)com>
To: "Mikael Krantz" <mk(at)zigamorph(dot)se>
Cc: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: index corruption on composite primary key indexes
Date: 2010-12-16 00:07:57
Message-ID: 6B0DF81DDD9F654C932FB34C8D7ACE627E1888@mail-001.corp.automotive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Ah, thanks for pointing that out! It is an inconsistency that I should clean up. A lot of the SQL is generated programmatically, so it gets a bit hairy sometimes. In this particular case, I have confirmed that all the postalcodes are actually US zips, so the lower() is pretty unnecessary.

The more I dig into this, the more I suspect it's an issue w/ MVCC. This bug is somewhat similar, although in our case we have a pre-existing index: http://archives.postgresql.org/pgsql-bugs/2010-12/msg00033.php

I also found an instance of this duplicate key error in the logs where it recovered w/o manual intervention after 2 retries. My guess is that auto-vacuum cleared out some old duplicate row. And just to clarify for anyone who might have missed my other update, I did find the duplicate key error logged in the PostgreSQL logs.

Now I just need to get my head wrapped around the corner cases of MVCC + indexes. From a layman's perspective, it seems the index should only operate on the latest version. Perhaps there's a pgsql configuration option or hint on the index that I'm missing... The hunt continues!

Cheers,
-- stan

-----Original Message-----
From: Mikael Krantz [mailto:mk(at)zigamorph(dot)se]
Sent: Tuesday, December 14, 2010 4:36 PM
To: Ng, Stan
Cc: Craig Ringer; pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] index corruption on composite primary key indexes

I'm afraid that I'm a bit out of my depth when it comes to index corruption.

I did note a possibly unrelated problem: The unique index includes the
lower(postalcode::text) but your left join is made on
VehicleUsedICCGRadiusCache.PostalCode = t1.PostalCode. It seems likely
that you do this left join to exclude rows already present in
VehicleUsedICCGRadiusCache. If PostalCode can contain mixed case this
will be one possible source of duplicate key violations. Differently
cased PostalCodes will not be excluded by your left join but will be
considered identical by the unique index.

This might not at all be connected to your current problem as it seems
like something that would work or not work in a very consistent manner
regardless on reindexing but it could be useful anyway. Changing the
join condition to use lower() would also make the join able to use
more of the index for the join and could improve performance.

Best Regards

Mikael Krantz

On Tue, Dec 14, 2010 at 7:42 PM, Ng, Stan <sng(at)automotive(dot)com> wrote:
> Ah, I forgot to mention why I suspected corruption. The delta
> application is a single threaded, sequential process. Each delta is done
> within a single transaction. If it fails, there is retry logic that will
> continue to attempt the delta application forever, with about a 5 to 10
> minute wait. This will fail every time until a manual REINDEX command is
> run on the composite primary key index in-between retry attempts. I've
> also backed up the data to another table w/ the same indices and applied
> the delta data manually without any problems while this error was
> happening.
>
> Oddly enough, even with full logging on, the pgsql logs don't contain
> anything about the duplicate key error or suspect index. Perhaps another
> important piece of information is that we're doing all this in Java
> using the JDBC drivers, so we're seeing this as part of our stacktrace.
> i.e.
> org.postgresql.util.PSQLException: ERROR: duplicate key value violates
> unique constraint "vehicleusediccgradiuscache_i00"
>        at
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecu
> torImpl.java:2102)
>        at
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImp
> l.java:1835)
>        at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:
> 257)
>        at
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Stateme
> nt.java:500)
>        at
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdb
> c2Statement.java:388)
>        at
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Stateme
> nt.java:381)
>          ...
>
> The index looks like this:
> CREATE UNIQUE INDEX vehicleusediccgradiuscache_i00
>  ON vehicleusediccgradiuscache
>  USING btree
>  (datasetid, lower(postalcode::text), productgroupid,
> vehicleclassgroupidic, vehicleusedidic, vehicleuseddisplaypriority,
> vehicleusedsearchradius);
>
>
> A single delta is calculated on the master server, which is pulled by
> each client server. Each client puts the delta data into temp tables
> (one each for delete, update, and insert, applied in that order), and
> modifies the data via SQL somewhat like the following:
>
> insert into
> VehicleUsedICCGRadiusCache(DatasetId,PostalCode,ProductGroupId,VehicleCl
> assGroupIdIC,VehicleUsedIdIC,VehicleUsedDisplayPriority,VehicleUsedSearc
> hRadius,"VehicleUsedPrice.average","VehicleUsedPrice.min","VehicleUsedPr
> ice.max","VehicleUsedIntId.distinct_count") select
> t1.DatasetId,t1.PostalCode,t1.ProductGroupId,t1.VehicleClassGroupIdIC,t1
> .VehicleUsedIdIC,t1.VehicleUsedDisplayPriority,t1.VehicleUsedSearchRadiu
> s,t1."VehicleUsedPrice.average",t1."VehicleUsedPrice.min",t1."VehicleUse
> dPrice.max",t1."VehicleUsedIntId.distinct_count" from t1
> left outer join VehicleUsedICCGRadiusCache on
> VehicleUsedICCGRadiusCache.VehicleUsedSearchRadius=t1.VehicleUsedSearchR
> adius and VehicleUsedICCGRadiusCache.ProductGroupId=t1.ProductGroupId
> and
> VehicleUsedICCGRadiusCache.VehicleUsedDisplayPriority=t1.VehicleUsedDisp
> layPriority and
> VehicleUsedICCGRadiusCache.VehicleUsedIdIC=t1.VehicleUsedIdIC and
> VehicleUsedICCGRadiusCache.DatasetId=t1.DatasetId and
> VehicleUsedICCGRadiusCache.PostalCode=t1.PostalCode and
> VehicleUsedICCGRadiusCache.VehicleClassGroupIdIC=t1.VehicleClassGroupIdI
> C
> where VehicleUsedICCGRadiusCache.VehicleUsedSearchRadius is null
>
>
>
> -----Original Message-----
> From: Craig Ringer [mailto:craig(at)postnewspapers(dot)com(dot)au]
> Sent: Tuesday, December 14, 2010 4:47 AM
> To: Ng, Stan
> Cc: pgsql-bugs(at)postgresql(dot)org
> Subject: Re: [BUGS] index corruption on composite primary key indexes
>
> On 12/14/2010 10:18 AM, Ng, Stan wrote:
>
>> Some info on the platform I'm using:
>
> Please also show your schema and query / queries. If you think it's
> corruption, provide your postgresql logs too, including any mention of
> the names of the suspect indexes.
>
> Personally I agree with Mikael Krantz; you haven't shown any evidence of
>
> index corruption, and you're most likely hitting concurrency problems in
>
> your code.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Josh Berkus 2010-12-16 01:25:50 Re: What happened to SSL_CIPHERS?
Previous Message Magnus Hagander 2010-12-15 12:58:08 Re: What happened to SSL_CIPHERS?