Collations and Replication; Next Steps

From: Matthew Kelly <mkelly(at)tripadvisor(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Matthew Spilich <mspilich(at)tripadvisor(dot)com>
Subject: Collations and Replication; Next Steps
Date: 2014-09-16 16:06:31
Message-ID: F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Last month, I brought up the following issue to the general mailing list about how running streaming replication between machines running different versions of glibc can cause corrupt indexes.
http://www.postgresql.org/message-id/BA6132ED-1F6B-4A0B-AC22-81278F5AB81E@tripadvisor.com

In the month following, we have done further investigation here at TripAdvisor and have found that scope of this issue is far more troubling than initially thought. Hackers seems like appropriate place to present this update because it will certainly motivate some discussion about the approach to collation support going forward.

After the initial episode, we thought it was necessary to find the true scope of the problem. We developed a quick smoke test to evaluate the integrity of the indexes on a given machine. We understood that the test was not exhaustive, but it would catch most instances of corrupt indexes given TripAdvisor's normal database usage pattern. The source code with documentation about how it works is available at (https://github.com/mkellycs/postgres_index_integrity_check) for those interested.

What we found with this simple check was simply frightening. In every single streaming replica cluster where one or more machines had been commissioned at a different time, that member was found to be corrupt. When hardware upgrades of the master had been accomplished with a streaming replication, the new master was also found to have similar issues. The following numbers are only as small as they are because our adoption of streaming replication has barely just begun. So far we have found:

* 8 internal production databases, and 2 live site database servers effected.
* Up to 3771 rows out of place in a single index (more correctly: 3771 times a row was smaller then the row before it when sorted in ascending order, the actual number of incorrectly placed rows is probably much higher)
* On the worst offender, there were 12806 rows out of place across 26 indexes
* On average roughly 15% of indexes containing text keys on tables larger 100MB were found to exhibit this issue
* In at least one case, rebuilding a unique index on a master revealed that the database had allowed 100+ primary key violations.

It sounds like we as a community knew that these issues were theoretically possible, but I now have empirical evidence demonstrating the prevalence of this issue on our corpus of international data. Instances of this issue showed up in indexes of member usernames, location/property names, and even Facebook url's. I encourage other sufficiently large operations who index internationalized text to run similar tests; its highly likely they have similar latent issues that they just have not detected yet.

Here is the simple reality. Collation based indexes, streaming replication, and multiple versions of glibc/os cannot coexist in a sufficiently large operation and not cause corrupt indexes. The current options are to collate all of your indexes in C, or to ensure that all of your machines run exactly the same OS version.

The first and immediate TODO is to patch the documentation to add warnings regarding this issue. I can propose a doc patch explaining the issue, if no one has any objections.

The second and far more challenging problem is how do we fix this issue? As of our last discussion, Peter Geoghegan revived the proposal of using ICU as an alternative. (http://www.postgresql.org/message-id/CAEYLb_WvdCzuL=Cyf1xyzjwn-1CVo6kZEaWMKbxTS3jPhtjOig@mail.gmail.com) I do not feel qualified to compare the value of this library to other options, but I am certainly willing to help with the patch process once a direction has been selected.

I will be at Postgres Open in Chicago this week, and I will be more than willing to further discuss the details of what we have found.

Regards,
Matt Kelly

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Hernández Tortosa 2014-09-16 16:12:58 Re: PL/pgSQL 2
Previous Message Álvaro Hernández Tortosa 2014-09-16 15:59:17 Re: PL/pgSQL 2