Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken

From: Michael Sheaver <msheaver(at)me(dot)com>
To: "Maeldron T(dot)" <maeldron(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Date: 2016-12-07 13:33:18
Message-ID: 51545CF9-F503-4728-B99D-4054BC06B43D@me.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I would like to echo the sentiment on collation and expand it to character sets in general. When issues with them come up, they do take an incredible amount of time and effort to resolve, and are one of my own biggest pain points when dealing with databases and datasets from other sources. Case in point:

I have one dataset that I pull into my local (Windows laptop) for analysis on a weekly basis. Invariably it is sprinkled with special characters that trip up the copy command; these appear to come from people fat-fingering in the data input and the system not checking it. It only takes one of these special characters to trip up and negate the entire result of the copy command. After wrestling with this for a couple days about a year ago, the workaround I found that works is to first import it into a MySQL table, strip out the characters in MySQL, dump the data out to a CSV and finally bring the sanitized data into Postgres using the copy command.

So yes, character set and collations sets are the bane of DBAs, and yes, they are a bitch to troubleshoot and yes, they often take very creative solutions to fix.

Michael

> On Dec 7, 2016, at 4:40 AM, Maeldron T. <maeldron(at)gmail(dot)com> wrote:
>
> Hello Thomas,
>
>
> > (Maybe database clusters should have a header that wouldn’t allow
> > incompatible server versions to process the existing data. I wonder if it
> > would take more than 8 bytes per server. But I guess it was not know to be
> > incompatible. Even my two CIs didn’t show it.)
>
> I had some thoughts along those lines too[3]. I thought about
> checksumming libc and all relevant collation files (ie OS specific
> files) so you could notice when anything that could bite you changes
> (that is just some sketched out ideas, not production code). Some
> have suggested that PostgreSQL should stop using OS collations and
> adopt ICU[4] and then use its versioning metadata. Of course some
> people already use ICU on FreeBSD because the old strcoll
> implementation didn't play well with Unicode, but those users are
> still presumably exposed to silent corruption when ICU changes because
> AFAIK that didn't keep track of ICU versions.
>
>
> I have been using PostgreSQL for 12 years. *Every* issue that I personally ever had was locale-related.
>
> I define "issue" as not totally my fault and it took more than a hour to fix. That’s why I was a bit sarcastic with my comments about the year 2016. In the 80’s I have grown up on sci-fi movies about spaceships make wormholes to travel to the unknown parts of the Universe, even capable of shifting dimensions, but today, one of the biggest issue in IT is when one wants to use UTF-8.
>
> I think I spent more time on fixing locale related issues than on fixing all the rest together.
>
> (Not counting the bugs in my own code.)
>
> Once I tried DragonflyBSD. It could not compile ICU so I deleted it.
>
> I used to run PostgreSQL with ICU on FreeBSD, but I switched from ports to binary packages. Mixing them isn’t fun. (Well, currently I have it mixed.) Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.
>
> M

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vick Khera 2016-12-07 15:43:13 Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Previous Message Juliano 2016-12-07 12:06:47 repmgr new standby node without clone master data