Too slow "Analyze" for the table with data in Thai language

From: "Timur Luchkin" <timur(dot)luchkin(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Too slow "Analyze" for the table with data in Thai language
Date: 2016-07-05 11:47:41
Message-ID: emb8e7a3a5-a559-41e0-8288-2163d6553f08@luchkin-new
Lists: pgsql-bugs

Hello,

I have a dictionary table with international data in many languages.
After migrating to new hardware, the 'analyze' operation takes 3-5
minutes (with 100% usage of one CPU core) instead of 5 seconds.
Investigation showed that the slowdown was introduced by a newer glibc.
Older versions of glibc (2.15 - 2.16) work fine; newer versions (2.22
and 2.23) have this problem (versions 2.17 through 2.21 were not
tested).
Further investigation showed that the problem occurs only with Thai data
(we have 10 languages in the dictionary) and only with a non-C collation
(we use en_US.utf8). It affects the 'Analyze' and 'Order By' operations
on this multilingual text column.
Changing the collation of the column to C fixes the problem for 'order
by' on Thai data, but I am not interested in 'order by' on a
multilingual table; I am mainly interested in ANALYZE, which is not
affected in any way by changing the column's collation.
I also recreated a test cluster with the global collation set to C, and
that helps for both operations, but I can't do this in the production
environment.
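
For anyone who wants to check the glibc side without PostgreSQL, here is
a minimal sketch that times a locale-aware sort of Latin versus Thai
strings. It is not the reproduction from the attached log. It assumes
the slowdown comes from the C library's strcoll(), which Python exposes
as locale.strcoll. The helper names (make_words, time_collated_sort),
the random test words and the word counts are all made up for
illustration. If en_US.utf8 is not installed, the sketch falls back to
the C locale and reports which locale was actually used.

```python
import functools
import locale
import random
import time


def make_words(alphabet, count, length, seed=0):
    """Build `count` pseudo-random strings of `length` chars from `alphabet`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def time_collated_sort(words, locale_name):
    """Sort `words` with strcoll() under `locale_name`.

    Returns (elapsed_seconds, locale_actually_used). Falls back to the
    C locale when `locale_name` is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        try:
            used = locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            used = locale.setlocale(locale.LC_COLLATE, "C")
        start = time.perf_counter()
        sorted(words, key=functools.cmp_to_key(locale.strcoll))
        elapsed = time.perf_counter() - start
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore old setting
    return elapsed, used


if __name__ == "__main__":
    # Assumption: random strings of Thai consonants (U+0E01..U+0E2E)
    # stand in for the real dictionary rows.
    thai = "".join(chr(c) for c in range(0x0E01, 0x0E2F))
    latin = "abcdefghijklmnopqrstuvwxyz"
    for label, alphabet in (("latin", latin), ("thai", thai)):
        words = make_words(alphabet, 2000, 20)
        for wanted in ("C", "en_US.utf8"):
            seconds, used = time_collated_sort(words, wanted)
            print(f"{label:5s} wanted={wanted:10s} used={used:12s} {seconds:.3f}s")
```

If the Thai/en_US.utf8 row is far slower than the other three on a
machine with an affected glibc, that would point at the collation code
in glibc. If all four rows are similar, the assumption above does not
hold for that system.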

Tested on:
OS: Gentoo (4.4.0-gentoo-r1) and Fedora (4.5.5-300.fc24.x86_64)
PostgreSQL: v9.5.3 and v9.2.15 (I tried both the OS repository PG builds
and manual builds from source, with default and optimized
postgresql.conf)

I attached a sample dictionary (new_dic.sql.bz2) and steps to reproduce
(new_dic_log.txt.bz2).

--
Timur Luchkin

Attachment           Content-Type              Size
new_dic.sql.bz2      application/octet-stream  51.0 KB
new_dic_log.txt.bz2  application/octet-stream  1.0 KB
