More thoughts on sorting

From: PFC <lists(at)peufeu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: More thoughts on sorting
Date: 2009-07-31 12:41:04
Message-ID: op.uxxmuqgncigqcu@soyouz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


There was a thread some time ago about sorting... it kind of died...
I did some tests on a desktop (Postgres 8.3.7, kubuntu, Core 2 dual core,
4GB RAM, RAID1 of 2 SATA disks)

Quick conclusions :

- grabbing the stuff to sort can be IO bound of course (not here)
- for short strings (average 12 bytes), sort is CPU-bound in strcoll()
- for longer strings (average 120 bytes), sort is even more CPU-bound in
strcoll()
- strcoll() time seems to depend on the length of the strings, not the
place where a difference occurs (grokking glibc source code confirms)

See detailed test procedure below.

Locale is fr_FR.UTF-8 and database is UNICODE
All strings are ASCII, they are mostly alphanumeric references.
There are 391469 strings.
min length 6 chars
max length 80 chars
avg length 11.82 chars

We have a table test with (id INTEGER PRIMARY KEY, TEXT, BYTEA ), and
contents of TEXT and BYTEA are identical.
We have a table test2 which contains the same thing as test, except the id
and a 100-character constant are appended to the strings to make them
longer.

Test Procedure :

Grab test data from :
http://home.peufeu.com/pg/test_data_references.copy.gz

**** Sorting with Python

Sorting all string converted to unicode (from utf8) using strcoll() and
correct locale
=> 5.8 s

With longer strings (as in table test2 below )
=> 8 s

**** Postgres

To get query timings, I use \t and "SELECT * FROM test ORDER BY id OFFSET
391468;" which avoids EXPLAIN ANALYZE overhead, it just prints the last
row from the results. Timings are a bit shorter than EXPLAIN ANALYZE
gives, and I checked the plans, they are all sorts.

-- Create test table and load it
BEGIN;
CREATE TABLE test1 (t TEXT NOT NULL);
\copy test1 FROM test_data_references.copy
CREATE TABLE test (id SERIAL PRIMARY KEY, t TEXT NOT NULL, b BYTEA NOT
NULL );
INSERT INTO test (t,b) SELECT t,t::BYTEA FROM test1;
DROP TABLE test1;
ALTER TABLE test DROP CONSTRAINT test_pkey;
CREATE TABLE test2 (id INTEGER NOT NULL, t TEXT NOT NULL, b BYTEA NOT NULL
);
INSERT INTO test2 SELECT id,
(t || id || 'This is a dummy text of length 100
bytes************************************************************') AS t,
(t || id || 'This is a dummy text of length 100
bytes************************************************************')::BYTEA
AS b
FROM test;
COMMIT;

\d test

SHOW work_mem;
--> 16MB
SHOW maintenance_work_mem;
--> 512MB

\timing
-- cache it really well
SELECT count(*) FROM test;
SELECT count(*) FROM test;
SELECT count(*) FROM test;
--> 391469
--> Temps : 87,033 ms

SELECT * FROM test ORDER BY id OFFSET 391468;
--> Temps : 918,893 ms

SELECT id FROM test ORDER BY id OFFSET 391468;
--> Temps : 948,015 ms

Interpretation :
- Time for hauling around extra data (SELECT * instead of SELECT id) is
not significant.
- Sorting by integers is quite fast (not THAT fast though, MySQL below is
3x faster when selecting just 'id' and 2x slower when SELECT *, hum.)

SELECT * FROM test ORDER BY b OFFSET 391468;
--> Temps : 2145,555 ms

SELECT id FROM test ORDER BY b OFFSET 391468;
--> Temps : 2152,273 ms

Interpretation :
- Time for hauling around extra data (SELECT * instead of SELECT id) is
not significant.
- Sorting by BYTEA is just a memcmp(), it is strange that is it 2x slower
than ints. Probably the varlena stuff, I guess.
- See ridiculous MySQL results using a BLOB below which are 10x slower

SELECT * FROM test ORDER BY t OFFSET 391468;
--> Temps : 7305,373 ms

SELECT id FROM test ORDER BY t OFFSET 391468;
--> Temps : 7345,234 ms

Interpretation :
- Time for hauling around extra data (SELECT * instead of SELECT id) is
not significant.
- Sorting localized TEXT really is SLOW !!!!!
- The little test above calling strcoll() from Python confirms the
slowness is in strcoll()
- MySQL (see below) seems to be much faster (about equal to postgres) on
VARCHAR, and 2x slower on TEXT (hum...)

BEGIN;
CREATE INDEX test_id ON test( id );
--> Temps : 555,718 ms

CREATE INDEX test_b ON test( b );
--> Temps : 1762,263 ms

CREATE INDEX test_t ON test( t );
--> Temps : 6274,624 ms

ROLLBACK;

Interpretation :
- maintenance_work_mem is much higher than work_mem so the sorts are
faster, but the slowness in localized text sorting subsists...

SELECT count(*) FROM test2;
--> 391469
--> Temps : 114,669 ms

SELECT * FROM test2 ORDER BY id OFFSET 391468;
--> Temps : 1788,246 ms

SELECT id FROM test2 ORDER BY id OFFSET 391468;
--> Temps : 989,238 ms

Interpretation :
- Time for hauling around extra data (SELECT * instead of SELECT id) IS
significant this time due to the extra string lengths.

SELECT * FROM test2 ORDER BY b OFFSET 391468;
--> Temps : 2906,108 ms

SELECT id FROM test2 ORDER BY b OFFSET 391468;
--> Temps : 2554,931 ms

SELECT * FROM test2 ORDER BY t OFFSET 391468;
--> Temps : 10637,649 ms

SELECT id FROM test2 ORDER BY t OFFSET 391468;
--> Temps : 10322,480 ms

Interpretation :
- Note : the strings are longer, however they are sortable only by looking
at the first few chars.
- As shown by the bytea sort above, hauling around the longer strings does
take a little bit more time, perhaps enough to justify an extra second
over the short strings, but not to justify 3 more seconds.
- So, strcoll() again.

SELECT * FROM test2 ORDER BY t::BYTEA OFFSET 391468;
--> Temps : 4406,958 ms

Interpretation :
- Conversion of TEXT to BYTEA and memcmp() is much faster than strcoll()

BEGIN;
CREATE INDEX test2_id ON test2( id );
--> 586,790 ms

CREATE INDEX test2_b ON test2( b );
--> 3011,777 ms

CREATE INDEX test2_t ON test2( t );
--> 8970,160 ms

ROLLBACK;

Now let's check... MySQL 5.0.51a-3ubuntu5.4 !

CREATE TABLE test1 (t TEXT NOT NULL);
LOAD DATA INFILE '/tmp/test_data_references.copy' INTO TABLE test1;
CREATE TABLE test (id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, t TEXT
NOT NULL, c VARCHAR(255) NOT NULL, b BLOB NOT NULL );
INSERT INTO test (t,c,b) SELECT t,t,t FROM test1;
DROP TABLE test1;
ALTER TABLE `test` CHANGE `id` `id` INT( 11 ) NOT NULL;
ALTER TABLE test DROP PRIMARY KEY;
CREATE TABLE test2 (id INTEGER NOT NULL, t TEXT NOT NULL, c VARCHAR(255)
NOT NULL, b BLOB NOT NULL );
INSERT INTO test2 SELECT id,
concat(t, id, 'This is a dummy text of length 100
bytes************************************************************'),
concat(t, id, 'This is a dummy text of length 100
bytes************************************************************'),
concat(t, id, 'This is a dummy text of length 100
bytes************************************************************')
FROM test;

-- give it same sort buffer as pg had
SET sort_buffer_size=16777216;

SELECT * FROM test ORDER BY id LIMIT 391468,391468;
--> 1.87 sec
SELECT id FROM test ORDER BY id LIMIT 391468,391468;
--> 0.37 sec

SELECT * FROM test ORDER BY t LIMIT 391468,391468;
--> 14.67 sec
SELECT id FROM test ORDER BY t LIMIT 391468,391468;
--> 14.73 sec

SELECT * FROM test ORDER BY c LIMIT 391468,391468;
--> 6.22 sec
SELECT id FROM test ORDER BY c LIMIT 391468,391468;
--> 5.37 sec

SELECT * FROM test ORDER BY b LIMIT 391468,391468;
--> 20.77 sec
SELECT id FROM test ORDER BY b LIMIT 391468,391468;
--> 21.12 sec

-- now table test2

SELECT * FROM test2 ORDER BY id LIMIT 391468,391468;
--> 2.02 sec
SELECT id FROM test2 ORDER BY id LIMIT 391468,391468;
--> 0.47 sec

SELECT * FROM test2 ORDER BY t LIMIT 391468,391468;
--> 15.45 sec
SELECT id FROM test2 ORDER BY t LIMIT 391468,391468;
--> 14.81 sec

SELECT * FROM test2 ORDER BY c LIMIT 391468,391468;
--> 6.01 sec
SELECT id FROM test2 ORDER BY c LIMIT 391468,391468;
--> 6.53 sec

SELECT * FROM test2 ORDER BY b LIMIT 391468,391468;
--> 20.33 sec
SELECT id FROM test2 ORDER BY b LIMIT 391468,391468;
--> 20.84 sec

Ideas :

In this case, and probably in much other use cases, the strings are
references, simple short alphanumeric strings, that contain only ASCII
character codes.
strcoll() is overkill.

Many other things only use ASCII : emails, urls, product codes, zipcodes,
etc.

- implement per-column locales

Set the relevant columns to US_ASCII
Problem, per-column locales are definitely not easy to implement, and
besides, it seems like too much generalization.
Since a table could contain text entries for many countries, how to you
decide on the locale to use for the usernames ?
Besides, if you go to the end of this line of thought, you'll want to sort
according to the current user's locale, and this doesn't solve it at all.

- new TEXT types

There is citext, the case-insensitive text...

Perhaps there should be :

* Type ASCII : ASCII TEXT.
- Behaves exactly the same as TEXT, except :
- Restricted on input to only ASCII values, or maybe a subset of ASCII
- Convertible from TEXT (with errors if it encounters an illegal char)
- Convertible to TEXT
- Sorted using simple fast locale-agnostic case-insensitive comparison

Performance gains :

As seen above, a big speedup on sorting : => 3.5x faster on 400K small
strings.
But also, comparing TEXT to BYTEA here, which gives the same results
because all strings are actually lowercase :

select count(*) from test where t >= 'a' AND t < 'b' ;
=> returns 32K rows
=> bytea is 1.6x faster without an index
=> also 1.6x speedup with an index (Bitmap Index Scan) because of the
'Recheck Cond' which hits on strcoll() again
(same with BETWEEN)

select count(*) from test where t >= 'annonce_pap_6' AND t <
'annonce_pap_63';
=> returns 163 rows
=> bytea is 1.6 / 1.7x faster on index scan

* a little JOIN test with indexes on a and b

- 1000 rows (uses nested loop + index scan)
SELECT count(b.id) FROM test a JOIN test b ON (a.t=b.t) WHERE a.id <= 1000;
=> bytea is 2.3-2.5x faster

- 100K rows (index scan on id, sort on a.t, merge join with index a.t)
SELECT count(b.id) FROM test a JOIN test b ON (a.t=b.t) WHERE a.id <=
100000;
=> bytea is 3x faster

- full table JOIN (uses Merge Join using the index)
SELECT * FROM test a JOIN test b ON (a.t=b.t)
=> bytea is 2x faster

I tested on other types of queries, basically using BYTEA instead of TEXT
gives a 1.5-2.5x speedup on
- < > <= >= BETWEEN
- all index operations
- JOINs on TEXT columns
- sorts (even better speedup)

Since case-insensitive comparison of ASCII strings is an extremely fast
and simple operation, I don't think an ASCII text type would be slower
than a BYTEA...

Switching the column in MySQL from utf8_general_ci to ascii_general_ci
makes it 2x faster on Sorts.

MySQL on Joins :
SELECT count(b.id) FROM test a JOIN test b ON (a.t=b.t) WHERE a.id <= 1000;
SELECT count(b.id) FROM test a JOIN test b ON (a.t=b.t) WHERE a.id <=
100000;
Here MySQL is 10x slower than postgres (ahem) so I guess the string
encoding is the least of its worries...

Thoughts ?

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Field 2009-07-31 12:41:46 Re: 8.4 win32 shared memory patch
Previous Message Boszormenyi Zoltan 2009-07-31 09:42:33 Re: ECPG support for struct in INTO list