Re: BUG #15475: Views over CITEXT columns return no data

From: Paul Schaap <ps(at)ipggroup(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15475: Views over CITEXT columns return no data
Date: 2018-11-01 23:26:57
Message-ID: FF0E68A0-3266-48CF-BE8E-B2E03AB3EE9F@ipggroup.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Andrew,

I agree it seems implausible to me too however it is 100% reproducible, see below track of it happening, I have changed the names of things to protect the innocent ;-) :

my_server.rds.amazonaws.com = Postgres 10.5 RDS
Local Postgres 11.0 on OSX, also tried on Ubuntu with the same data and it exhibited the same issue

$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE: drop cascades to view my_events
DROP TABLE

$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup -Fc | pg_restore -d my_databasename
Password:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 225; 1259 27069 TABLE my_lookup my_username
pg_restore: [archiver (db)] could not execute query: ERROR: role "my_username" does not exist
Command was: ALTER TABLE public.my_lookup OWNER TO my_username;

WARNING: errors ignored on restore: 1

$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW

HERE IS THE STRANGE ANSWER:

$ psql -c "SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
count
-------
0
(1 row)

BUT HANG ON WE DO GET DATA IF I LIMIT:

$ psql -c "SELECT *
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS') LIMIT 10;" my_databasename

<snip>

(10 rows)

OK LETS RELOAD USING A SQL METHOD:

$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE: drop cascades to view my_events
DROP TABLE

$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup | psql -d my_databasename
Password:
SET
SET
SET
SET
SET
set_config
------------

(1 row)

SET
SET
SET
SET
SET
CREATE TABLE
ERROR: role "my_username" does not exist
COPY 355109
ALTER TABLE
CREATE INDEX

JUST SO YOU KNOW HOW MANY my_events THERE ARE AS THIS RELIES ON SOME SCALE:

$ psql -c "SELECT COUNT(*) FROM my_events_raw;" my_databasename
count
-----------
162430247
(1 row)

$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW

BINGO, THE RIGHT ANSWER:

$ psql -c "SELECT COUNT(*)
FROM my_events
WHERE my_date >= '2018-08-01'
AND my_date < '2018-09-01'
AND type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
count
--------
168878
(1 row)

The only other clues I have for you are :
my_events_raw is partitioned by month, e.g. FOR VALUES FROM ('2018-02-01 00:00:00+00' ) TO ('2018-03-01 00:00:00+00');
that if I switch the my_citext from CITEXT to TEXT column magically works again, and switching it back again to CITEXT breaks it again
That if I reload the table using any non -Fc method it all works, also no matter if I drop or truncate and load the table
There are no NULL my_int's, there are NULL my_citext's
That if I reverse my_citext::INT = my_int to my_citext = my_int::CITEXT it also works correctly
Is there perhaps any other logging or log level I could set that may help?

I know this is baffling and is a weird edge case, and there are plenty of work arounds that I found, but maybe it hints at another issue so seems worth investigating especially for a .0 release.

CHEERS
Paul

> On 2 Nov 2018, at 09:26, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>
>>>>>> "Paul" == Paul Schaap <ps(at)ipggroup(dot)com> writes:
>
> Paul> Hi Andrew,
>
> Paul> Yes there are indexes on both tables, and the issue only shows up
> Paul> with high volumes.
>
> Does it show up with low volumes too if you do set enable_seqscan=false;
>
> Paul> Note I have subsequently isolated the issue down to a pg_dump 10
> Paul> to pg_restore 11 using -Fc.
>
> No, I don't think you have. In particular, it's not plausible that using
> -Fc makes any substantive difference, since the commands run for the
> restore will not be any different to restoring a plain-text dump.
>
> --
> Andrew.

--

PRIVACY & CONFIDENTIALITY NOTICE

The information contained in this
email is intended for the named recipients  only.  It may contain
privileged and confidential information and if you are  not the intended
recipient, you should not copy it or disclose its contents  to any other
person, or take any action in reliance on it.  If you have  received this
email in error, please notify us immediately by return email.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrew Gierth 2018-11-01 23:32:43 Re: BUG #15475: Views over CITEXT columns return no data
Previous Message Andrew Gierth 2018-11-01 22:26:24 Re: BUG #15475: Views over CITEXT columns return no data