Query Planner Issue - View and regexp_replace?

From: Brandon Kane <brandonk98(at)yahoo(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Query Planner Issue - View and regexp_replace?
Date: 2008-02-17 23:05:13
Message-ID: 792368.9791.qm@web51703.mail.re2.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

I'm not entirely sure where to ask this question, or what to include, so I'm hoping folks here can
help me out with that.

Basically, I have a query that joins a table to a view. The view contains a CASE statement with a
regexp_replace call. The query plan used involves a sequential scan on the table used in the
view. However, when I run directly against the table, or include the view's SQL in my query
directly, I (correctly) get a nested loop scan with index scans. When I remove the regexp_replace
clause from my CASE statement in the view, then the original query begins to use the "correct"
query plan.

I'm able to reproduce this behavior with simple test tables and absolutely no test data loaded, so
I'm including the simple reproduction case below, with all the SQL.

Any help in directing this into the correct format and to the correct place is appreciated.

Thanks,
-Brandon

-- Server Version String
-- PostgreSQL 8.2.5 on i386-redhat-linux-gnu,
-- compiled by GCC gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)

--
-- Main Table
--
CREATE TABLE test_table
(
id character(12) NOT NULL,
data character varying(20),
CONSTRAINT "PK_test_table" PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE test_table OWNER TO postgres;

--
-- Join Table - Filtering done here, then joined to main table
--
CREATE TABLE test_join_table
(
joinid integer NOT NULL,
refid character(12) NOT NULL,
data integer NOT NULL,
CONSTRAINT "PK_join_test" PRIMARY KEY (joinid),
CONSTRAINT "FK_join_test_table" FOREIGN KEY (refid)
REFERENCES test_table (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (OIDS=FALSE);
ALTER TABLE test_join_table OWNER TO postgres;

CREATE INDEX "IDX_join_data"
ON test_join_table
USING btree
(data);

--
-- View on Main Table. Used in join
--
CREATE OR REPLACE VIEW test_view AS
SELECT test_table.id, test_table.data,
CASE
WHEN btrim(test_table.data::text) = ''::text THEN '0'::text::bpchar::character varying
-- Removing this condition from CASE statement in view corrects query planner problem.
WHEN NOT isnumeric(test_table.data::text) THEN
regexp_replace(
test_table.data::text,
'[^0-9]'::text,
''::text,
'g'::text)::bpchar::character varying
--
ELSE test_table.data
END AS datafixed
FROM test_table;

ALTER TABLE test_view OWNER TO postgres;

--
-- Query #1 - This returns the correct query plan. View is not involved.
--
select tt.* FROM test_table tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4

--
-- Query #2 - This is the problem query. It fails to return a plan involving a nested loop
-- with the two tables. If the view is modified as mentioned above, this does return the
-- correct query plan.
--
select tt.* FROM test_view tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4

--
-- Query #3 - Interestingly, this returns the correct query plan. I've taken the view SQL, and
-- merged it into the larger query, and it seems to return the correct query plan, even with
-- the problematic line in the CASE statement.
--
SELECT tt.id, tt.data,
CASE
WHEN btrim(tt.data::text) = ''::text THEN '0'::text::bpchar::character varying
WHEN NOT isnumeric(tt.data::text) THEN
regexp_replace(
tt.data::text,
'[^0-9]'::text,
''::text,
'g'::text)::bpchar::character varying
ELSE tt.data
END AS datafixed
FROM test_table tt
INNER JOIN test_join_table jt
ON tt.id = jt.refid
AND jt.joinid = 3 AND jt.data = 4

-- This query plan discrepancy occurs with absolutely no data loaded in the tables, as well as in
-- the actual tables with actual data loaded. This suggests that the problem is not related to
-- statistics. Since the field being operated in the case statement is not an index/search or
-- join column, I can't figure out why it affects the query plan. I can't seem to find anything
-- on this through Google.
-- I expect that the query planner will return a nested loop scan with the join table being on
-- the outer loop. In my practical scenario, this is the most efficient query, since the join
-- table filtering will remove most of the rows. What I find in query #2 is that instead a
-- hash/hash join is performed, which does a sequential scan on the main table. This is what
-- I'm trying to avoid.

____________________________________________________________________________________
Never miss a thing. Make Yahoo your home page.
http://www.yahoo.com/r/hs

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2008-02-17 23:50:04 Re: Query Planner Issue - View and regexp_replace?
Previous Message Christian Stalp 2008-02-17 19:54:49 Re: Problem while inserting a varchar