Re: Full text indexing preformance! (long)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mitch Vincent" <mitch(at)venux(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Full text indexing preformance! (long)
Date: 2000-05-29 06:56:01
Message-ID: 13966.959583361@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Mitch Vincent" <mitch(at)venux(dot)net> writes:
> Without the extra condition :
>
> NOTICE: QUERY PLAN:
>
> Nested Loop (cost=0.00..487951.02 rows=1644 width=204)
> -> Nested Loop (cost=0.00..478489.01 rows=2641 width=12)
> -> Nested Loop (cost=0.00..474081.63 rows=1 width=8)
> -> Seq Scan on resumes_fti rf (cost=0.00..474076.91 rows=1
> width=4)
> -> Index Scan using resumes_oid_index on applicants_resumes
> ar (cost=0.00..4.70 rows=1 width=4)
> -> Index Scan using applicants_states_rstate on applicants_states s
> (cost=0.00..4380.98 rows=2641 width=4)
> -> Index Scan using applicants_app_id on applicants a (cost=0.00..3.57
> rows=1 width=192)
>
> With the extra condition :
>
> NOTICE: QUERY PLAN:
>
> Nested Loop (cost=0.00..474194.76 rows=1 width=208)
> -> Nested Loop (cost=0.00..474085.21 rows=1 width=204)
> -> Nested Loop (cost=0.00..474081.63 rows=1 width=12)
> -> Seq Scan on resumes_fti rf (cost=0.00..474076.91 rows=1
> width=4)
> -> Index Scan using resumes_oid_index on applicants_resumes
> ar (cost=0.00..4.70 rows=1 width=8)
> -> Index Scan using applicants_app_id on applicants a
> (cost=0.00..3.57 rows=1 width=192)
> -> Index Scan using applicants_states_app_id on applicants_states s
> (cost=0.00..109.54 rows=1 width=4)

Odd. The innermost join's the same in both plans, so that's not what's
causing the difference. In the first case the next join is to
applicants_states using the "s.rstate='AL'" clause as a filter with the
applicants_states_rstate index. The planner doesn't think that's gonna
be real selective (note the rows=2641) and based on prior discussion of
your database I'd agree --- don't you have lots of entries for AL?
Then it can at last join to applicants using "s.app_id=a.app_id".

In the second case it's seized on "ar.app_id=a.app_id" as a way to join
"applicants a" to the inner join using the applicants_app_id index.
This is not a bad idea at all if the a.app_id field is unique as it
seems to think (observe rows=1 there). Then finally applicants_states
is joined on its app_id field.

Offhand I'd say that the second plan *ought* to be a lot quicker, and
I don't see why it's not. Is applicants.app_id a unique key, or not?
You could investigate this by running just the partial selects (the
two or three inner tables with just the relevant WHERE clauses) to see
how many rows are returned at each step.

BTW, as far as I can see from this example you're still not using the
FTI stuff properly: you should be querying rf.string ~ '^engineer' so
that you get an indexscan over resumes_fti. Without that, it seems
like you're not really getting any benefit from the FTI structure.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kovacs Zoltan 2000-05-29 09:19:11 Re: ODBC patch
Previous Message Tom Lane 2000-05-29 05:54:16 Re: Proposed cleanup of generated header files