Re: once more: documentation search indexing

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <rob(at)xzilla(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Michael Christofides <michael(at)pgmustard(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: once more: documentation search indexing
Date: 2022-04-19 13:17:57
Message-ID: CABUevEyBmgBu=2Pv8r8i7=rf7iLJjyLjQ5nG54MMNe_kmTcVog@mail.gmail.com
Lists: pgsql-www

On Tue, Apr 19, 2022 at 11:18 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> > On 18 Apr 2022, at 20:04, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> >> What would be the actual *advantage* of excluding them?
> >
> > The immediate problem is that Google is still preferentially returning old
> > pages in some cases, e.g. top hit for "postgres gist gin index" is still
> >
> > https://www.postgresql.org/docs/9.1/textsearch-indexes.html
> >
> > Now maybe that just means they've not completely reindexed since we made
> > the canonical-version change, so I'm content to wait awhile longer
> > before concluding that that change wasn't sufficient. But we should be
> > considering the possibility that it wasn't.
>
> That particular 9.1 page is the second hit for "postgres gin index" after the
> /current/ page for the Gin Index chapter. (I first thought it was the first
> hit since I dismissed the "featured snippet" result as an ad.) DuckDuckGo
> returns the 9.1 page or the current page seemingly at random for "postgres gin
> gist index".
>
> Searching for "postgres gist gin index <version>" on Google returns the correct
> page for versions 8.3 through 9.4; for any other version (including lower) it
> returns /current/.
>

This seems to indicate it just hasn't picked that up yet? That's the
behaviour we saw before it found the rel=canonical parts, isn't it?

> Removing the old content might improve search results, but it might also just
> remove it altogether, bumping non-postgresql.org content higher.
>

Yeah, if we remove them completely then presumably they also stop counting
as "link score" for us.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
