Re: no mailing list hits in google

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: no mailing list hits in google
Date: 2019-08-29 11:12:00
Message-ID: CABUevEz2FnSi-mm1e9pzOMR16OGCBLU3C69kRU_CX2DK0_SoqA@mail.gmail.com
Lists: pgsql-hackers pgsql-www

On Wed, Aug 28, 2019 at 7:45 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2019-08-28 19:09:40 +0200, Magnus Hagander wrote:
> > It blocks /list/ which has the subjects only.
>
> Yea. But there's no way to actually get to all the individual messages
> without /list/? Sure, some will be linked to from somewhere else, but
> without the content below /list/, most won't be reached?
>

That is indeed a good point. But it has been that way for many years, so
something must've changed. We last modified this in 2013....

Maybe Google used to load the pages under /list/ and crawl them for links,
but just did not include the actual pages in the index, or something like that.

I wonder if we can inject these into Google using a sitemap. I think that
should work -- will need some investigation on exactly how to do it, as
sitemaps also have individual restrictions on the number of urls per file,
and we do have quite a few messages.
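
As a rough illustration of the sitemap idea, here is a minimal sketch of splitting a large list of message URLs across several sitemap files tied together by a sitemap index. The 50,000 URLs per file limit comes from the sitemaps.org protocol; the function names, file names, and example URLs are hypothetical and not taken from the actual archives code.

```python
from xml.sax.saxutils import escape

# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemap(urls):
    """Return the XML for one sitemap file listing `urls`."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Return the XML for a sitemap index pointing at each sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


def build_all(message_urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Map hypothetical file names to XML content: N sitemaps plus one index."""
    files = {}
    for i, part in enumerate(chunk(message_urls, max_urls), start=1):
        files[f"sitemap-{i}.xml"] = build_sitemap(part)
    index = build_sitemap_index([f"{base_url}/{name}" for name in files])
    files["sitemap-index.xml"] = index
    return files
```

With something like this, only the index file would need to be submitted to the search engine, and the individual message URLs would be reachable without anything under /list/ being crawled.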

> Why is that /list/ exclusion there in the first place?
>

Because there is basically an infinite number of pages in that space, since
you can pick an arbitrary point in time to view from.
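
To make the effect of such an exclusion concrete, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content and the example URLs are assumptions for illustration, not a copy of the real file on the archives server.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: a date-based listing page and an individual message.
listing = "https://example.org/list/pgsql-hackers/since/201908280000/"
message = "https://example.org/message-id/some-id"
print(is_crawlable(listing), is_crawlable(message))
```

Under these assumptions the listing page is blocked while the individual message is allowed, so a crawler can index messages only if it learns their URLs from somewhere other than /list/.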

> Nothing has been changed around that for many years from *our* side.
>
> Any chance that there previously still was an archives.postgresql.org
> view or such that allowed to reach the individual messages without being
> blocked by robots.txt?
>

That one had a robots.txt blocking this going back even further in time.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
