ChiliProject is no longer maintained. Please be advised that there will be no more updates.

We do not recommend that you set up new ChiliProject instances, and we urge all existing users to migrate their data to a maintained system, e.g. Redmine. We will provide a migration script later. In the meantime, you can use the instructions by Christian Daehn.

[PATCH] hiding form pages from search engines (Feature #169)


Added by Yuki Sonoda at 2011-02-10 08:01 am. Updated at 2011-02-17 01:12 am.


Status: Closed
Priority: Normal
Assignee: Eric Davis
Category: User interface
Target version: 1.1.0 — Bell
Remote issue URL: http://www.redmine.org/issues/7582
Start date: 2011-02-10
Due date:
% Done: 0%
Affected version:

Description

Form pages like /issues/new are not worth indexing by search engines. Moreover, they can confuse visitors arriving from a search engine: if you have a question about ChiliProject and search for it, what can you do when /issues/new appears in the results?

This happens when these form pages are accessible to anonymous users; it actually happened at redmine.ruby-lang.org once. So I wrote the attached patch, which adds a meta element like the following to some pages:

<meta name="ROBOTS" content="NOINDEX,FOLLOW,NOARCHIVE" />
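
For illustration, ChiliProject views can inject head content through the content_for :header_tags hook that the layouts already render; a minimal sketch of emitting the tag that way (the view path and wiring are illustrative, not the attached patch itself):

<%# e.g. app/views/issues/new.rhtml -- illustrative only %>
<% content_for :header_tags do %>
  <meta name="ROBOTS" content="NOINDEX,FOLLOW,NOARCHIVE" />
<% end %>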

robot_exclusion.patch (2 kB) Yuki Sonoda, 2011-02-10 08:01 am


Associated revisions

Revision 705bd743
Added by Eric Davis at 2011-02-14 03:17 am

[#169] Add a ROBOTS meta tag to several forms to hide from web spiders

Based on the patch by Yuki Sonoda

History

Updated by Felix Schäfer at 2011-02-10 09:52 am

I guess having something like */new in the robots.txt wouldn't work, would it?

Updated by Yuki Sonoda at 2011-02-10 01:07 pm

According to http://www.robotstxt.org/robotstxt.html, robots.txt does not support globbing, so we cannot expect */new to work.
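
For comparison, a robots.txt workaround would have to enumerate literal path prefixes (the paths below are illustrative); since Disallow rules are plain prefixes, per-project URLs such as /projects/foo/issues/new would still slip through:

User-agent: *
Disallow: /issues/new
Disallow: /projects/new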

Updated by Eric Davis at 2011-02-10 11:36 pm

I think this is a good idea. I'd like to improve on it a little, though, by making robot_exclusion_tag take options for the content section (e.g. robot_exclusion_tag("NOINDEX,FOLLOW,NOARCHIVE") or robot_exclusion_tag("NOINDEX,NOFOLLOW")). Then we (or plugins) could have more control over the indexing options for each page.

Thoughts?

Updated by Felix Schäfer at 2011-02-11 07:30 am

Eric Davis wrote:

Thoughts?

What about making NOINDEX,FOLLOW,NOARCHIVE the default, and having a call with any collection of (NO)SOMETHING override the default for just that keyword?

Updated by Eric Davis at 2011-02-11 07:01 pm

This was my idea. It lets us have more control over what the actual content is, in case the meta tag allows other values later.

def robot_exclusion_tag(content="NOINDEX,FOLLOW,NOARCHIVE")
  "<meta name='ROBOTS' content='#{content}' />"
end
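
As sketched, callers spell out the whole content string rather than overriding individual keywords, e.g.:

robot_exclusion_tag                      # => <meta name='ROBOTS' content='NOINDEX,FOLLOW,NOARCHIVE' />
robot_exclusion_tag("NOINDEX,NOFOLLOW")  # => <meta name='ROBOTS' content='NOINDEX,NOFOLLOW' />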

Updated by Felix Schäfer at 2011-02-11 09:31 pm

Eric Davis wrote:

This was my idea. It let us have more control of what the actual content is in case the meta tag allows other values later.

No, I meant having NOINDEX,FOLLOW,NOARCHIVE be the default, so that calling it with INDEX gets you INDEX,FOLLOW,NOARCHIVE. I just realized that's overengineering it though; I like your proposal :-)

Updated by Eric Davis at 2011-02-11 11:54 pm

Yeah, I thought about doing keywords too, but then we would have to maintain a list of valid ones. Hence the idea of just using a simple string.

I'll add and modify this patch. I think it's minor enough for 1.1.0.

  • Target version set to 1.1.0 — Bell
  • Assignee set to Eric Davis

Updated by Eric Davis at 2011-02-14 02:20 am

I've modified Yuki Sonoda's patch and the code is ready for review.

https://github.com/chiliproject/chiliproject/pull/7

  • Status changed from Open to Ready for review

Updated by Felix Schäfer at 2011-02-14 07:05 am

Looks good to me. I'll merge it once I'm around a more stable connection, if you haven't done so by then.

Updated by Holger Just at 2011-02-14 09:33 am

I still like Felix's idea of having defaults and being able to selectively override them. It could be done like this:

# Add an HTML meta tag to control robots (web spiders)
#
# @param [optional, String] content changed content of the ROBOTS tag,
#   defaults to no index, follow, and no archive
def robot_exclusion_tag(content="")
  default_content = { "INDEX"   => "NO",
                      "FOLLOW"  => "",
                      "ARCHIVE" => "NO" }

  # Parse the overrides: "NOINDEX" becomes {"INDEX" => "NO"},
  # a bare "FOLLOW" becomes {"FOLLOW" => ""}
  args = content.upcase.split(",").inject({}) do |args, arg|
    value = arg.strip.gsub(/^(NO)/, "")
    args[value] = $1 || ""
    args
  end
  content = default_content.merge(args).collect { |k, v| v + k }.join(",")
  "<meta name='ROBOTS' content='#{content}' />"
end
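
With this variant a caller only names the keywords to flip and inherits the rest of the defaults (on Ruby 1.8, where hashes are unordered, the keyword order in the output may vary):

robot_exclusion_tag              # content: NOINDEX,FOLLOW,NOARCHIVE
robot_exclusion_tag("INDEX")     # content: INDEX,FOLLOW,NOARCHIVE
robot_exclusion_tag("NOFOLLOW")  # content: NOINDEX,NOFOLLOW,NOARCHIVE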

Updated by Eric Davis at 2011-02-14 11:07 pm

Holger Just wrote:

I still like Felix' idea of having defaults and being able to gradually overwrite them.

Just seems like a lot of code to me that might not be used that often.

Updated by Felix Schäfer at 2011-02-15 06:28 am

Eric Davis wrote:

Just seems like a lot of code to me that might not be used that often.

That's what I meant by "don't overengineer it" ;-) I think the simple version is fine; avoiding having to write it all out for the few times you need other params is not worth the extra code.

Updated by Eric Davis at 2011-02-17 01:12 am

Merged into master for 1.1.0. Thank you for the patch, Yuki Sonoda.

  • Status changed from Ready for review to Closed
