For some time now I have been seeing log messages of the form:

Page not found : "http://www.toysnz.com/http://www.toysnz.com/CorgiC675-7_Metrobus-Hitachi-LE"

Page not found: "http://toysnz.com/http://toysnz.com/Benbros_ArticulatedRemovalsLorry_UnitedDairies"

[Note the double use of http://sitename]

A number of these appear to be coming from IPs assigned to LACNIC and RIPE.

Is this some sort of attempted exploit or a mis-configured crawler?

I've been trying to reduce these log entries with a rewrite rule in .htaccess, but I haven't found a way to rewrite the doubled http:// request as a single http:// request.


Comments

Is this "toysnz.com" your domain? If not, perhaps it's a ridiculous way to post spam in the error log? If it's your domain, perhaps there are some links in your content or somewhere else that are misconfigured - links that are supposed to be external, that are being interpreted as internal by Backdrop and therefore result in that double use. 

Yes - toysnz.com is my domain name. The double use of the domain name is only coming from IPs associated with RIPE and LACNIC, which immediately made me think of an exploit, but my research did not turn up anyone experiencing anything similar. Also, none of the search engines are making such requests, so I figure my internal/external links are all OK and it might be a mis-configured crawler.

That's why I went towards a rewrite that could handle the double http:// and send requests to the correct single http:// address, but I can't work out the correct format for it. I've tried variations of:

 RewriteRule ^/http://toysnz.com/(.*)$ $1 [R=301,NC,L]

with and without the leading slash, with and without the leading (1st) http request, without success.
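
One possible reason those variations never match, offered as an untested sketch rather than a confirmed fix: in .htaccess context the RewriteRule pattern is tested without the leading slash, and Apache normally merges consecutive slashes in the path before matching, so "http://" may reach the rule as "http:/". The pattern below allows for one or more slashes, escapes the dots, and optionally matches "www."; it assumes the file sits in the document root, that RewriteEngine is already on, and that the rule is placed above Backdrop's own rewrite rules.

```apache
# Sketch only: redirect /http://toysnz.com/Foo (with or without www.) to /Foo.
# Assumes this .htaccess is in the document root and RewriteEngine is On.
# "/+" matches one or more slashes, since the doubled slash may arrive merged.
RewriteRule ^https?:/+(?:www\.)?toysnz\.com/(.*)$ /$1 [R=301,NC,L]
```

If the rule works as intended, a request for /http://toysnz.com/CorgiC675-7_Metrobus-Hitachi-LE should be answered with a 301 to /CorgiC675-7_Metrobus-Hitachi-LE instead of a "Page not found" entry.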

I would say... just do not bother to try to fix this. Let the "not found" errors be in the log. The log is cleared regularly, so as long as you are not getting a ton of these "not found" errors, I don't see why you should worry about this.

Thanks for taking the time to respond (again).

Trouble is I'm getting a lot of them - but I'll leave things alone for now and monitor the success of my other rewrites, which deal with Google searching for .jpg files that are no longer associated with my old D7 site (aka toysnz.com).

No problem, and sorry I can't be more helpful. Other people with better knowledge of .htaccess rules may be able to help.