i am wondering if anyone else has seen this. the site below loads fine in any browser i have tried. all urls, that is.
however, any command line client (wget, lynx, googlebot, ...) accessing the site generates a 403 for any url other than the home page url.
so, the site home page is https://coin.monitorbankrates.com. that page loads fine in any browser i have tried, as well as any cli client.
but any url like this: https://coin.monitorbankrates.com/coin/tether loads fine in all browsers, but generates a 403 in any cli client.
i have looked at both the robots.txt and the .htaccess files, and i see nothing that i think would cause this.
when i examine the request/response in firefox web developer (network), i can see that there is initially a 403, but the page still loads for some reason.
there is no longer any .htpasswd protection, and best i can tell, all files and directories have correct permissions.
i am stumped. i will of course have to step through all the code to see if anything i have done might cause it. i was just hoping someone else had seen this issue and could give me a quick pointer.
Comments
update: turns out that the urls do in fact return 403s to gui browsers, but they also return all the page resources, and the page is correctly displayed. weird.
Not seeing this in the Chrome network tab, but I didn't do much testing. And I don't know much about these things.
i may have found the/an issue.
on every one of the nodes that is returning a 403, the node table has a 1 for the status of the node. but when i open it in the node edit form, the status is "draft". these nodes were all programmatically created, and i set the status to NODE_PUBLISHED on creation. interesting that the db value is 1, but the editor sees it as draft. when i change the status in the editor and save it as published, the 403 goes away.
so the problem seems to be that all my published nodes are somehow seen by backdrop as draft. weird. updating the node table in the db does not seem to do the trick. i will write a script to do a node_load and then a node_save, and see if that sets things right.
This sounds like it should be impossible. What is the scheduled status in the database?
get this: i just applied the current security update, and that problem (inconsistent status) seems to have gone away. now the published status seems to be correct, and seen correctly by the editor.
not sure if this started with the last security update (less than 2 weeks ago).
bad update:
after having applied the update last night and seeing the problem go away, it is back this morning, with no other changes.
when i view the content listing at https://coin.monitorbankrates.com/admin/content/node?status=1&type=coin, i see coins that are published. when i edit one of them, they come up with "draft" selected.
time to start tracing through code...
the images above show what i am describing for a single coin.
just as a followup:
i wrote a script to set all nodes to published using node_load() and node_save(). when i debug a node after calling node_load(), i can see that status = 0, even though it shows up in the content manager list as published. i set it to 1, do a node_save(), and it appears to be correct.
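a minimal sketch of that script, assuming it runs in a bootstrapped backdrop context (e.g. from a custom script or drush-style runner). the 'coin' type and the query are illustrative, not the actual site code; node_load() and node_save() are the core functions:

```php
<?php
// Republish sketch: find nodes of an assumed type, reload each one
// bypassing the static cache, and re-save any that load as unpublished.
$nids = db_query("SELECT nid FROM {node} WHERE type = :type", array(':type' => 'coin'))->fetchCol();
foreach ($nids as $nid) {
  // TRUE resets the node cache so we see the status the loader really produces.
  $node = node_load($nid, NULL, TRUE);
  if ($node && $node->status == 0) {
    $node->status = NODE_PUBLISHED; // core constant, value 1
    node_save($node); // writes the node row and its current revision row
  }
}
```

the node_load(..., TRUE) reset matters here: without it you may just get the cached object back and never see the status = 0 value being debugged.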
so some code somewhere is setting node->status to 0. i have grepped the hell out of my code, and i do not see it happening. so i have to suspect that it is backdrop code. but this would be a huge bug, which i have to assume would have long since been reported. bizarre.
any ideas to point me in the right direction would be greatly appreciated.
The content list is a View, so it probably is doing a direct database query rather than a node load. Somewhere there is probably a hook_entity_load(), hook_load(), or a hook_node_load() that's setting status to 0. You have contrib modules installed?

thanks for the response.
the only modules that did not get installed with the installation are 3 custom modules i wrote. the only hooks i implement are hook_preprocess_page() and hook_preprocess_layout(). i have verified that there are no updates whatsoever happening in those functions.
should i implement my own hook_node_load() and set my module to be the last loaded? i used to do that in the system table in drupal, but i guess i would do it in the config in backdrop?
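for diagnosis, a hedged sketch of such a hook in a custom module ("mymodule" is a placeholder name, not a real module): it changes nothing, it just logs any node that comes out of the load pipeline with status 0, so running it with the module weighted last would show whether some earlier hook is flipping the value:

```php
<?php
/**
 * Implements hook_node_load().
 *
 * Diagnostic only: log any node that arrives here unpublished.
 * If the {node} table says status = 1 but this fires, some earlier
 * loader or hook zeroed it out.
 */
function mymodule_node_load($nodes, $types) {
  foreach ($nodes as $node) {
    if ($node->status == 0) {
      watchdog('mymodule', 'Node @nid loaded with status 0', array('@nid' => $node->nid), WATCHDOG_WARNING);
    }
  }
}
```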
I assume you mean there are NO contrib modules, only those 3 custom ones?
Anyway, if these nodes are loading as unpublished, they should still 403, and not be visible to anyone. I'm viewing https://coin.monitorbankrates.com/coin/tether now without any problem.
But just to note, the URL you posted had some characters on the end: https://coin.monitorbankrates.com/coin/tether%C2%A0. I'm assuming that isn't intentional?
correct, no contrib, only my 3 custom.
the coin pages are not returning a 403 because i ran my script to set them all to published using node_load() and then node_save(). i will be watching to see if/when they change back.
the extra chars in the url just got put there when i pasted into the comment here, maybe how i copied from the browser address bar.
this is also an issue on 2 other backdrop sites we run. really bizarre. the issue only seems to occur on custom content types.
this would appear to be a fairly serious bug, if it is in fact a backdrop bug. and so far, everything i can test indicates that it is in the backdrop code, as doc mentioned maybe in hook_entity_load or hook_node_load. the database says the status is 1, but doing a node_load gives a status of 0. ouch.
ok, i think i found the issue, though not really the cause.
node_load() must pull the status value from node_revisions instead of from node. in hindsight that makes some sense: node_load() loads the node via its current revision, and status is one of the per-revision columns, so the revision row's value wins. i have revisions off for all of those content types, but somehow the status column in node_revisions was set to 0, even though the status column in the node table was set to 1.
that explains why my earlier script to update with node_save() - which no doubt updates both node and node_revisions - worked, for the one site at least. the other sites have hundreds of thousands of nodes, and it was taking forever to work through them with the script. anyway, a mass update to the node_revisions table i think solved the 403 problem (still confirming that).
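for the record, the mass update was along these lines. this is a sketch, not the exact query i ran: note that in drupal 7 and backdrop the table is actually named node_revision (singular), the 'coin' type list is an assumption, and you should back up the database first:

```sql
-- Force the current revision's status to match the published flag
-- on the base node row, for the affected custom content types.
UPDATE node_revision r
  JOIN node n ON n.vid = r.vid
SET r.status = 1
WHERE n.status = 1
  AND n.type IN ('coin');
```

joining on vid restricts the update to each node's current revision, which is the row node_load() reads.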
if i had looked at the node_load() source (and i guess any other implementers of the hook), i'd have probably found the issue quicker.