@yorkshire-pudding revived #2231 and has provided a PR for it, which I started reviewing. It didn't take long to find out that our current methods of parsing and replacing variables in settings.php files aren't robust enough to support things such as variable declarations that span multiple lines. The regexes we are using make too many assumptions that may not always hold, the following being the most obvious:

  • variables do start with a $, however that might not be at the start of the line (one or more tabs/spaces might be added before it as indentation, and that would still be valid PHP despite the regex not being able to match it).
  • variable values do end before the last ; on the line, however that might not be the last character in the line (there might be tabs/spaces that follow it for instance, and it would still be valid PHP).
  • although our default settings.php file has all its variables and their respective values on the same line, that does not necessarily need to be the case for a PHP file to be valid. For example this:

        $var = 'some value';

    ...is also absolutely valid PHP if formatted like this (although against our coding standards):

        $var =
          'some value';

    And more specifically related to #2231, while before we had this (variable defined in a single line):

        $database = 'mysql://user:pass@localhost/database_name';

    ...we now want to have this (the value of the variable is an array, formatted to span across multiple lines):

        $database = array(
          'database' => 'database_name',
          'username' => 'user',
          'password' => 'pass',
          'host' => 'localhost',
        );

    And although we could have the regex parse multiple lines until it finds the first occurrence of );, we cannot be absolutely sure that a randomly generated password does not include these characters. Consider this:

        $database = array(
          'database' => 'database_name',
          'username' => 'user',
          'password' => 'pa);ss',
          'host' => 'localhost',
        );

    ...and even if we adjusted the regex to be "greedy" or tell it that we expect the ); to be at the end of a line (before a line break), people might still do something like this:

        $database = array(
          'database' => 'database_name',
          'username' => 'user',
          'password' => 'pa);ss',
          'host' => 'localhost',
        ); // Here's a comment where you didn't expect it.

    Yes, perhaps we could "flatten" that multi-line variable value into a single line when parsing it, like so:

        $database = array('database' => 'database_name', 'username' => 'user', 'password' => 'pa);ss', 'host' => 'localhost');

    ...if we did that, then we could instruct the regex to parse everything up to the last ); in that line, however we cannot be sure that people will not be adding comments in their code. Consider this for example:

        $database = array( // Some important note about this database connection.
          'database' => 'database_name',
          'username' => 'user',
          'password' => 'pa);ss',
          'host' => 'localhost',
        );

    ...if we attempted to "flatten" the above code into a single line, then we'd end up with $database = array( // Some important note ..., which would break things. So we would need to strip comments out, and if we did that using regex, we'd have to make sure that we are accounting for // or /* occurring inside a string, where they might be otherwise legitimate (think 'password' => 'pa/*ss',). Only imagination is the limit on what can happen and how we need to account for the various possibilities.
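To make the assumptions listed above concrete, here is a minimal sketch of how line-based patterns fail. Both patterns are hypothetical and written only for illustration; they are not the ones backdrop_rewrite_settings() actually uses.

```php
<?php
// Hypothetical single-line pattern for illustration only; this is NOT the
// actual pattern used by backdrop_rewrite_settings().
$single_line = '/^\$(\w+)\s*=\s*(.*);$/m';

$samples = array(
  'plain' => "\$var = 'some value';",
  'indented' => "  \$var = 'some value';",
  'trailing_space' => "\$var = 'some value'; ",
  'multi_line_array' => "\$database = array(\n  'database' => 'database_name',\n);",
);

$results = array();
foreach ($samples as $label => $code) {
  // Only the "plain" sample satisfies every assumption baked into the pattern.
  $results[$label] = (preg_match($single_line, $code) === 1);
  print $label . ': ' . ($results[$label] ? 'match' : 'no match') . "\n";
}

// Hypothetical multi-line pattern that lazily stops at the first ");".
$multi_line = '/^\$database\s*=\s*(.*?\));/ms';
$code = "\$database = array(\n  'password' => 'pa);ss',\n  'host' => 'localhost',\n);";
preg_match($multi_line, $code, $matches);
// The capture is cut off in the middle of the password string.
$captured = $matches[1];
print $captured . "\n";
```

Only the first sample matches the single-line pattern, and the multi-line pattern stops inside the password value, so the 'host' entry never makes it into the capture.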

Anyway, the point is that a) we would end up with some very complex (and thus easy to break) regexes, and b) even then, so many things can go wrong with the assumptions we are making in backdrop_rewrite_settings(), since people might be manually editing their settings files in unpredictable ways. So I did some research trying to find a PHP parser that we could use, and luckily I came across token_get_all(), which is a native PHP function that uses the Zend engine's built-in lexical scanner, so:

  • no need for any additional libraries
  • available since PHP v4.2.0 and all through recent versions of 5/7/8
  • does most of the things we are after for free
  • does it in the same manner that PHP would natively parse the code
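As a rough illustration of why a tokenizer helps here, the sketch below uses token_get_all() to pull out a whole assignment statement. The helper name example_find_assignment() is hypothetical and not part of Backdrop, and the logic is deliberately simplified: it takes the first occurrence of the variable and only tracks parentheses and square brackets.

```php
<?php
/**
 * Hypothetical helper (not part of Backdrop): returns the source text of the
 * first statement that starts with the given variable, or NULL if not found.
 */
function example_find_assignment($code, $variable) {
  $statement = NULL;
  $depth = 0;
  foreach (token_get_all($code) as $token) {
    // Tokens are either array(id, text, line) or a one-character string.
    $id = is_array($token) ? $token[0] : NULL;
    $text = is_array($token) ? $token[1] : $token;

    if ($statement === NULL) {
      // Start collecting at the variable we are looking for.
      if ($id === T_VARIABLE && $text === $variable) {
        $statement = $text;
      }
      continue;
    }

    $statement .= $text;
    // Only one-character tokens are treated as punctuation. A ");" inside a
    // string or a comment is part of a larger token and is ignored here.
    if ($id === NULL) {
      if ($text === '(' || $text === '[') {
        $depth++;
      }
      elseif ($text === ')' || $text === ']') {
        $depth--;
      }
      elseif ($text === ';' && $depth === 0) {
        return $statement;
      }
    }
  }
  return NULL;
}

// The input must begin with an opening tag to be tokenized as PHP.
$code = <<<'PHP'
<?php
$database = array( // Some important note about this database connection.
  'database' => 'database_name',
  'username' => 'user',
  'password' => 'pa);ss',
  'host' => 'localhost',
); // Here's a comment where you didn't expect it.
$other = 1;
PHP;

print example_find_assignment($code, '$database') . "\n";
```

Because the password arrives as a single string token and each comment as a single comment token, the ); inside 'pa);ss' and the trailing comment do not confuse the search for the end of the statement.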

I started experimenting with it, and it looks very promising!

GitHub Issue #: 
6297