just kind of curious:

why did you all decide to move various data out from db tables into config files in the filesystem?

Comments

The short answer is that this is the direction Drupal development had been moving in for years. Drupal 8 also has configuration in the filesystem (it's a little bit more complicated but that's not important now). Backdrop CMS is an early fork of Drupal 8, back when Drupal 8 looked a lot like Drupal 7.

A longer answer: developers have for years used Features in Drupal 7 in order to start tracking configuration in version control and make it easier to manage configuration between environments. But Features was not meant to do all that work and there were always some things that were difficult to track in Features. So devs started building in a better configuration system.

Some more reading: https://github.com/backdrop/backdrop-issues/issues/169

thanks for the info, and the link.

i guess i find it interesting what backdrop considers "configuration" vs "content".

the taxonomic system has a simple, hierarchical data model. backdrop has taken the top level of that model and moved it out of the database, making it literally impossible for the database server to maintain referential integrity. at least that is how it looks from a database perspective.

i have noticed over the years, as web applications became ubiquitous, that overall systems design has distanced itself from some core principles of distributed application development back in my day, primarily in how the database fits into the overall architecture. i am a dinosaur, i realize that. :-)

 

drop's picture

> the taxonomic system has a simple, hierarchical data model. backdrop has taken the top level of that model and moved it out of the database...

Well, that's not quite right. The simple, hierarchical data model is still there. The only thing that has moved out of the database is the name and description of each distinct hierarchy, along with any configurations that describe how each one should behave. All of the items within each hierarchy are still stored in the database, the same as they were before. 

> i find it interesting what backdrop considers "configuration" vs "content".

In general, configurations like vocabularies have been moved into files so that developers can make changes to them safely in a separate environment, and then later deploy the changes without them interfering with any of the content that's stored in the database (content, which might be changing while the developer is doing their work).

Sometimes the distinction can seem a bit arbitrary. The decisions made in Backdrop evolved over years of developers working on Drupal and deciding that X is content and Y is configuration. There's no saying we got them all right! Fortunately Backdrop is evolving too, and we can adjust as needed as we go.

i hear you, and appreciate your explanation. it was a practical choice.

conceptually, to me, it is kind of analogous to storing an order "header" (which is just descriptive/meta data about the order) in config, and the line items - the actual "content" - in the database. there is a data relationship that is unknown and unenforceable to the database server. in the case of vocabularies and their related terms, that relationship may be considered somewhat looser (and with clearly less descriptive data than a typical order header), and perhaps with good reason.

drop's picture

> there is a data relationship that is unknown and unenforceable to the database server

I wonder what you mean by this. The relationship between a term and its vocabulary is the same in Backdrop as it was in Drupal - there is a column in the taxonomy_term_data table where this information is stored. In Drupal the column holding that information was initially the vid column (before the vocabulary column was added). In Backdrop we simply dropped the vid column.

How does it make a relationship unknown or unenforceable to remove a separate table that contains no information about the terms or term relationships? Or do we need to look at a different data example for that to be the case?

#askingforafriend

this is all just some relational database theory, really. i don't mean to be pressing any kind of issue.

logically speaking, the vid column was a foreign key, which was related to a primary key in the taxonomy_vocabulary table - which was moved out of the database. not that it was actually a database-enforced constraint, which of course was not even possible in mysql for many years. changing vid to vocabulary (and int to string) doesn't change the logical relationship. except now the entity (in relational db terms, not drupal/backdrop terms) containing the primary key is no longer in the database.
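
For readers less familiar with the term, here is a minimal sketch of what a database-enforced foreign key looks like. It uses Python's built-in sqlite3 module rather than Backdrop's own database layer, and the two tables are heavily simplified stand-ins for the old vid-based layout (an assumption for illustration, not the real schema). As noted above, this constraint was never actually declared at the database level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE taxonomy_vocabulary (
    vid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE taxonomy_term_data (
    tid  INTEGER PRIMARY KEY,
    vid  INTEGER NOT NULL REFERENCES taxonomy_vocabulary(vid),
    name TEXT NOT NULL
);
""")

# A term whose vocabulary exists is accepted.
conn.execute("INSERT INTO taxonomy_vocabulary (vid, name) VALUES (1, 'states')")
conn.execute("INSERT INTO taxonomy_term_data (vid, name) VALUES (1, 'Ohio')")


def try_orphan_insert(connection):
    """Attempt to insert a term that points at a vocabulary that does not exist."""
    try:
        connection.execute(
            "INSERT INTO taxonomy_term_data (vid, name) VALUES (99, 'Nowhere')"
        )
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


orphan_result = try_orphan_insert(conn)
print(orphan_result)
```

Once the parent table leaves the database, there is nothing left for the `REFERENCES` clause to point at, which is the logical gap being described here.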

again, this is just an observation. no need to belabor the point. maybe over a beer or some coffee someday... :-)

 

edit: i somehow missed the last paragraph of your comment.

terms and their relationships to other terms, as well as their relationship to nodes, are modeled in the database. terms belong to a vocabulary; that implies a parent-child, one-to-many relationship. like an album and its songs. am i making any sense? i am no longer in the thick of the software/database development world, so maybe i am just showing my age. sybase was always my database server of choice (NOT microsoft's bastardization of it), so take it all with a grain of salt.

drop's picture

> this is all just some relational database theory, really

Yes, and you seem to be quite knowledgeable on the matter! I'm enjoying picking your brain here a bit, so thanks for taking the time to explain your perspective to me :)

> the vid column was a foreign key, which was related to a primary key in the taxonomy_vocabulary table

In my mind "foreign key" just meant "key that's also in another table". If the other table is gone, so is the foreign-key-ness of the value in the table with the data we care about. The taxonomy_vocabulary table didn't contain anything of value to the terms themselves; instead it held information about the type of term.

> terms belong to a vocabulary; that implies a parent-child, one-to-many relationship.

Ah, I think this is maybe where we see things differently! To me, a vocabulary simply defines a type of taxonomy term -- it's not a parent. Architecturally it's the same as a node and a node type. I wouldn't call the "About" page a child of the "Page" content type. A "Page" is simply a name (and some configurations) for how nodes of that type (like "About") should behave.

Edit: I think maybe this was what you were getting at all along: perhaps it was the foreign-key-ness of the integer vid that "created" the parent-like relationship, regardless of how vocabularies and terms were used in the CMS.

The reason the integer vid was dropped for the string vocabulary is that auto-increment IDs are problematic when deploying across environments (which gets back to these practical choices). Maybe what the CMS needed was a term-type, but it had been previously built with a term-parent instead?
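
A small sketch of why auto-increment IDs cause trouble across environments, assuming two sites that happen to create the same vocabularies in a different order. It is written in Python with in-memory SQLite purely for illustration; the one-table schema and the `make_environment` helper are hypothetical.

```python
import sqlite3


def make_environment(creation_order):
    """Build a throwaway site database and create vocabularies in the given order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy_vocabulary ("
        "vid INTEGER PRIMARY KEY AUTOINCREMENT, machine_name TEXT UNIQUE)"
    )
    for machine_name in creation_order:
        conn.execute(
            "INSERT INTO taxonomy_vocabulary (machine_name) VALUES (?)",
            (machine_name,),
        )
    # Map machine name -> integer ID as assigned by this environment.
    return dict(conn.execute("SELECT machine_name, vid FROM taxonomy_vocabulary"))


dev = make_environment(["tags", "states"])
prod = make_environment(["states", "tags"])

# The integer assigned to "states" differs between the two sites...
print(dev["states"], prod["states"])

# ...so a term carrying dev's integer ID resolves to the wrong vocabulary on prod.
prod_by_vid = {vid: name for name, vid in prod.items()}
misresolved = prod_by_vid[dev["states"]]
print(misresolved)
```

The machine names are identical on both sites, so a reference stored as a string survives deployment where the integer does not.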

 

yes, i understand your perspective on that. and mine is no doubt outdated in the world of web apps and "big data". i do tend to start from a somewhat "purist" perspective (eg, a logical data model), and back off from there as needed (eg, into a de-normalized physical data model).

i guess with my long ago consulting background, when i see this: "A vocabulary simply defines a type of taxonomy term" - i see the "defines" as cause for making it a relevant data element in the model. there is surely a gray area around what can be considered simply an attribute vs a key.

drop's picture

> when i see this: "A vocabulary simply defines a type of taxonomy term" - i see the "defines" as cause for making it a relevant data element in the model.

Interesting. Do you see it the same way for nodes and node types?

if i was designing a database from scratch, yeah, i probably would have a node_type table. and for sure in the logical model i would. i could no doubt be convinced that for a given system, it might make sense to leave it out of the physical db structure and store it in a file. especially if it was a large system (in data size and/or transaction volume) where every millisecond counts and you have good file caching that substantially outperforms the database caching. in my former life, database performance was my obsession.

but i also heard what jen said in that post that was linked in the first comment. i have deep appreciation for configuration management. but i am not very familiar with what that really means in the current software (web) development world.

edit: quick clarification on the node_type example. if "node_type" was literally nothing more than a discriminator data element, i would of course not have a separate table for it. but if it had any other attributes associated with it - eg, descriptive attributes, searchability, default template, preprocess function, etc - then it would warrant a table in my mind.

i know i have a somewhat antiquated view of how the database fits into the overall system architecture, relative to modern web development. i don't assume it is correct.

a few years later, and i stumbled upon a use case where this question comes into play. 

tldr: taxonomy_term_save() allows insertion of a term with an invalid/non-existing vocabulary.

more info:

i have a module that creates a reference list of u.s. states as a vocabulary. while debugging (as i got my source data file tweaked), i had commented out the taxonomy_vocabulary_save() line, but i left the taxonomy_term_save() lines intact as my debug code stopped execution prior to hitting that line. but when i removed my debugging code, i forgot to uncomment the taxonomy_vocabulary_save() line. no errors when saving the terms. so now when i go to /admin/structure/taxonomy, there is no "states" taxonomy. there are terms in taxonomy_term_data for all the states in the database, but no way to access them from backdrop.

of course i could have (should have) checked for the existence of the vocabulary using taxonomy_vocabulary_load() before issuing the taxonomy_term_save() call. in fact my code does check for it early on, and only executes this code if the vocabulary does not already exist.

no question this is an outlier situation, for sure. but still it seems like a problem to allow the insertion of a term with an invalid vocabulary. THAT is precisely the kind of situation that suggests meaningful data be stored in the database, where referential integrity can be enforced by the engine. relying on code to enforce it is sketchy at best, apparently even for someone that has been coding for 40+ years :-)
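
One way to spot the orphaned state described above is to compare the vocabulary names used in taxonomy_term_data with the vocabulary config files that actually exist. The sketch below is Python with SQLite and a temporary directory, not Backdrop code. The one-table schema is simplified, the helper names are hypothetical, and the `taxonomy.vocabulary.NAME.json` file naming is an assumption about how the active config directory is laid out.

```python
import sqlite3
import tempfile
from pathlib import Path

PREFIX, SUFFIX = "taxonomy.vocabulary.", ".json"


def configured_vocabularies(config_dir):
    """Return the vocabulary machine names that have a config file."""
    return {
        path.name[len(PREFIX):-len(SUFFIX)]
        for path in Path(config_dir).glob(PREFIX + "*" + SUFFIX)
    }


def orphaned_vocabularies(conn, config_dir):
    """Return vocabulary names referenced by terms but missing from config."""
    used = {
        row[0]
        for row in conn.execute("SELECT DISTINCT vocabulary FROM taxonomy_term_data")
    }
    return sorted(used - configured_vocabularies(config_dir))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy_term_data ("
    "tid INTEGER PRIMARY KEY, vocabulary TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy_term_data (vocabulary, name) VALUES (?, ?)",
    [("tags", "news"), ("states", "Ohio"), ("states", "Texas")],
)

with tempfile.TemporaryDirectory() as config_dir:
    # Only "tags" has a config file; "states" was never saved as a vocabulary.
    Path(config_dir, PREFIX + "tags" + SUFFIX).write_text("{}")
    orphans = orphaned_vocabularies(conn, config_dir)

print(orphans)
```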

jenlampton's picture

First, I do think it would be sensible to validate that a vocabulary exists as part of `taxonomy_term_save()`. I think that suggestion would merit a core issue (if you are up for creating one on GitHub?)

However, the location of the vocabulary configuration data doesn't necessarily have anything to do with the solution to that problem.

In Backdrop (or even in Drupal, when the configuration was saved in the database) the database itself was never leveraged to do this kind of validation. Even with config in the database, we would be in exactly the same situation we are in now (but without all the benefits of having deployable configuration).

With a complex relational database like in Backdrop/Drupal the relationships are not defined by the database itself, but by the code that uses the data. The code will always be necessary to validate these relationships.

There are certainly other types of systems where the relationships can be defined and managed by the database itself, but unfortunately that's not what we're working with here.
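
The code-level validation suggested above for `taxonomy_term_save()` could look roughly like the following. This is a Python sketch of the idea only, not a patch for Backdrop core: `save_term`, `UnknownVocabularyError`, and the `known_vocabularies` set (standing in for whatever the config system reports) are all hypothetical names.

```python
import sqlite3


class UnknownVocabularyError(ValueError):
    """Raised when a term is saved against a vocabulary that is not configured."""


def save_term(conn, known_vocabularies, vocabulary, name):
    """Insert a term only if its vocabulary is known; return the new term ID."""
    if vocabulary not in known_vocabularies:
        raise UnknownVocabularyError(f"vocabulary '{vocabulary}' does not exist")
    cursor = conn.execute(
        "INSERT INTO taxonomy_term_data (vocabulary, name) VALUES (?, ?)",
        (vocabulary, name),
    )
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy_term_data ("
    "tid INTEGER PRIMARY KEY, vocabulary TEXT NOT NULL, name TEXT NOT NULL)"
)
known = {"tags"}  # stand-in for the vocabularies present in config

first_tid = save_term(conn, known, "tags", "news")

try:
    save_term(conn, known, "states", "Ohio")
    outcome = "saved"
except UnknownVocabularyError as error:
    outcome = f"refused: {error}"

print(outcome)
```

The check lives in the save path itself, so a caller who forgets to create the vocabulary first gets an error instead of silently orphaned rows.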

clearly we have very different perspectives on data management, and how a database fits into complex systems. to say that in a complex relational database, relationships are not defined by the database... wow.

but i defer. i am no longer sufficiently engaged in software development to care to rock this boat.

jenlampton's picture

> to say that in a complex relational database, relationships are not defined by the database... wow.

I'm not saying that wouldn't be a better approach, that's just not the world we live in :)