Note: I'm posting this bit in the Backdrop Forum as I was requested to do so; this is part of a larger post that presents some history ad outlines motivations. If you are interested in the full version, please find it here instead: https://gwolf.org/2024/12/some-tips-for-those-who-still-administer-drupa...

Given I have several sites to migrate, and that I'm trying to get my colleagues to follow suite, I decided to automatize the migration by writing an Ansible playbook to do the heavy lifting. Of course, the playbook's users will probably need to tweak it a bit to their personal needs. I'm also far from an Ansible expert, so I'm sure there is ample room fo improvement in my style.

But it works. Quite well, I must add.

However... I did stumble across a big pebble, though. I am working on the migration of one of my users' sites, and found that its database is... huge. I checked the mysqldump output, and it got me close to 3GB of data. And given the D2B_migrate is meant to work via a Web interface (my playbook works around it by using a client I wrote with Perl's WWW::Mechanize), I repeatedly stumbled with PHP's maximum POST size, maximum upload size, maximum memory size...

I asked for help in Backdrop's Zulip chat site, and my attention was taken off fixing PHP to something more obvious: Why is the database so large? So I took a quick look at the database (or rather: my first look was at the database server's filesystem usage). MariaDB stores each table as a separate file on disk, so I looked for the nine largest tables:

# ls -lhS|head
total 3.8G
-rw-rw---- 1 mysql mysql 2.4G Dec 10 12:09 accesslog.ibd
-rw-rw---- 1 mysql mysql 224M Dec  2 16:43 search_index.ibd
-rw-rw---- 1 mysql mysql 220M Dec 10 12:09 watchdog.ibd
-rw-rw---- 1 mysql mysql 148M Dec  6 14:45 cache_field.ibd
-rw-rw---- 1 mysql mysql  92M Dec  9 05:08 aggregator_item.ibd
-rw-rw---- 1 mysql mysql  80M Dec 10 12:15 cache_path.ibd
-rw-rw---- 1 mysql mysql  72M Dec  2 16:39 search_dataset.ibd
-rw-rw---- 1 mysql mysql  68M Dec  2 13:16 field_revision_field_idea_principal_articulo.ibd
-rw-rw---- 1 mysql mysql  60M Dec  9 13:19 cache_menu.ibd

A single table, the access log, is over 2.4GB long. The three following tables are, cache tables. I can perfectly live without their data in our new site! But I don't want to touch the slightest bit of this site until I'm satisfied with the migration process, so I found a way to exclude those tables in a non-destructive way: given D2B_migrate works with a mysqldump output, and given that mysqldump locks each table before starting to modify it and unlocks it after its job is done, I can just do the following:

$ perl -e '$output = 1; while (<>) { $output=0 if /^LOCK TABLES `(accesslog|search_index|watchdog|cache_field|cache_path)`/; $output=1 if /^UNLOCK TABLES/; print if $output}' < /tmp/d7_backup.sql  > /tmp/d7_backup.eviscerated.sql; ls -hl /tmp/d7_backup.sql /tmp/d7_backup.eviscerated.sql
-rw-rw-r-- 1 gwolf gwolf 216M Dec 10 12:22 /tmp/d7_backup.eviscerated.sql
-rw------- 1 gwolf gwolf 2.1G Dec  6 18:14 /tmp/d7_backup.sql

Five seconds later, I'm done! The database is now a tenth of its size, and D2B_migrate is happy to take it. And I'm a big step closer to finishing my reliance on (this bit of) legacy code for my highly-visible sites 😃

Comments

Since before migration you need to prepare the Drupal site - disabling and removing all modules, disabling themes, as part of the preparation you can clear the cache and logs, delete the search index, which will make your work easier.