Hello,

A recent close reading of the Drupal 7 EOL announcement revealed that part of the EOL process will be to spin down Drupal 7 related resources, including the potential removal of Drupal 7 related messaging and source code files from drupal.org.

Considering that Backdrop CMS is a "port" of Drupal 7, I think it would be prudent for Backdrop CMS to take "snapshots" of not only the /core branch of Drupal 7, but also its /contrib branch, which contains thousands of (as-yet) unported solutions.

Forgive my ignorance if Drupal 7 is already being entirely "mirrored" by Backdrop CMS.

  • If this is already being done, where is it?
  • If it is not being done, why not?

The equity of "zillions" of man-hours is at stake.

Without its own "mirror", the situation with Backdrop CMS could easily take a nasty turn:

  • Like with WordPress
    • Prejudicial blocking of access to code
  • Like with the Wayback Machine
    • Intermittently offline due to hacking activities
  • Like with 23andMe
    • DNA information falling into unknown hands resulting from bankruptcy

None of the above situations is reassuring. So, how is the Backdrop CMS project guaranteeing itself eternal access to Drupal 7 source code?

RESEARCH

I did a little research regarding the magnitude of the problem:

METHOD

  • I downloaded a list of all files available from drupal.org, which included their size in bytes.
  • I copied and pasted the entire contents into Notepad
  • I copied and pasted from Notepad into MS-EXCEL, which helped separate the columns
  • I used MS-EXCEL to generate preliminary SQL statements
  • I used GVIM to finalize the SQL statements (mostly removing \t characters)
  • I uploaded the list to my Linux machine
  • I created SQL scripts to create a local MariaDB database and table
  • I uploaded the prepared SQL statements into the database 
  • I performed various SQL queries on the database
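
For anyone who wants to repeat the steps above without the Notepad/Excel/GVIM round trip, here is a minimal sketch in Python. It is not the procedure actually used in this post. It assumes the downloaded listing is tab-separated as "file name, size in bytes" (the real column layout may differ), it uses an in-memory SQLite database instead of MariaDB, and the three sample rows are made up purely for illustration. Only the table and column names are taken from the queries shown under FINDINGS.

```python
import csv
import io
import sqlite3


def load_listing(tsv_text):
    """Load 'file_name<TAB>size_in_bytes' lines into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE drupalfiles (file_name TEXT, file_size_bytes INTEGER)"
    )
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank lines; every other row is expected to have exactly two columns.
    rows = ((row[0], int(row[1])) for row in reader if row)
    conn.executemany("INSERT INTO drupalfiles VALUES (?, ?)", rows)
    return conn


# Hypothetical sample rows, NOT real drupal.org data.
SAMPLE = (
    "example_module-7.x-1.0.tar.gz\t1000\n"
    "example_module-8.x-1.0.tar.gz\t2000\n"
    "other_module-7.x-2.1.zip\t500\n"
)

conn = load_listing(SAMPLE)
count, total = conn.execute(
    "SELECT COUNT(file_name), SUM(file_size_bytes) "
    "FROM drupalfiles WHERE file_name LIKE '%7.x%'"
).fetchone()
print(count, total)
```

Keeping the whole thing in one script means the list can be reloaded and re-queried whenever a fresh listing is downloaded.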

FINDINGS

Count:
I used SQL to discover the number of all Drupal 7.x files:

> select count(file_name) from drupalfiles where file_name LIKE "%7.x%";
+------------------+
| count(file_name) |
+------------------+
|           164867 |
+------------------+
1 row in set (0.338 sec)

According to my research, there are 164,867 individual Drupal 7.x files.

Size:
I used SQL to discover the footprint of all Drupal 7.x files:

> select sum(file_size_bytes) from drupalfiles where file_name LIKE "%7.x%";
+----------------------+
| sum(file_size_bytes) |
+----------------------+
|         339582768746 |
+----------------------+
1 row in set (0.370 sec)

According to my math, that makes the Drupal 7.x footprint:

  • 0.34 TB (at 1,000,000,000,000 bytes per TB)
  • 339.58 GB (at 1,000,000,000 bytes per GB)
  • 339,582.77 MB (at 1,000,000 bytes per MB)
  • 339,582,768,746 bytes
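
The conversion is easy to double-check with a few lines of Python. This is just a sketch of the arithmetic, using the byte total returned by the SQL query. The GiB figure (1,073,741,824 bytes per GiB) is an extra I added because many disk tools report binary units; it is not part of the figures above.

```python
TOTAL_BYTES = 339_582_768_746  # sum(file_size_bytes) from the query above


def to_units(n_bytes):
    """Convert a byte count to decimal MB/GB/TB and binary GiB."""
    return {
        "MB": n_bytes / 1_000_000,
        "GB": n_bytes / 1_000_000_000,
        "TB": n_bytes / 1_000_000_000_000,
        "GiB": n_bytes / 2**30,  # binary unit, added for comparison only
    }


for unit, value in to_units(TOTAL_BYTES).items():
    print(f"{value:,.2f} {unit}")
```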

CONCLUSION(S)

Based on my findings, I am now developing a script that will mirror those Drupal 7.x files.  

I plan to get a Drupal 7.x mirror up and running ASAP.

g.
----

Comments

@Graham 

I believe that this has been discussed, but I don't think anyone has done any serious exploration or work on it.

Personally, I'm hopeful that Drupal 7 repos will remain available for a while (several years, at least), but I understand why this might be a dangerous assumption. 

I appreciate that you're not just suggesting the idea, but putting forward a plan and working on it.

I think it would be great if you could attend one of our weekly development meetings to discuss this OR schedule a discussion at the upcoming Backdrop LIVE. If you're willing to host a discussion on this topic at Backdrop LIVE, let me know and I can help schedule it.

https://events.backdropcms.org/

If you are able to create this mirror, I think it would be good to figure out how to maintain this and make sure it's available for the community. 

Hello @stpaultim,

Thank you for your kind words.  I am always up for joining whatever you suggest (you always have great ideas).   The issue has always been that I am in GMT +8, so a lot of the BCMS gatherings and discussions in the past were scheduled in the middle of the night for me.

The footprint of the entire codebase is approaching 350 GB, which is big enough that it needs to be dealt with fairly gingerly.  Now, a lot of those files represent the evolution of the package in question, meaning all versions.  I believe that there is some merit to looking into what the footprint of just the latest version(s) would be; it's probably much, much smaller.

Lastly, my concept of a "Mirror" may have been phrased too loosely.

Currently, my plan is to gradually snapshot a copy of the files to my local file server.  I have no plans to offer them publicly, which would incur "forever" style bandwidth and hosting fees and management costs, which I may not always be in a position to provide.

Currently, I am taking advantage of some rare inter-project downtime, which is resulting in a flurry of BCMS-oriented activity.  I am sure those of you who know me over the years have taken note of my long absences, then "bursts" of activity when it comes to BCMS :)

I can try to talk to a local provider here in Hong Kong to see if they will replicate the D7 codebase.  They are already mirroring a bunch of other technologies (Ubuntu, etc...) and are therefore already open to the idea of doing this as a public service, and already configured to support an effort like that.  To them (unlike me) it would be a "plus one" rather than a "zero to one" style challenge.

I will keep you posted.

g.
----

I wasn't aware that the repos were going and I can't find any particular mention of them disappearing.  However, I think you are right that we shouldn't make assumptions given how many modules there might be that we would want a port of in the future.

Thanks for all your thoughts and research.

It looks like from here there are 14,648 modules that work with D7, of which we probably currently have ported about 1/14.

Olafski's picture

There are some concerning statements in the End of life announcement, see the following (amongst others) in section "What the Drupal 7 End of Life means for you":

  • Drupal.org will no longer support tasks related to Drupal 7 including documentation navigation, automated testing, packaging, etc.
  • Drupal.org file archive packaging (tar and zip files) for Drupal 7 will be shut off.
    The archives may be removed.
  • Package tarballs may no longer be downloadable.

The announcement doesn't speak of complete source code removal, but I wouldn't bet on it. I agree it would be good to have a copy of the Git repositories.

Hello @all,

Maybe I was being paranoid, can you also read this and tell me what you think?

https://www.drupal.org/psa-2023-06-07

"What the Drupal 7 End of Life means for you

Once Drupal 7 reaches End of Life, this means:

  1. The Drupal Security Team will no longer provide support or Security Advisories for Drupal 7 core, contributed modules and themes.
  2. Security issues for Drupal 7 may be disclosed in public, and zero-days (i.e, security vulnerabilities being exploited in the wild without advance warning) may occur.
  3. Drupal.org will no longer support tasks related to Drupal 7 including documentation navigation, automated testing, packaging, etc.
  4. All Drupal 7-compatible releases on project pages will be flagged as not supported.
  5. Some Drush functionality for Drupal 7 will stop working as the underlying Drupal.org infrastructure will be removed.
  6. Drupal.org file archive packaging (tar and zip files) for Drupal 7 will be shut off.
    The archives may be removed.
  7. There will be no more core commits on Drupal core 7.x.
  8. Package tarballs may no longer be downloadable.
  9. External vulnerability scans will flag Drupal 7 as insecure.

If you are still maintaining a Drupal 7 site, we recommend migrating to Drupal 10 before the end of life date."

I think it is saying that the packages (i.e. when a release is done) will not be available along with any mechanisms for checking for updates and finding supported projects. So, perhaps you are reading more into it than is there.

I am, however, persuaded that given what else has happened in the world of open source and that Drupal doesn't always think about Backdrop, you are right that it would be a good idea to maintain a mirror of the modules.  I don't think there is much point in doing the same for themes, though others may disagree.

What I also think we should include is all the documentation as there are still loads of references to Drupal documentation that may well disappear.  Someone has talked about scraping the whole lot which would be great, though we should look to move what we can into our documentation infrastructure as we are able.

@yorkshirepudding,

Agreed.  I may be over-thinking it.

...But the CentOS and WP Engine situation(s) have me thinking nobody can be trusted forever, and even those bets are off if new owners show up to play.

The sooner BCMS stops being "information incubated" by Drupal the better.  

According to Drupal's own PR department, life support for D7 is being pulled for sure in 73 days.

Don't forget translations too.

  • Developer Documentation:  API
  • Administrator Documentation
  • User Documentation
  • Source Code (/core, /contrib)
  • Translations

The list is growing...

Please add anything I may have overlooked so we can develop a checklist.

g.
----

I agree that in theory, these repos should be available for a long time. BUT, I also agree that we should not risk it. It would be a huge loss to our community if these repos disappeared.

I appreciate that @graham is trying to back them up, but is not doing so publicly. We should discuss how we want to handle this as a community. 

This may have come up before, but I don't recall for sure and am not aware of any decisions about what to do. 

I've suggested this as a possible topic at today's dev meeting. https://forum.backdropcms.org/forum/oct-24th-weekly-meetings

Hello Everyone,

In my paranoia I just took a look at Drupal's documentation area.  

I noticed (ominously) that while there is extant documentation for Drupal 7, there is no such thing for any prior versions of Drupal...

Drupal 7:

Drupal 6:

 

 

 Drupal 8 (redirects to generic page)

Drupal 9:

Drupal 10:

 

Update

The "exceptions" list (which I call the "0.x" list) is composed of:

7036 files

And took:

~30 hours (1803 minutes)

To download with a 1 to 30 second random delay injected to prevent IP banning

Average download time:

15 seconds

Looks like my random interval is the dominant timing variable :)
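
For illustration, the "download, then pause a random 1 to 30 seconds" loop could look roughly like the sketch below. This is not the actual get-drupal-files.py. The function names, the resume-by-skipping-existing-files behaviour, and the injectable fetch/sleep parameters are all assumptions made to keep the example small and testable.

```python
import random
import time
import urllib.request
from pathlib import Path


def http_get(url):
    """Fetch one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_all(urls, dest_dir, min_pause=1.0, max_pause=30.0,
              fetch=http_get, sleep=time.sleep):
    """Save each URL into dest_dir, pausing randomly between downloads."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already downloaded on an earlier run, no request made
        target.write_bytes(fetch(url))
        saved.append(target)
        if index < len(urls) - 1:
            # Random pause so the requests do not look like a burst.
            sleep(random.uniform(min_pause, max_pause))
    return saved
```

Skipping files that already exist means an interrupted run can simply be restarted, which matters for a job that runs for days.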
 

Update:

Some statistics regarding how I plan to build my own personal Drupal codebase mirror:

Directory Structure:

Drupal/
├── 0.x
├── 10.x
├── 11.x
├── 12.x
├── 13.x
├── 14.x
├── 1.x
├── 2.x
├── 3.x
├── 4.x
├── 5.x
├── 6.x
├── 7.x
├── 8.x
├── 9.x
├── Archive
├── get-drupal-files.py
└── README.txt

Some statistics regarding branch 0.x (the smallest)

Files:  7,036

Size:  1.9 GB

Seeing as it took 30+ hours to download ~2 GB with an average 15 second delay between files, I can safely say that the rate is ~15 hours per GB with a randomized 1 to 30 second inter-file pause.

If memory serves correctly, the 7.x branch is ~340 GB in size:

339.58 GB

Inter-file pause 1...30 seconds:  231 days

Inter-file pause 1...15 seconds:  115 days

Inter-file pause 1...10 seconds: 76 days

76 days from today is 30JAN2025
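
A rough cross-check of these estimates, sketched in Python using only figures quoted in this thread. It compares two ways of scaling the 0.x run up to 7.x: by size (hours per GB, as above) and by file count (seconds per file). Both are assumptions rather than measurements. Treating "1.9Gb" as 1.9 decimal gigabytes is an assumption, as is scaling the time in proportion to the mean pause. The file-count model further assumes that transfer time stays negligible next to the pause, which may not hold because 7.x files are larger on average.

```python
# Figures quoted in this thread.
FILES_0X = 7_036              # files in the 0.x list
MINUTES_0X = 1_803            # wall-clock minutes for the 0.x download
BYTES_0X = 1.9e9              # "1.9Gb", read as decimal gigabytes (assumption)
FILES_7X = 164_867            # count of 7.x files from the SQL query
BYTES_7X = 339_582_768_746    # size of 7.x files from the SQL query
SECONDS_PER_DAY = 86_400


def pause_factor(max_pause):
    """Mean of a uniform 1..max_pause pause, relative to the 1..30 run."""
    return ((1 + max_pause) / 2) / ((1 + 30) / 2)


def days_scaled_by_size(max_pause=30):
    """Assume total time grows with the number of bytes."""
    seconds_per_byte = MINUTES_0X * 60 / BYTES_0X
    return seconds_per_byte * BYTES_7X * pause_factor(max_pause) / SECONDS_PER_DAY


def days_scaled_by_count(max_pause=30):
    """Assume total time grows with the number of files (pause-dominated)."""
    seconds_per_file = MINUTES_0X * 60 / FILES_0X
    return seconds_per_file * FILES_7X * pause_factor(max_pause) / SECONDS_PER_DAY


for max_pause in (30, 15, 10):
    print(max_pause,
          round(days_scaled_by_size(max_pause)),
          round(days_scaled_by_count(max_pause)))
```

With a 1...30 second pause, the size-based model lands at about 224 days, in the same range as the figure above, while the file-count model lands at about 29 days. Since the pause is applied per file and appears to dominate the timing, the real number may sit much closer to the lower estimate, but only a timed sample of actual 7.x downloads would settle it.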

g.

----

 

quicksketch's picture

After today's discussion at Backdrop LIVE, I looked around to see the state of Drupal.org's move to Drupal 10. I didn't locate the Drupal 10 code base, but I did run across the Drupal 10 Migration Project, which has completed migrations for both Documentation Pages and the Forum (which I thought would likely be dropped in the D10 migration). However, Book pages have yet to be migrated. This project seems to be very active, with changes in the past few days. Perhaps it's a mad rush to try to get Drupal.org itself updated before the EOL date.

I expressed during the discussion that I don't think the code (as in the Git history) for old Drupal versions is likely to be removed, because the history of the code is the history of the contributors, and thus of the copyright holders of the code. The GPL is built on copyright law, so maintaining that history is important from a legal perspective. It's still possible abandoned projects might get removed, but any project that was ported from D7 will likely maintain that history.

 

I came across this forum topic and, as I was reading through it, recalled reading something a few years ago about someone backing up Yahoo Answers when they got shut down. I searched and found the article:

https://www.quantcdn.io/blog/articles/we-archived-yahoo-answers

Looking at that website, it seems they offer free subscriptions for open-source projects (https://www.quantcdn.io/free-open-source), so perhaps that's worth looking into...