Plugin Idea: Backup Snapshots

UX: A site admin gives the plugin a directory path where backup snapshots can be written (writable by the web server but outside the dataroot). The admin can then create, delete, or restore snapshots of the dataroot + DB.

Design: Snapshot files would be prefixed with yyyymmdd-dailycounter, and each snapshot would have separate files for SQL, data, and metadata.
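
The naming scheme could be sketched roughly as follows. This is only an illustration: the helper name `next_snapshot_files()`, the unpadded counter, and the file suffixes (`.sql`, `.data.tar.gz`, `.meta.json`) are assumptions, not part of the proposal.

```php
<?php
// Hypothetical helper: work out the file names for the next snapshot of a given day.
// The suffixes and the unpadded daily counter are illustrative assumptions.
function next_snapshot_files(string $dir, string $date): array
{
    $counter = 1;
    // Existing snapshots of the same day are assumed to look like "20240131-1.meta.json".
    foreach (glob($dir . '/' . $date . '-*.meta.json') ?: [] as $path) {
        if (preg_match('/^\d{8}-(\d+)\.meta\.json$/', basename($path), $m)) {
            $counter = max($counter, (int) $m[1] + 1);
        }
    }
    $prefix = $date . '-' . $counter;
    return [
        'prefix' => $prefix,
        'sql'    => "$dir/$prefix.sql",
        'data'   => "$dir/$prefix.data.tar.gz",
        'meta'   => "$dir/$prefix.meta.json",
    ];
}

// Usage: once the first snapshot of the day exists, the next one gets counter 2.
$dir = sys_get_temp_dir() . '/snapshots-' . uniqid();
mkdir($dir);
$first = next_snapshot_files($dir, '20240131');
touch($first['meta']);
$second = next_snapshot_files($dir, '20240131');
```

In a real plugin the date would come from `date('Ymd')` and the directory from the admin's setting.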

Metadata would ideally contain: Elgg version/release; list of active/all plugins; MD5 of the mods folder (to detect modifications between snapshots); and, if available, git data such as commit SHA, branch, and remote.
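
A minimal sketch of how the metadata could be assembled, assuming the caller already knows the release string and the plugin list. The function names `directory_md5()` and `build_snapshot_metadata()` and the array keys are hypothetical, and the git fields are left out for brevity.

```php
<?php
// Hypothetical helper: one MD5 over every file below $dir (relative path + content hash).
function directory_md5(string $dir): string
{
    $dir = rtrim($dir, '/\\');
    $hashes = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile()) {
            $relative = substr($file->getPathname(), strlen($dir) + 1);
            $hashes[$relative] = md5_file($file->getPathname());
        }
    }
    // Sort by path so the result does not depend on directory iteration order.
    ksort($hashes, SORT_STRING);
    return md5(json_encode($hashes));
}

// Hypothetical metadata layout; the key names are assumptions, not a fixed format.
function build_snapshot_metadata(string $modsDir, string $release, array $activePlugins): array
{
    return [
        'elgg_release'   => $release,
        'active_plugins' => $activePlugins,
        'mods_md5'       => directory_md5($modsDir),
        'created'        => date('c'),
    ];
}
```

Comparing `mods_md5` between two metadata files would then show whether any plugin file changed between snapshots.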

To get around memory limitations, this should rely on CLI tools where possible.
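
One way this might look: PHP only builds the command lines, and `mysqldump` and `tar` do the heavy work in their own processes. The function name `snapshot_commands()` and the example paths are made up for illustration, and database credentials are assumed to come from a MySQL option file rather than the command line.

```php
<?php
// Sketch: build shell commands so the dump and archive run outside PHP's memory limit.
function snapshot_commands(string $dbName, string $dataroot, string $sqlFile, string $dataFile): array
{
    return [
        // Credentials are assumed to live in a MySQL option file (e.g. ~/.my.cnf).
        'sql' => 'mysqldump --result-file=' . escapeshellarg($sqlFile)
            . ' ' . escapeshellarg($dbName),
        // -C switches into the dataroot so the archive holds relative paths.
        'data' => 'tar -czf ' . escapeshellarg($dataFile)
            . ' -C ' . escapeshellarg($dataroot) . ' .',
    ];
}

// Example values only; none of these paths come from the proposal.
$cmds = snapshot_commands(
    'elgg',
    '/var/elgg-data',
    '/backups/20240131-1.sql',
    '/backups/20240131-1.data.tar.gz'
);
```

Each command could then be run with `exec($cmd, $output, $exitCode)`, treating a non-zero exit code as a failed snapshot.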

Use: This would, of course, aid in site backups, but also help developers work on multiple versions of Elgg, test upgrades, etc.

Prior art: ElggBackup (for 1.5)

  • Today I implemented a backup/restore feature for a Moodle module, and this made me wonder whether similar features could be added to Elgg.

    However, I wasn't thinking of backing up a whole Elgg site but of individual objects. It would of course be great to be able to do backups on many different levels (site, groups, users, objects, etc.), the same way Moodle supports different levels (site, course, section, activity).

  • I agree with Juho Jaakkola: it would be a better idea to back up and restore individual objects. The settings page should offer options to select which objects to back up or restore (+1 if there is an option to back up/restore 'ALL' entities).

  • I see 'total' backup as more important than selective backup, since total backup is a required feature of running a website. Selective backup is a useful function for development and maintenance.

    The primary function is total backup/restore.

    I haven't looked at the backup plugin for 1.5; I did just find this page, which gives a backup script for Elgg that I hadn't seen before.

  • I made some changes to the elggbackup plugin to try to get it to run in 1.8.
    I don't fully know what needs to change.
    However, it got as far as the page where the backup cron is triggered and then produced a bad gateway message in the browser.

    I'm also seeing:

    warning: deprecated in 1.8: settings/elggbackup/edit was deprecated in favor of plugins/elggbackup/settings called from [#8] /mysite/engine/lib/views.php:506'

  • I've altered the backup script that's in the Elgg wiki to support emailing now. It's a script though, so not adapted to a plugin. It's working fine for me and is useful beyond just Elgg, so I won't be evolving the backup plugin code.

  • "total backup is more important than selective backup, since total backup is a required feature of running a website"

    Full backup is of course more important, but it is also much easier to do with existing tools (mysqldump etc.).

    However, every now and then I get a message from a client asking if I could restore a blog/group/etc. that they have accidentally deleted. Restoring these isn't really possible without restoring the whole backup and therefore losing all changes made since that backup was taken.

  • This would be a huge time saver. Has anyone started this yet? 

    Regarding Juho's suggestion, we could do a base plugin, like a framework, and selective backups can branch out from there.

  • It depends on the scenario. Full-DB backups can be done almost anywhere else. I do my backups through bash scripts via cron, backing up selected sites'/domains' databases directly from /var/lib/mysql, with no involvement from Elgg or other apps at all ;) But the 'selective backups' that Juho mentions will need smarter, selective, logical backups independent of data structured via GUIDs to facilitate selective restores. That's the harder part to code. It's almost like needing a per-entity 'undo' feature ;) Maybe some form of logging and rollback mechanism, but that requires a deeper understanding of formal 'journaling' techniques for backup/restore.

  • Here's a simple example of how the Moodle restore system works. It maps an object's old id to the new one, so the parent object doesn't even have to exist in order to restore both it and its child objects.

    Let's say we have backup of a "book" 123 and inside it "pages" 1 and 2.

    1. When restoring the book, Moodle does something like this:

    // $book is the backup of the book
    $old_id = $book->id; // The id of the book in the backup (123)
    // Create a new row to database table called "books"
    // The new row gets the id 234
    $new_id = $DB->insert_record('books', $book);
    // Map the old book id to the new one (123 <-> 234)
    $this->set_mapping('book', $old_id, $new_id);

    2. Now as we add the pages one by one we can find out which book they belong to:

    // $page is the backup of a single page
    // Get the id of the book row that has already been restored (234)
    // Here $page->book_id is the old id (123)
    $page->book_id = $this->get_mappingid('book', $page->book_id);
    // Create a new row to database table called "book_pages"
    $page_id = $DB->insert_record('book_pages', $page);
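
    The same old-id to new-id bookkeeping can be tried out without Moodle. The sketch below is a standalone illustration: the `IdMap` class is a hypothetical in-memory stand-in for `set_mapping()`/`get_mappingid()`, and the starting id 234 just mirrors the example above. In Elgg the ids would presumably be GUIDs.

    ```php
    <?php
    // Hypothetical in-memory stand-in for Moodle's set_mapping()/get_mappingid().
    class IdMap
    {
        private $map = [];

        public function set(string $type, int $oldId, int $newId): void
        {
            $this->map[$type][$oldId] = $newId;
        }

        public function get(string $type, int $oldId): ?int
        {
            return $this->map[$type][$oldId] ?? null;
        }
    }

    $map = new IdMap();
    $nextId = 234; // pretend the database hands out ids starting here

    // Restore the book first: old id 123 becomes new id 234.
    $book = ['id' => 123, 'title' => 'Example'];
    $map->set('book', $book['id'], $nextId++);

    // Restore the pages and re-point each one at the new book id.
    $restoredPages = [];
    foreach ([['id' => 1, 'book_id' => 123], ['id' => 2, 'book_id' => 123]] as $page) {
        $page['book_id'] = $map->get('book', $page['book_id']);
        $page['id'] = $nextId++;
        $restoredPages[] = $page;
    }
    ```

    A lookup for an id that was never restored returns null here, which a real restore would have to treat as an error or a skipped item.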

  • The actual backups are XML files like this:

    <book id="123">
        <page id="1">Lorem ipsum</page>
        <page id="2">Dolor sit amet</page>
    </book>
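
    Reading such a file back could look roughly like this. It is a sketch only: it assumes the attribute values are quoted (id="123") so the backup parses as well-formed XML, it uses PHP's SimpleXML extension, and the new book id 234 is a hypothetical value standing in for whatever the database insert returns.

    ```php
    <?php
    // Assumes quoted attribute values so that the backup is well-formed XML.
    $xml = '<book id="123"><page id="1">Lorem ipsum</page><page id="2">Dolor sit amet</page></book>';
    $book = simplexml_load_string($xml);

    $oldBookId = (int) $book['id'];
    $newBookId = 234; // hypothetical id assigned when the book row is inserted
    $idMap = ['book' => [$oldBookId => $newBookId]];

    // Collect the pages, pointing each one at the restored book's new id.
    $pages = [];
    foreach ($book->page as $page) {
        $pages[] = [
            'old_id'  => (int) $page['id'],
            'book_id' => $idMap['book'][$oldBookId],
            'text'    => (string) $page,
        ];
    }
    ```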