#StackBounty: #backup #cloud-storage #search #version-control Change-tracking searchable backup server with user accounts

Bounty: 50

Requirements:

I’m looking for software (and a cloud platform) for an R&D startup to perform periodic backups to a secure cloud. The backups should track changes (like Borg) to avoid data duplication and loss, and associate backed-up data with credible timestamps. The software (and/or the cloud platform) must support user accounts with read/write access to specific backed-up folders (which ideally have their changes tracked independently). The seemingly most challenging requirement is that the platform must support a custom, user-authenticated search over the backed-up data. For example, the platform might allow custom scripts to run when triggered externally, provide an API for authenticating platform users, and post the results back over the web.
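To make the "track changes to avoid data duplication" requirement concrete, here is a minimal sketch of content-addressed chunk storage. It is only an illustration of the idea: `ChunkStore` is a hypothetical name, and fixed-size chunks are a simplification (Borg itself uses content-defined chunking, plus compression and encryption).

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each distinct chunk is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, data: bytes, chunk_size: int = 4096) -> list[str]:
        """Store `data` and return its manifest (the ordered list of chunk digests)."""
        manifest = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already present (from this or an earlier backup) is not stored again.
            self.chunks.setdefault(digest, chunk)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Rebuild the original bytes from a manifest."""
        return b"".join(self.chunks[digest] for digest in manifest)
```

Each backup then only needs to keep its manifest, so two versions of a file that share most of their chunks cost little extra storage, and any earlier version can still be restored from its own manifest.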

Less critically but ideally, the cloud platform should only allow upload/download to a nominated PC and/or network. That is, prevent an employee from downloading data to their private computer.

Why:

The company performs experiments in a laboratory, using hardware connected to PCs, controlled by Jupyter notebooks. All info about a single experiment is stored in a single folder containing plaintext, images and database files, organised by QCoDeS. The folder will additionally contain a file with meta-info/searchable tags about the experiment.
These folders can be a few GB in size, and at most a few hundred of them are expected over the next few years. There are tens of employees/experimentalists.
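As a sketch of what the per-experiment meta-info file could look like, assuming a JSON file with a hypothetical name (`experiment_meta.json`) and hypothetical fields (`owner`, `performed`, `tags`); none of this is a QCoDeS convention:

```python
import json
from pathlib import Path

META_FILENAME = "experiment_meta.json"  # hypothetical name, chosen for illustration


def write_meta(folder: Path, owner: str, performed: str, tags: list[str]) -> Path:
    """Write the searchable meta-info for one experiment folder.

    `performed` is an ISO date string such as "2024-03-01".
    """
    meta = {"owner": owner, "performed": performed, "tags": sorted(set(tags))}
    path = folder / META_FILENAME
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return path


def read_meta(folder: Path) -> dict:
    """Load the meta-info for one experiment folder."""
    return json.loads((folder / META_FILENAME).read_text(encoding="utf-8"))
```

Keeping the tags in a small plaintext file means a search index can be built from these files alone, without opening the multi-GB data in each folder.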

The lab PCs are connected via a network file system, with each employee having their own private Windows user account. Each night (for example), the root account on a nominated PC will automatically back up all of the day’s changes to the cloud. These could be new folders/experiments, or code/analysis/data additions to existing experiments.
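The nightly job could start by working out which experiment folders changed since the previous run. A rough sketch, assuming one top-level folder per experiment and relying on file modification times (`changed_experiments` is a hypothetical helper; a tool like Borg performs its own change detection, so this would only be a pre-filter):

```python
import os
from pathlib import Path


def changed_experiments(root: Path, since: float) -> list[str]:
    """Return the names of experiment folders under `root` that contain at least
    one file modified after `since` (seconds since the epoch)."""
    changed = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for dirpath, _dirnames, filenames in os.walk(folder):
            if any(
                os.path.getmtime(os.path.join(dirpath, name)) > since
                for name in filenames
            ):
                changed.append(folder.name)
                break  # one modified file is enough to flag the folder
    return changed
```

The timestamp of the last successful backup would be passed in as `since`, so that both new experiments and additions to existing ones are picked up.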

Unless granted permission, an employee should only be able to see changes to their own experiments/folders in the cloud backup. They should be able to browse their backed-up experiments in e.g. the browser, and download the latest (or previous) versions of an experiment to the lab PCs. Furthermore, they should have some efficient way to search through their accessible experiments, e.g. by date performed or by other experiment tags, before download. Ideally, this search can be performed in Python, directly from a Jupyter notebook (I’d code the interface).
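The kind of search I have in mind could look roughly like this from a notebook. It is a sketch only: the record layout (`folder`, `performed`, `tags`) and the function name are assumptions, and the records are taken to be the ones the user is already allowed to see.

```python
from datetime import date


def search_experiments(records, tags=None, performed_from=None, performed_to=None):
    """Return the folders whose record carries every requested tag and whose
    'performed' date lies within the optional inclusive date range."""
    wanted = set(tags or [])
    hits = []
    for rec in records:
        performed = date.fromisoformat(rec["performed"])
        if not wanted <= set(rec["tags"]):
            continue  # at least one requested tag is missing
        if performed_from is not None and performed < performed_from:
            continue
        if performed_to is not None and performed > performed_to:
            continue
        hits.append(rec["folder"])
    return hits
```

A call such as `search_experiments(records, tags=["cryostat"], performed_from=date(2024, 1, 1))` would then give the list of folders to offer for download.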

The cloud platform should provide a usable way (e.g. in-browser) for an admin account to register new users or change their data access permissions. At worst, a CLI for this would do.

Flexibility:

The cloud/backup service will be maintained by technically literate people, who can write scripts (e.g. in Python or bash, for Linux) to link services. For example, if the cloud platform is a server which can run custom code, then we can write the search utility and web API ourselves. However, we cannot securely manage user accounts and authentication ourselves. Hence, ideally, the cloud platform would provide some API for authenticating users and querying which folders they have read access to, and for securely POSTing the search results back to the user.
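The division of labour described above could be sketched as follows. The `readable_folders` callable stands in for whatever authentication/permissions API the platform provides; it is purely hypothetical and is assumed to raise an error for an invalid token. Our own code only filters the search hits.

```python
from typing import Callable, Iterable


def authorised_search(
    token: str,
    readable_folders: Callable[[str], Iterable[str]],
    run_search: Callable[[], Iterable[str]],
) -> list[str]:
    """Run our custom search, but return only folders the platform says the
    token's owner may read. Authentication is delegated entirely to the
    platform-provided `readable_folders` (hypothetical)."""
    allowed = set(readable_folders(token))  # expected to raise on an invalid token
    return [folder for folder in run_search() if folder in allowed]
```

The point of this shape is that our scripts never store passwords or decide who is who: they receive a token, ask the platform what it permits, and return nothing outside that set.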

Do any cloud services and backup systems offer anything with all of these facilities? Since this is a commercial application, the software need not be free nor open-source.

