# WissKI-Distillery

WissKI-Distillery is a Docker-based server provisioning and management system for multiple
[WissKI](https://wiss-ki.eu/) instances.

The WissKI Distillery is a set of scripts, tools, and applications that allows you to operate
a WissKI cloud of distinct but jointly managed WissKI instances, hosted on a dedicated
hardware pool. Like the WissKI system, the WissKI Distillery is open source and free to
use.

This README contains only technical documentation.
For members of [FAU Erlangen-Nürnberg](https://www.fau.de/), a cloud offering based on this service is available, known as FAUWissKICloud.
Please see https://wisski.data.fau.de/ for related documentation.

## Overview

This project consists of the following:

- this README
- a [Go](https://go.dev/) command `wdcli`
- a `Vagrantfile` for local testing

The Go command is almost dependency free.
It expects that a basic Debian system (in particular the `apt-get` command) is available.
The command has been tested only under Debian 10, but may also work under older or newer versions.
The command expects to be run as root, and will fail when this is not the case.
Each subcommand comes with documentation, which can be found in this README, as well as on the command line by passing the `--help` flag.

To bootstrap a new distillery instance, use the `wdcli bootstrap` command.
First copy the executable onto the server, using a command similar to:

```bash
GOOS=linux GOARCH=amd64 go build -o wdcli ./cmd/wdcli && scp ./wdcli distillery.example.com:
```

Next, access the server and run the `bootstrap` command:

```
$ ssh distillery.example.com
user@distillery.example.com$ sudo ./wdcli bootstrap
```

This will create a deployment directory (`/var/www/deploy` by default).
Next, edit the configuration file `/var/www/deploy/.env` and customize it to your liking.
Usually it requires adjustment in only a few places.

Next, download a [GraphDB](https://graphdb.ontotext.com/) zip file, and bring the distillery online using:

```bash
sudo /var/www/deploy/wdcli system_update /path/to/graphdb.zip
```

## Vagrantfile

For local testing, it is recommended to use [Vagrant](https://www.vagrantup.com/) and the provided `Vagrantfile`.
After installing Vagrant, run:

```bash
# once, to install the plugin to automatically build the guest iso
# at the time of writing version 0.25.0 is broken.
vagrant plugin install --plugin-version 0.24.0 vagrant-vbguest

# start the vagrant box
vagrant up

# open a shell inside the vm
# for debugging purposes forward port 7200 (GraphDB) and 8080 (phpmyadmin)
vagrant ssh -- -L 7200:127.0.0.1:7200 -L 8080:127.0.0.1:8080
```

## System Updates

_TLDR: `sudo /var/www/deploy/wdcli system_update /path/to/graphdb.zip`_

To run a WissKI Distillery, several core Docker containers must be installed.
These are:

- [nginx-proxy](https://github.com/nginx-proxy/nginx-proxy) - an automated nginx reverse proxy

    - This will delegate individual hostnames to appropriate docker containers, see [this blog post](http://jasonwilder.com/blog/2014/03/25/automated-nginx-reverse-proxy-for-docker/) for an overview.
    - Optionally makes use of [docker-letsencrypt-nginx-proxy-companion](https://github.com/nginx-proxy/docker-letsencrypt-nginx-proxy-companion) to automatically provision and renew HTTPS certificates.
    - See [internal/component/web](internal/component/web) for implementation details.

- [MariaDB](https://mariadb.org/) - an SQL server

    - It is configured to run inside a docker container.
    - A passwordless `root` account is created, which can only be used from inside the container.
    - An additional admin account (as defined in the configuration file) is created, which is used for administration.
    - A secondary management account is also created. This is configured via the distillery configuration file, and can be accessed from anywhere.
    - A `bookkeeping` database and table are created by default, to store metadata of known WissKI instances.
    - It is accessible at `127.0.0.1:3306`.
    - A database shell can be opened using `sudo /var/www/deploy/wdcli mysql`.
    - A [phpMyAdmin](https://www.phpmyadmin.net/) is started on `127.0.0.1:8080`.
    - See [internal/component/sql](internal/component/sql) for implementation details.

- [GraphDB](http://graphdb.ontotext.com/) - a SPARQL backend for WissKI (Version 10.0 or later)
    - It is configured to run inside a docker container.
    - The Workbench API is started on `127.0.0.1:7200`.
    - Security is not enabled at the moment.
    - See [internal/component/triplestore](internal/component/triplestore) for implementation details.

- [proxyssh](https://github.com/tkw1536/proxyssh) - an ssh server that delegates client connections to different WissKIs
    - It is configured to run inside a docker container.
    - Uses a globally configurable `authorized_keys` file.
    - Also allows users to write their own `authorized_keys` files.
    - See [embed/resources/compose/ssh](embed/resources/compose/ssh) for implementation details.

- [wdresolve](https://github.com/FAU-CDI/wdresolve) - a global WissKI Distillery Resolver
    - It is configured to run inside a docker container.
    - Uses configuration which is updated with `sudo /var/www/deploy/wdcli update_prefix_config`.
    - It is available in the browser under the `/go/` path of the main domain.
    - See [embed/resources/compose/resolver](embed/resources/compose/resolver) for implementation details.

- `dis` - a WissKI Distillery Information Server
    - It is configured to run inside a docker container.
    - It is available in the browser under the `/dis/` path of the main domain.
    - See [embed/resources/compose/dis](embed/resources/compose/dis) for implementation details.

To manage multiple docker containers, the distillery makes heavy use of [docker compose](https://docs.docker.com/compose/).

Setting up these components is fully automatic.
In particular, after obtaining a license and the installation zip file for GraphDB, one can just run:

```bash
sudo /var/www/deploy/wdcli system_update /path/to/graphdb.zip
```

In principle this command is idempotent, meaning it can be run multiple times with the same result.

## Provisioning a new WissKI instance -- 'wdcli provision'

_TLDR: `sudo /var/www/deploy/wdcli provision name-of-website`_

A new WissKI instance consists of several components:

- A Drupal instance inside a lightweight PHP runtime container (a `barrel` in which to store WissKI)
- An entry in the SQL bookkeeping table that stores instance metadata
- An SQL database and user for Drupal
- A GraphDB repository and user as SPARQL endpoint

Each WissKI instance is identified by a "slug".
This is a preferably short name that is used to form a domain name for the WissKI instance.
The WissKI Distillery assumes that each instance is a subdomain of a given domain.
For example, if the given domain is `wisskis.example.com` and the slug of a particular instance is `blue`, the domain used by this instance would be `blue.wisskis.example.com`.
The given domain can be configured within the `.env` file.

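The naming rule above can be sketched as a small shell function. This is only an illustration: `instance_domain` is a hypothetical helper and not part of `wdcli`.

```bash
# Hypothetical helper (not part of wdcli): derive the domain of an
# instance from its slug and the distillery's base domain.
instance_domain() {
    slug="$1"
    base_domain="$2"
    printf '%s.%s\n' "$slug" "$base_domain"
}

# Prints: blue.wisskis.example.com
instance_domain blue wisskis.example.com
```
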
We use the following process to provision a new instance:

**1. Create a new docker-compose.yml file**

In this step we first create a directory on the host system to hold all files relating to this instance.
By default, this takes place inside `/var/www/deploy/instances/$DOMAIN`, but this can be configured.
We then create a docker-compose file in this directory that is ready for running the `barrel` container.

**2. Create an appropriate SQL database and user**

We create a new SQL database and user to eventually store Drupal-related data.
The user and database names are generated from the slug.
The database password is randomly generated and only made available directly to the Drupal instance later.

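How the distillery generates this password is not described in this README. As a rough sketch of the general idea only, a random hexadecimal password can be read from `/dev/urandom`; `random_password` is a hypothetical helper, not a `wdcli` feature.

```bash
# Hypothetical helper: print a random password of 2*N hexadecimal
# characters (N bytes read from /dev/urandom, 16 by default).
# This is not the distillery's own method.
random_password() {
    bytes="${1:-16}"
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    printf '\n'
}

# Prints 32 random hex characters
random_password 16
```
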
**3. Create a GraphDB repository and user**

Next, we create a dedicated GraphDB repository for the WissKI instance.
We also create a new GraphDB user with access to this repository.

**4. Provision the instance inside the container**

We start the container in provisioning mode.

This does the following:

- Creates a new composer project that requires [drupal/recommended-project](https://github.com/drupal/recommended-project).
- Installs `drush` into this project.
- Runs the `drush site-install` command to configure the Drupal instance. A random password is generated for this.
- Adds and enables WissKI-specific modules for this instance.
- Sets up a WissKI Salz Adapter to use the GraphDB repository.

**5. Start the Docker container**

Finally, we can start the docker container.

These steps can be performed automatically.
To do so, use:

```bash
sudo /var/www/deploy/wdcli provision SLUG
```

## Rebuild an instance -- 'wdcli rebuild'

Sometimes it becomes necessary (because of changes to this project) to rebuild the docker image running a certain instance.
To do so, use:

```bash
sudo /var/www/deploy/wdcli rebuild SLUG
```

Note that rebuilding an instance restarts the docker container, resulting in a small (typically < 1 second) interruption to the website in question.
Furthermore, while the container is recreated, the old image stays on the host.
To delete all unused images, run:

```bash
sudo docker image prune --all
```

To automatically rebuild all instances, use the rebuild command without any arguments:

```bash
sudo /var/www/deploy/wdcli rebuild
```

## Reserving an instance -- 'wdcli reserve'

Sometimes it is useful to reserve a particular instance name.
This is done by hosting a placeholder website at the domain.
To do so, use:

```bash
sudo /var/www/deploy/wdcli reserve SLUG
```

To un-reserve a website, manually stop the docker stack and remove the folder.

## Purge an existing WissKI instance -- 'wdcli purge'

Sometimes it is required to remove a given WissKI instance.
In particular, all parts belonging to it should be removed.

To purge an instance, run:

```bash
sudo /var/www/deploy/wdcli purge SLUG
```

This cannot be undone (except by manually re-installing a backup or snapshot).
Therefore it typically requires explicit confirmation.

## Open a shell -- 'wdcli shell'

Sometimes manual changes to a given WissKI instance are required.
For this purpose, you can use:

```bash
sudo /var/www/deploy/wdcli shell SLUG
```

This will open a shell in the provided WissKI instance.

## List all instances -- 'wdcli ls'

To list all instances, the following command can be used:

```bash
sudo /var/www/deploy/wdcli ls
```

## Backups & Snapshots -- 'wdcli backup' and 'wdcli snapshot'

### Backup the entire Distillery

This project comes with a backup command.
To make a backup of *all instances*, run:

```bash
sudo /var/www/deploy/wdcli backup
```

Backups may temporarily shut down individual instances to ensure data consistency.
Typical backup times are a minute or less.

Backups are stored in the `/var/www/deploy/snapshots/archives` directory.
They contain:

- a snapshot of every single instance (see below)
- a complete backup of the SQL database
- nquads of all the GraphDB repositories
- a backup of the configuration + data file(s)

Backups are packaged as `.tar.gz` archives.

By default, backups are kept for up to thirty days, after which they are removed.
This can be configured in the WissKI Distillery configuration file.

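To see which archives would fall outside the default thirty-day window, a `find` call along the following lines can help. This is a sketch for inspection only, not the distillery's own cleanup logic; `list_expired_backups` is a hypothetical helper, and the directory in the example is the default mentioned above.

```bash
# Hypothetical helper: list .tar.gz files in a directory that were last
# modified more than a given number of days ago (30 by default).
# It only prints paths and deletes nothing.
list_expired_backups() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f -name '*.tar.gz' -mtime "+$days"
}

# Example (default archive directory from this README):
#   list_expired_backups /var/www/deploy/snapshots/archives 30
```
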
### Snapshot a single instance

To snapshot a single instance, use `sudo /var/www/deploy/wdcli snapshot SLUG`.
It takes either one or two arguments:

```bash
# snapshot a single instance and pick a new file in /snapshots/archives
sudo /var/www/deploy/wdcli snapshot SLUG

# backup a single instance into a specific file
sudo /var/www/deploy/wdcli snapshot SLUG /path/to/snapshot.tar.gz
```

The snapshot proceeds as follows:

1. make a copy of the instance configuration
2. shut down the running instance
3. make a dump of the triplestore and mysql databases
4. make a copy of the file system
5. export all pathbuilders
6. start the instance again
7. package the data into the final `.tar.gz` file

When uptime is critical, it is possible to skip shutting down a running instance.
This might result in inconsistent backup data.
To do so, run the command with the `--keepalive` flag:

```bash
sudo /var/www/deploy/wdcli snapshot SLUG --keepalive
```

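After taking a snapshot, it can be useful to confirm that the resulting `.tar.gz` file is at least readable. The following is a minimal sketch using plain `tar`; `verify_snapshot` is a hypothetical helper, and it makes no assumptions about the layout inside the archive.

```bash
# Hypothetical helper: succeed only if the given file is a readable
# gzip-compressed tar archive. The listing itself is discarded.
verify_snapshot() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Example:
#   verify_snapshot /path/to/snapshot.tar.gz && echo "archive is readable"
```

This only checks that the archive can be listed. It does not check that the dumps inside are complete or consistent.
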
## SSH Access

The distillery exposes an ssh daemon for users to access individual WissKI shells.
It is running on port 2222 by default.

To access a shell in a particular barrel, set the username equal to the slug.
For instance, to gain access to a shell inside a WissKI instance with the slug `porcelain`, use the following command line:

```bash
ssh -p 2222 porcelain@localhost
```

Replace `localhost` with the hostname of the WissKI Distillery.

Inside the container, normal shell access is provided.
Both `drush` and `composer` are available.
For technical reasons, using `sudo` or switching to `root` is not possible.

### Authentication

Authentication is performed using SSH keys.
Within each instance, SSH keys can be added to the file `/var/www/.ssh/authorized_keys` using the default OpenSSH `authorized_keys` format.

Furthermore, global SSH keys (that have access to every instance) can be added to a `GLOBAL_AUTHORIZED_KEYS_FILE`. This is set in the Distillery `.env` file, and defaults to `/distillery/authorized_keys/`.

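Since both files use the standard OpenSSH format, adding a key amounts to appending one line. The sketch below avoids duplicate entries; `add_authorized_key` is a hypothetical helper, not something shipped with the distillery, and the key in the example is a placeholder.

```bash
# Hypothetical helper: append a public key line to an authorized_keys
# file unless that exact line is already present.
add_authorized_key() {
    keys_file="$1"
    key_line="$2"
    touch "$keys_file"
    if ! grep -qxF -- "$key_line" "$keys_file"; then
        printf '%s\n' "$key_line" >> "$keys_file"
    fi
}

# Example, run inside an instance (the key shown is a placeholder):
#   add_authorized_key /var/www/.ssh/authorized_keys "ssh-ed25519 AAAA... user@example.com"
```
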
### Port Forwarding

In order to access the __GraphDB Workbench__ or __phpMyAdmin__, ssh port forwarding can be used.
GraphDB is running on the host `triplestore` on port `7200`.
phpMyAdmin is running on the host `phpmyadmin` on port `8080`.

To forward both, you can use a command such as:

```bash
ssh -p 2222 -L localhost:7200:triplestore:7200 -L localhost:8080:phpmyadmin:8080 porcelain@localhost
```

This will make GraphDB and phpMyAdmin available at `localhost:7200` and `localhost:8080` for the duration of the connection.

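When this command is needed for several instances, it can be convenient to generate it from the slug and hostname. The following sketch prints the command instead of running it, so that it can be reviewed first; `forward_command` is a hypothetical helper, and port 2222 is the default mentioned above.

```bash
# Hypothetical helper: print the ssh command that forwards GraphDB and
# phpMyAdmin for a given slug and distillery host.
# The optional third argument is the ssh port (2222 by default).
forward_command() {
    slug="$1"
    host="$2"
    port="${3:-2222}"
    printf 'ssh -p %s -L localhost:7200:triplestore:7200 -L localhost:8080:phpmyadmin:8080 %s@%s\n' \
        "$port" "$slug" "$host"
}

# Prints the same command as shown above
forward_command porcelain localhost
```
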
### Resolver

In order to resolve WissKI URIs globally, we make use of [wdresolve](https://github.com/FAU-CDI/wdresolve).
This can be queried with a single URI, and will redirect to the page of the corresponding WissKI entity.
It is deployed under the `/go/` path of the top-level domain.

For example, if the domain name of the distillery instance is `wisski.example.com`, then the resolver would respond to queries like `https://wisski.example.com/go/?uri=https://first.wisski.example.com/content/123`.
The resolver configuration is automatically updated by the `wdcli update_prefix_config` command.
It should not be necessary to reload this configuration manually, as this command is automatically called during `wdcli system_update`.

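The query format from the example can be sketched as a small shell function. `resolver_url` is a hypothetical helper used for illustration only, not part of `wdcli` or wdresolve.

```bash
# Hypothetical helper: build the resolver URL for a WissKI URI.
# The URI is appended as-is, matching the example above; URIs that
# contain characters such as '&' or '#' would need percent-encoding first.
resolver_url() {
    distillery_domain="$1"
    uri="$2"
    printf 'https://%s/go/?uri=%s\n' "$distillery_domain" "$uri"
}

# Prints: https://wisski.example.com/go/?uri=https://first.wisski.example.com/content/123
resolver_url wisski.example.com https://first.wisski.example.com/content/123
```
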
It is also possible to manually add a URI prefix to an instance.
For this purpose, add a file named `prefixes` to the base directory of the instance, with one prefix per line.

Furthermore, you can also exclude a specific instance from URL prefix resolving.
This should be the case for cloned or backup instances.
For this purpose, add a file named `prefixes.skip` to the base directory of the instance.
This will cause the instance to be skipped entirely.

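Both files are plain files in the instance's base directory, so they can be managed with ordinary shell commands. The helpers below are hypothetical and only illustrate the file layout described above; the directory and prefix in the example are placeholders.

```bash
# Hypothetical helpers; "$1" is the base directory of an instance.

# Append one URI prefix to the instance's `prefixes` file.
add_prefix() {
    printf '%s\n' "$2" >> "$1/prefixes"
}

# Mark the instance as excluded from prefix resolving.
skip_prefixes() {
    touch "$1/prefixes.skip"
}

# Example (directory and prefix are placeholders):
#   add_prefix /var/www/deploy/instances/blue.wisskis.example.com https://old.example.org/
#   sudo /var/www/deploy/wdcli update_prefix_config
```
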
## License

This project and associated files in this repository are licensed as follows:

WissKI-Distillery - A docker-based WissKI instance server
Copyright (C) 2020-22 CDI <https://www.cdi.fau.de/>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

Please see `LICENSE` for the legally binding license text.
The short summary of the license is:

- You may use this software for any purpose, including commercial.
- You may create derivative works, and use those for any purpose, including commercial.

provided that you meet the following conditions:

- You provide the end-user with a copy of this license.
- You make the source code of any derivative works available.
- Any derivative works clearly list changes made.
- You license any derivative works under the same license.

This also applies if you only run a backend service based on this software.