Database Design

This article applies to the original database design, which used the Sled key-value store.

The dicebot uses Sled. Sled is a fast, embedded key-value database that stores keys and values as byte arrays.

Why Sled?

Mostly because I felt like it.

High-Level Architecture Overview

A simple overview of how the database code is organized.

Database Code

A struct Database is the entrypoint to all database functionality. The struct has fields, each their own sub-type, which expose the specific functionality. For example, the Database instance has a Variables field and a Migrations field. The Variables type relates to user variables, and the Migrations field manages the database migration version in the database.

Data Migrations

The dicebot has very simple (and frankly dangerous) support for migrating data. The database keeps track of its current migration version, and when the application starts up, it checks to see if it needs to upgrade the format of data. The current migration version is stored in the config (currently hardcoded into the binary).

Migrations are simply functions that receive the Database instance, and can execute whatever they need to do to alter the data in the database. Migration functions are idempotent and make their changes in one or more atomic steps. This is necessary because the database version is updated in a transaction separate from any transactions the migration starts.

The dicebot currently has no support for automatically migrating Sled database file versions.

Guiding Principles

Sled essentially operates as a bunch of Map<[u8], [u8]>s (that is, a map of byte slice keys and byte slice values), which means both keys and values can be pretty much any arbitrary type. To keep things simple, we follow these guidelines:

Separate Trees for different types of data. A Tree in Sled is an isolated keyspace.
Strongly typed access to data, converted into [u8] keys by code private to the DB layer.
Keys are UTF8 strings, sometimes separated by 0xfe and 0xff delimiter bytes. 0xfe and 0xff are not valid UTF8, which makes them useful delimiters.

Trees, Keys, and Delimiters

Sled supports opening multiple Tree instances, each which acts as an isolated set of key-value pairs. We use trees to categorize data much like a relational SQL table.

In general, a single tree should store one kind of data.
Ideally, the tree does not have more that one format used for keys.

Keys in a tree are how data is queried, and thus should make sense for the data type. A key often requires some levels of sub-categorization. For example, user variables are defined per room. Thus, the key for a user variable is composed of the username, room ID, and the actual name of the variable. The delimiters 0xff and 0xfe are used for this sub-categorization.

The 0xff delimiter is used to split logically separate parts of the key. In the case of user variables, this separates the where (username + room ID) from the what (variable name).
The 0xfe delimiter is used to split related parts of the key into more narrow categories. For user variables, the username and room ID are split by 0xfe.

A key format should be designed with how the data will be queried in mind. It usually does not matter when looking up a single value, but for scanning multiple keys, it's important that the key be designed properly. Using delimiters enables clever use of Sled's API. A key can be partially crafted up to the delimiter, and the scan_prefix function allows finding all data that begins with that prefix.

Values

The value stored in a tree can be anything, including serialized structs. Following guidelines above, one tree should store one kind of data.

For types with a size known at compile time (simple types or anything whose size in memory is not variable), zero-copy serialization using the zerocopy crate is preferred.
For complex structs with arbitrary sizes (strings, lists, etc), the bincode crate is preferred.

Where possible, higher-level Sled APIs should be used. Batches, compare_and_swap and fetch_and_update are both very useful. In cases where this is not possible, transactions should be used to preserve atomicity.

Migrations

Designing database migrations is difficult when the database doesn't really have anything resembling a query language, and the data stored is completely arbitrary! That said, some basic rules make them less painful:

Isolate all functionality for the migration inside its function, or inside a module only the migration function can access.
Avoid use of any APIs from the dicebot database layer. These APIs change to match the latest requirements of the application. The migration should be able to craft and decode keys in a way completely decoupled from the API.
Migrations must be idempotent. If the migration crashes halfway through execution, or the DB version update does not commit, the migration must be able to run again and produce the same result.

Design: User Variables

This documents the database design of user-defined variables.

Database Schema

User variables are implemented with two trees:

room_user_variables: Key format <username> 0xfe <room_id> 0xff <variable_name>. Value type is i32.
room_user_variable_count: Key format <username> 0xfe <room_id>. Value type is i32.

The room_user_variables tree contains the actual variables and their value, defined by the user. The room_user_variable_count tree keeps track of how many variables a user has defined on a per room basis. APIs atomically update both trees transactionally.

Design: Room State Management

This will document the room state management database design, once it's finished.

Database Schema

TODO.