Database Design
This article applies to the original database design, which used the Sled key-value store.
The dicebot uses Sled. Sled is a fast, embedded key-value database that stores keys and values as byte arrays.
Why Sled?
- Mostly because I felt like it.
High-Level Architecture Overview
A simple overview of how the database code is organized.
Database Code
A struct Database
is the entrypoint to all database functionality. The struct has fields, each their own sub-type, which expose the specific functionality. For example, the Database
instance has a Variables
field and a Migrations
field. The Variables
type relates to user variables, and the Migrations
field manages the database migration version in the database.
Data Migrations
The dicebot has very simple (and frankly dangerous) support for migrating data. The database keeps track of its current migration version, and when the application starts up, it checks to see if it needs to upgrade the format of data. The current migration version is stored in the config (currently hardcoded into the binary).
Migrations are simply functions that receive the Database
instance, and can execute whatever they need to do to alter the data in the database. Migration functions are idempotent and make their changes in one or more atomic steps. This is necessary because the database version is updated in a transaction separate from any transactions the migration starts.
The dicebot currently has no support for automatically migrating Sled database file versions.
Guiding Principles
Sled essentially operates as a bunch of Map<[u8], [u8]>
s (that is, a map of byte slice keys and byte slice values), which means both keys and values can be pretty much any arbitrary type. To keep things simple, we follow these guidelines:
- Separate
Tree
s for different types of data. ATree
in Sled is an isolated keyspace. - Strongly typed access to data, converted into
[u8]
keys by code private to the DB layer. - Keys are UTF8 strings, sometimes separated by
0xfe
and0xff
delimiter bytes.0xfe
and0xff
are not valid UTF8, which makes them useful delimiters.
Trees, Keys, and Delimiters
Sled supports opening multiple Tree
instances, each which acts as an isolated set of key-value pairs. We use trees to categorize data much like a relational SQL table.
- In general, a single tree should store one kind of data.
- Ideally, the tree does not have more that one format used for keys.
Keys in a tree are how data is queried, and thus should make sense for the data type. A key often requires some levels of sub-categorization. For example, user variables are defined per room. Thus, the key for a user variable is composed of the username
, room ID
, and the actual name of the variable. The delimiters 0xff
and 0xfe
are used for this sub-categorization.
- The
0xff
delimiter is used to split logically separate parts of the key. In the case of user variables, this separates the where (username + room ID) from the what (variable name). - The
0xfe
delimiter is used to split related parts of the key into more narrow categories. For user variables, the username and room ID are split by0xfe
.
A key format should be designed with how the data will be queried in mind. It usually does not matter when looking up a single value, but for scanning multiple keys, it's important that the key be designed properly. Using delimiters enables clever use of Sled's API. A key can be partially crafted up to the delimiter, and the scan_prefix
function allows finding all data that begins with that prefix.
Values
The value stored in a tree can be anything, including serialized structs. Following guidelines above, one tree should store one kind of data.
- For types with a size known at compile time (simple types or anything whose size in memory is not variable), zero-copy serialization using the
zerocopy
crate is preferred. - For complex structs with arbitrary sizes (strings, lists, etc), the
bincode
crate is preferred.
Where possible, higher-level Sled APIs should be used. Batches, compare_and_swap
and fetch_and_update
are both very useful. In cases where this is not possible, transactions should be used to preserve atomicity.
Migrations
Designing database migrations is difficult when the database doesn't really have anything resembling a query language, and the data stored is completely arbitrary! That said, some basic rules make them less painful:
- Isolate all functionality for the migration inside its function, or inside a module only the migration function can access.
- Avoid use of any APIs from the dicebot database layer. These APIs change to match the latest requirements of the application. The migration should be able to craft and decode keys in a way completely decoupled from the API.
- Migrations must be idempotent. If the migration crashes halfway through execution, or the DB version update does not commit, the migration must be able to run again and produce the same result.
Design: User Variables
This documents the database design of user-defined variables.
Database Schema
User variables are implemented with two trees:
room_user_variables
: Key format<username> 0xfe <room_id> 0xff <variable_name>
. Value type isi32
.room_user_variable_count
: Key format<username> 0xfe <room_id>
. Value type isi32
.
The room_user_variables
tree contains the actual variables and their value, defined by the user. The room_user_variable_count
tree keeps track of how many variables a user has defined on a per room basis. APIs atomically update both trees transactionally.
Design: Room State Management
This will document the room state management database design, once it's finished.
Database Schema
TODO.