Update 'Database Design'

projectmoon 2020-11-07 14:44:02 +00:00
parent e3bc335063
commit c996eef08f
1 changed files with 82 additions and 74 deletions

@ -1,74 +1,82 @@
# Database Design
The dicebot uses [Sled](https://sled.rs/). Sled is a fast, embedded key-value database that stores keys and values as byte arrays.
Why Sled?
* Mostly because I felt like it.
## High-Level Architecture Overview
A simple overview of how the database code is organized.
### Database Code
A struct `Database` is the entry point to all database functionality. The struct has fields, each of its own sub-type, which expose specific functionality. For example, the `Database` instance has a `Variables` field and a `Migrations` field. The `Variables` type handles user variables, and the `Migrations` type manages the migration version stored in the database.
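As a rough illustration of that layout, here is a minimal sketch. Everything beyond the names `Database`, `Variables`, and `Migrations` is an assumption: the method names are hypothetical, and in-memory values stand in for the Sled trees the real types would wrap.

```rust
use std::collections::BTreeMap;

/// Handles user variables. A `BTreeMap` stands in for a Sled `Tree` here.
#[derive(Default)]
pub struct Variables {
    tree: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Variables {
    /// Hypothetical helper: number of stored entries.
    pub fn count(&self) -> usize {
        self.tree.len()
    }
}

/// Tracks the migration version stored in the database.
#[derive(Default)]
pub struct Migrations {
    version: u32,
}

impl Migrations {
    /// Hypothetical accessor for the stored version.
    pub fn get_version(&self) -> u32 {
        self.version
    }

    /// Hypothetical setter, called after a migration has finished.
    pub fn set_version(&mut self, version: u32) {
        self.version = version;
    }
}

/// Entry point: callers reach specific functionality through the fields.
#[derive(Default)]
pub struct Database {
    pub variables: Variables,
    pub migrations: Migrations,
}
```

Callers would then write something like `db.migrations.get_version()` rather than touching the trees directly.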
### Data Migrations
The dicebot has very simple (and frankly dangerous) support for migrating data. The database keeps track of its current migration version, and when the application starts up, it checks to see if it needs to upgrade the format of data. The migration version that the application expects is stored in the config (currently hardcoded into the binary).
Migrations are simply functions that receive the `Database` instance and can do whatever they need to alter the data in the database. Migration functions are idempotent and make their changes in one or more atomic steps. This is necessary because the database version is updated in a transaction separate from any transactions the migration starts.
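The startup check can be sketched roughly as follows. This is not the dicebot's actual code: the `Database` struct here is an in-memory stand-in, and `Migration`, `add_greeting`, and `migrate` are hypothetical names chosen for the example.

```rust
/// In-memory stand-in for the real database (hypothetical).
#[derive(Default)]
pub struct Database {
    /// Migration version currently recorded in the database.
    pub version: u32,
    /// Some data that migrations may alter.
    pub entries: Vec<String>,
}

/// A migration is a plain function that receives the database.
pub type Migration = fn(&mut Database) -> Result<(), String>;

/// Example migration. It is idempotent: running it twice adds nothing new.
pub fn add_greeting(db: &mut Database) -> Result<(), String> {
    if !db.entries.iter().any(|entry| entry == "greeting") {
        db.entries.push("greeting".to_string());
    }
    Ok(())
}

/// Runs every migration newer than the stored version, in order.
/// The version is recorded as a separate step after each migration, so a
/// crash between the two steps causes that migration to run again later.
pub fn migrate(db: &mut Database, migrations: &[(u32, Migration)]) -> Result<(), String> {
    for &(version, migration) in migrations {
        if version > db.version {
            migration(db)?;
            db.version = version;
        }
    }
    Ok(())
}
```

Because the version update is separate from the migration's own changes, a repeated run has to be harmless, which is why `add_greeting` checks before it writes.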
The dicebot currently has no support for automatically migrating Sled database file versions.
## Guiding Principles
Sled essentially operates as a bunch of `Map<[u8], [u8]>`s (that is, maps of byte slice keys to byte slice values), which means both keys and values can be pretty much any arbitrary type. To keep things simple, we follow these guidelines:
* Separate `Tree`s for different types of data. A `Tree` in Sled is an isolated keyspace.
* Strongly typed access to data, converted into `[u8]` keys by code private to the DB layer.
* Keys are UTF-8 strings, sometimes separated by `0xfe` and `0xff` delimiter bytes. The bytes `0xfe` and `0xff` never appear in valid UTF-8, which makes them useful delimiters.
### Trees, Keys, and Delimiters
Sled supports opening multiple `Tree` instances, each of which acts as an isolated set of key-value pairs. We use trees to categorize data, much like tables in a relational SQL database.
* In general, a single tree should store one kind of data.
* Ideally, the tree does not have more than one format used for keys.
Keys in a tree are how data is queried, and thus should make sense for the data type. A key often requires some levels of sub-categorization. For example, user variables are defined per room. Thus, the key for a user variable is composed of the `username`, `room ID`, and the actual name of the variable. The delimiters `0xff` and `0xfe` are used for this sub-categorization.
* The `0xff` delimiter is used to split logically separate parts of the key. In the case of user variables, this separates the where (username + room ID) from the what (variable name).
* The `0xfe` delimiter is used to split related parts of the key into narrower categories. For user variables, the username and room ID are split by `0xfe`.
A key format should be designed with how the data will be queried in mind. It usually does not matter when looking up a single value, but for scanning multiple keys, it's important that the key be designed properly. Using delimiters enables clever use of Sled's API. A key can be partially crafted up to the delimiter, and the `scan_prefix` function allows finding all data that begins with that prefix.
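To make the prefix idea concrete, here is a minimal sketch using only the standard library. A sorted `BTreeMap` stands in for a Sled `Tree`, and the range-plus-`take_while` walk approximates what `scan_prefix` provides. The function names and the sample usernames and room IDs are assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Builds `<username> 0xfe <room_id> 0xff`, the prefix shared by every
/// variable a user has defined in one room.
pub fn user_room_prefix(username: &str, room_id: &str) -> Vec<u8> {
    let mut key = username.as_bytes().to_vec();
    key.push(0xfe); // splits related parts: username and room ID
    key.extend_from_slice(room_id.as_bytes());
    key.push(0xff); // splits the "where" from the "what"
    key
}

/// Builds the full key `<username> 0xfe <room_id> 0xff <variable_name>`.
pub fn variable_key(username: &str, room_id: &str, name: &str) -> Vec<u8> {
    let mut key = user_room_prefix(username, room_id);
    key.extend_from_slice(name.as_bytes());
    key
}

/// Stand-in for a prefix scan: start at the prefix in sorted key order and
/// stop at the first key that no longer begins with it.
pub fn variables_for(
    tree: &BTreeMap<Vec<u8>, i32>,
    username: &str,
    room_id: &str,
) -> Vec<(String, i32)> {
    let prefix = user_room_prefix(username, room_id);
    tree.range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(key, value)| {
            let name = String::from_utf8_lossy(&key[prefix.len()..]).into_owned();
            (name, *value)
        })
        .collect()
}
```

Since the delimiter bytes cannot occur inside a UTF-8 username or room ID, a prefix built for one user can never accidentally match the keys of another user whose name merely starts with the same characters.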
### Values
The value stored in a tree can be anything, including serialized structs. Following the guidelines above, one tree should store one kind of data.
* For types with a size known at compile time (simple types or anything whose size in memory is not variable), zero-copy serialization using the `zerocopy` crate is preferred.
* For complex structs with arbitrary sizes (strings, lists, etc.), the `bincode` crate is preferred.
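The fixed-size case can be illustrated without either crate. The sketch below is not the `zerocopy` or `bincode` API; it uses only the standard library to show why a type with a known size maps cleanly onto a byte array. The function names and the little-endian byte order are assumptions for this example.

```rust
/// Encodes an `i32` as exactly four bytes (little-endian is an assumption
/// of this sketch, not a documented choice of the dicebot).
pub fn encode_i32(value: i32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Decodes four bytes back into an `i32`. Returns `None` when the stored
/// value does not have the expected fixed size.
pub fn decode_i32(bytes: &[u8]) -> Option<i32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut array = [0u8; 4];
    array.copy_from_slice(bytes);
    Some(i32::from_le_bytes(array))
}
```

Variable-size data such as strings or lists has no such fixed layout, which is where a general-purpose serializer becomes the better fit.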
Where possible, higher-level Sled APIs should be used. Batches, `compare_and_swap`, and `fetch_and_update` are all very useful. In cases where this is not possible, transactions should be used to preserve atomicity.
### Migrations
Designing database migrations is difficult when the database doesn't really have anything resembling a query language, and the data stored is completely arbitrary! That said, some basic rules make them less painful:
* Isolate all functionality for the migration inside its function, or inside a module only the migration function can access.
* Avoid use of any APIs from the dicebot database layer. These APIs change to match the latest requirements of the application. The migration should be able to craft and decode keys in a way completely decoupled from the API.
* Migrations must be idempotent. If the migration crashes halfway through execution, or the DB version update does not commit, the migration must be able to run again and produce the same result.
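The rules above can be sketched with a purely hypothetical migration that rebuilds per-room variable counts from the keys of the variables tree (the key layout is the one given under "Design: User Variables"). It decodes keys on its own rather than calling the database layer, and it derives its output entirely from the source data, so running it again gives the same result. In-memory maps stand in for Sled trees, and the little-endian count encoding is an assumption.

```rust
use std::collections::BTreeMap;

/// Stand-in for a Sled tree of raw byte keys and values.
pub type Tree = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical migration: recompute how many variables each
/// `<username> 0xfe <room_id>` pair owns.
pub fn rebuild_counts(variables: &Tree, counts: &mut Tree) {
    let mut fresh: BTreeMap<Vec<u8>, i32> = BTreeMap::new();
    for key in variables.keys() {
        // Decode the key locally: everything before the first 0xff byte
        // identifies the user and room.
        if let Some(position) = key.iter().position(|byte| *byte == 0xff) {
            *fresh.entry(key[..position].to_vec()).or_insert(0) += 1;
        }
    }
    // Replace the old counts wholesale. Against a real database this
    // replacement would need to happen as one atomic step.
    counts.clear();
    for (key, count) in fresh {
        counts.insert(key, count.to_le_bytes().to_vec());
    }
}
```

A migration written this way does not care whether a previous attempt stopped halfway, because it never relies on what is already in the target tree.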
## Design: User Variables
This documents the database design of user-defined variables.
### Database Schema
User variables are implemented with two trees:
* `room_user_variables`: Key format `<username> 0xfe <room_id> 0xff <variable_name>`. Value type is `i32`.
* `room_user_variable_count`: Key format `<username> 0xfe <room_id>`. Value type is `i32`.
The `room_user_variables` tree contains the actual variables and their values, as defined by the user. The `room_user_variable_count` tree keeps track of how many variables a user has defined on a per-room basis. The APIs update both trees atomically, in a single transaction.
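A minimal sketch of how the two trees stay in step is shown below. It is an in-memory model only: `BTreeMap`s stand in for the Sled trees, exclusive `&mut self` access stands in for the transaction, and the method names `set`, `get`, `delete`, and `count` are hypothetical.

```rust
use std::collections::BTreeMap;

/// In-memory model of the two trees (stand-ins for Sled trees).
#[derive(Default)]
pub struct Variables {
    room_user_variables: BTreeMap<Vec<u8>, i32>,
    room_user_variable_count: BTreeMap<Vec<u8>, i32>,
}

/// Key for the count tree: `<username> 0xfe <room_id>`.
fn count_key(username: &str, room_id: &str) -> Vec<u8> {
    let mut key = username.as_bytes().to_vec();
    key.push(0xfe);
    key.extend_from_slice(room_id.as_bytes());
    key
}

/// Key for the variables tree: `<username> 0xfe <room_id> 0xff <name>`.
fn variable_key(username: &str, room_id: &str, name: &str) -> Vec<u8> {
    let mut key = count_key(username, room_id);
    key.push(0xff);
    key.extend_from_slice(name.as_bytes());
    key
}

impl Variables {
    /// Sets a variable; the count only grows when the variable is new.
    pub fn set(&mut self, username: &str, room_id: &str, name: &str, value: i32) {
        let key = variable_key(username, room_id, name);
        if self.room_user_variables.insert(key, value).is_none() {
            *self
                .room_user_variable_count
                .entry(count_key(username, room_id))
                .or_insert(0) += 1;
        }
    }

    /// Reads a variable's value, if it is defined.
    pub fn get(&self, username: &str, room_id: &str, name: &str) -> Option<i32> {
        self.room_user_variables
            .get(&variable_key(username, room_id, name))
            .copied()
    }

    /// Deletes a variable and lowers the count if it existed.
    pub fn delete(&mut self, username: &str, room_id: &str, name: &str) -> bool {
        let key = variable_key(username, room_id, name);
        let removed = self.room_user_variables.remove(&key).is_some();
        if removed {
            if let Some(count) = self
                .room_user_variable_count
                .get_mut(&count_key(username, room_id))
            {
                *count -= 1;
            }
        }
        removed
    }

    /// Number of variables the user has defined in the room.
    pub fn count(&self, username: &str, room_id: &str) -> i32 {
        self.room_user_variable_count
            .get(&count_key(username, room_id))
            .copied()
            .unwrap_or(0)
    }
}
```

Keeping the count in its own tree means the number of variables can be read with a single lookup instead of a prefix scan, at the cost of having to update both trees together on every write.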
## Design: Room State Management
This will document the room state management database design, once it's finished.
### Database Schema
TODO.