diff --git a/Database-Design.md b/Database-Design.md index ac7a37e..479b7ca 100644 --- a/Database-Design.md +++ b/Database-Design.md @@ -1,74 +1,82 @@ -# Database Design - -The dicebot uses [Sled](https://sled.rs/). Sled is a fast, embedded key-value database that stores keys and values as byte arrays. - -Why Sled? - -* Mostly because I felt like it. - -## High-Level Architecture Overview - -A simple overview of how the database code is organized. - -### Database Code - -A struct `Database` is the entrypoint to all database functionality. The struct has fields, each their own sub-type, which expose the specific functionality. For example, the `Database` instance has a `Variables` field and a `Migrations` field. The `Variables` type relates to user variables, and the `Migrations` field manages the database migration version in the database. - -### Data Migrations - -The dicebot has very simple (and frankly dangerous) support for migrating data. The database keeps track of its current migration version, and when the application starts up, it checks to see if it needs to upgrade the format of data. The current migration version is stored in the config (currently hardcoded into the binary). - -Migrations are simply functions that receive the `Database` instance, and can execute whatever they need to do to alter the data in the database. Migration functions are idempotent and make their changes in one or more atomic steps. This is necessary because the database version is updated in a transaction separate from any transactions the migration starts. - -The dicebot currently has no support for automatically migrating Sled database file versions. - -## Guiding Principles - -Sled essentially operates as a bunch of `Map<[u8], [u8]>`s (that is, a map of byte slice keys and byte slice values), which means both keys and values can be pretty much any arbitrary type. To keep things simple, we follow these guidelines: - -* Separate `Tree`s for different types of data. A `Tree` in Sled is an isolated keyspace. -* Strongly typed access to data, converted into `[u8]` keys by code private to the DB layer. -* Keys are UTF8 strings, sometimes separated by `0xfe` and `0xff` delimiter bytes. `0xfe` and `0xff` are not valid UTF8, which makes them useful delimiters. - -### Trees, Keys, and Delimiters - -Sled supports opening multiple `Tree` instances, each which acts as an isolated set of key-value pairs. We use trees to categorize data much like a relational SQL table. - -* In general, a single tree should store one kind of data. -* Ideally, the tree does not have more that one format used for keys. - -Keys in a tree are how data is queried, and thus should make sense for the data type. A key often requires some levels of sub-categorization. For example, user variables are defined per room. Thus, the key for a user variable is composed of the `username`, `room ID`, and the actual name of the variable. The delimiters `0xff` and `0xfe` are used for this sub-categorization. - -* The `0xff` delimiter is used to split logically separate parts of the key. In the case of user variables, this separates the where (username + room ID) from the what (variable name). -* The `0xfe` delimiter is used to split related parts of the key into more narrow categories. For user variables, the username and room ID are split by `0xfe`. - -A key format should be designed with how the data will be queried in mind. It usually does not matter when looking up a single value, but for scanning multiple keys, it's important that the key be designed properly. Using delimiters enables clever use of Sled's API. A key can be partially crafted up to the delimiter, and the `scan_prefix` function allows finding all data that begins with that prefix. - -### Values - -The value stored in a tree can be anything, including serialized structs. Following guidelines above, one tree should store one kind of data. - -* For types with a size known at compile time (simple types or anything whose size in memory is not variable), zero-copy serialization using the `zerocopy` crate is preferred. -* For complex structs with arbitrary sizes (strings, lists, etc), the `bincode` crate is preferred. - -Where possible, higher-level Sled APIs should be used. Batches, `compare_and_swap` and `fetch_and_update` are both very useful. In cases where this is not possible, transactions should be used to preserve atomicity. - -### Migrations - -Designing database migrations is difficult when the database doesn't really have anything resembling a query language, and the data stored is completely arbitrary! That said, some basic rules make them less painful: - -* Isolate all functionality for the migration inside its function, or inside a module only the migration function can access. -* Avoid use of any APIs from the dicebot database layer. These APIs change to match the latest requirements of the application. The migration should be able to craft and decode keys in a way completely decoupled from the API. -* Migrations must be idempotent. If the migration crashes halfway through execution, or the DB version update does not commit, the migration must be able to run again and produce the same result. - -## Design: User Variables - -This documents the database design of user-defined variables. - -### Database Schema -User variables are implemented with two trees: - -* `room_user_variables`: Key format ` 0xfe 0xff `. Value type is `i32`. -* `room_user_variable_count`: Key format ` 0xfe `. Value type is `i32`. - -The `room_user_variables` tree contains the actual variables and their value, defined by the user. The `room_user_variable_count` tree keeps track of how many variables a user has defined on a per room basis. APIs atomically update both trees transactionally. \ No newline at end of file +# Database Design + +The dicebot uses [Sled](https://sled.rs/). Sled is a fast, embedded key-value database that stores keys and values as byte arrays. + +Why Sled? + +* Mostly because I felt like it. + +## High-Level Architecture Overview + +A simple overview of how the database code is organized. + +### Database Code + +A struct `Database` is the entrypoint to all database functionality. The struct has fields, each their own sub-type, which expose the specific functionality. For example, the `Database` instance has a `Variables` field and a `Migrations` field. The `Variables` type relates to user variables, and the `Migrations` field manages the database migration version in the database. + +### Data Migrations + +The dicebot has very simple (and frankly dangerous) support for migrating data. The database keeps track of its current migration version, and when the application starts up, it checks to see if it needs to upgrade the format of data. The current migration version is stored in the config (currently hardcoded into the binary). + +Migrations are simply functions that receive the `Database` instance, and can execute whatever they need to do to alter the data in the database. Migration functions are idempotent and make their changes in one or more atomic steps. This is necessary because the database version is updated in a transaction separate from any transactions the migration starts. + +The dicebot currently has no support for automatically migrating Sled database file versions. + +## Guiding Principles + +Sled essentially operates as a bunch of `Map<[u8], [u8]>`s (that is, a map of byte slice keys and byte slice values), which means both keys and values can be pretty much any arbitrary type. To keep things simple, we follow these guidelines: + +* Separate `Tree`s for different types of data. A `Tree` in Sled is an isolated keyspace. +* Strongly typed access to data, converted into `[u8]` keys by code private to the DB layer. +* Keys are UTF8 strings, sometimes separated by `0xfe` and `0xff` delimiter bytes. `0xfe` and `0xff` are not valid UTF8, which makes them useful delimiters. + +### Trees, Keys, and Delimiters + +Sled supports opening multiple `Tree` instances, each which acts as an isolated set of key-value pairs. We use trees to categorize data much like a relational SQL table. + +* In general, a single tree should store one kind of data. +* Ideally, the tree does not have more that one format used for keys. + +Keys in a tree are how data is queried, and thus should make sense for the data type. A key often requires some levels of sub-categorization. For example, user variables are defined per room. Thus, the key for a user variable is composed of the `username`, `room ID`, and the actual name of the variable. The delimiters `0xff` and `0xfe` are used for this sub-categorization. + +* The `0xff` delimiter is used to split logically separate parts of the key. In the case of user variables, this separates the where (username + room ID) from the what (variable name). +* The `0xfe` delimiter is used to split related parts of the key into more narrow categories. For user variables, the username and room ID are split by `0xfe`. + +A key format should be designed with how the data will be queried in mind. It usually does not matter when looking up a single value, but for scanning multiple keys, it's important that the key be designed properly. Using delimiters enables clever use of Sled's API. A key can be partially crafted up to the delimiter, and the `scan_prefix` function allows finding all data that begins with that prefix. + +### Values + +The value stored in a tree can be anything, including serialized structs. Following guidelines above, one tree should store one kind of data. + +* For types with a size known at compile time (simple types or anything whose size in memory is not variable), zero-copy serialization using the `zerocopy` crate is preferred. +* For complex structs with arbitrary sizes (strings, lists, etc), the `bincode` crate is preferred. + +Where possible, higher-level Sled APIs should be used. Batches, `compare_and_swap` and `fetch_and_update` are both very useful. In cases where this is not possible, transactions should be used to preserve atomicity. + +### Migrations + +Designing database migrations is difficult when the database doesn't really have anything resembling a query language, and the data stored is completely arbitrary! That said, some basic rules make them less painful: + +* Isolate all functionality for the migration inside its function, or inside a module only the migration function can access. +* Avoid use of any APIs from the dicebot database layer. These APIs change to match the latest requirements of the application. The migration should be able to craft and decode keys in a way completely decoupled from the API. +* Migrations must be idempotent. If the migration crashes halfway through execution, or the DB version update does not commit, the migration must be able to run again and produce the same result. + +## Design: User Variables + +This documents the database design of user-defined variables. + +### Database Schema +User variables are implemented with two trees: + +* `room_user_variables`: Key format ` 0xfe 0xff `. Value type is `i32`. +* `room_user_variable_count`: Key format ` 0xfe `. Value type is `i32`. + +The `room_user_variables` tree contains the actual variables and their value, defined by the user. The `room_user_variable_count` tree keeps track of how many variables a user has defined on a per room basis. APIs atomically update both trees transactionally. + +## Design: Room State Management + +This will document the room state management database design, once it's finished. + +### Database Schema + +TODO. \ No newline at end of file