# Misskey Safety Scan

A work-in-progress collection of utilities for analyzing content found on Misskey and the wider Fediverse, designed to help instance administrators plan how to enforce their own rules and policies.

Currently, this repository consists of two bash scripts which serve as prototypes for a larger effort that will be written in TypeScript.

## What Does This Do?

The primary purpose of these programs is to scan Fediverse instances for content that is often deemed inappropriate or illegal. It is another tool in the admin's toolkit, alongside projects like Fediblockhole and FediSeer.

- `scan-federated-instances`: Scans the **descriptions** of all instances known to the local instance for inappropriate content or themes using a large language model.
- `verify-scan`: Double-checks a CSV file generated by the scanner to remove false positives and false negatives.

## Configuring the AI Model

The scan relies on the llama-guard3 model (or a model that can produce equivalent responses) to determine whether an instance's description is inappropriate. The `aichat` tool is used to invoke the large language model. Refer to the [aichat][1] documentation for more information.

**Currently, you must use llama-guard3.**

## Invoking the Commands

Instance Scanner:

- Instance URL: The root URL of your Misskey instance.
- API Key: The `i` parameter included in API requests. Find it in the browser console.
- Model Name: A model name from aichat, such as `myollama:llama-guard3:8b`. Refer to the [aichat][1] documentation for more information.

```
scan-federated-instances https://social.example.com/ "APIKEY" modelname
```

Scan Verifier:

- CSV file: The CSV generated by the instance scanner.
- Model Name: A model name from aichat, such as `myollama:llama-guard3:8b`. Refer to the [aichat][1] documentation for more information.

```
verify-scan scan-output.csv modelname
```

## What to Do with Output

The `scan-output.csv` file will contain a list of instances that the LLM deems to be promoting inappropriate, hateful, or illegal content. From this point, what to do is up to the admin:

- Some will want to defederate completely from these instances.
- Some will want to silence them.
- Some will want to do nothing.

## How Does It Work?

The scanner currently only communicates with the local Misskey instance, which means it does not put load on other servers (there is a curl HTTP OPTIONS check to determine whether remote instances are up, though). The scanner uses the description of each instance found in the Misskey API response.

The descriptions of all alive remote instances are fed into `aichat` and run against the `llama-guard3` model. The model outputs whether it considers the text "safe," meaning whether the text violates [its defined safety policies][2].

- In our case, we only care about things that would be considered inappropriate or actually illegal, so the S6, S7, and S8 safety codes are treated as `safe` by the scanner.
- Otherwise, all the personal instances would be flagged as `unsafe` with code S7.

## Dependencies

The following dependencies are required for running these programs:

- w3m (input sanitization)
- GNU parallel (executes `aichat` in parallel)
- sed (input sanitization)
- aichat (properly configured)
- curl (API calls)
- jq (reading API responses)

## Known Issues

There is currently a problem with the script not exiting correctly. To terminate it early, use `kill` from another terminal, as shown in the sketch below.
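A minimal sketch of that workaround, assuming the script is running under the filename used in this repository (adjust the pattern if you invoked it differently):

```
# From another terminal: find and terminate the running scanner.
# The process pattern is an assumption based on the script's filename in this repo.
pkill -f scan-federated-instances
```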
## License

[AGPLv3 or later][3].

[1]: https://github.com/sigoden/aichat
[2]: https://ollama.com/library/llama-guard3
[3]: ./LICENSE