# Misskey Safety Scan
A work-in-progress collection of utilities for analyzing content found
on Misskey and the wider Fediverse, designed to help instance
administrators decide how to enforce their own rules and policies.

Currently, this repository consists of two bash scripts that serve as
prototypes for a larger effort, which will be written in TypeScript.
## What Does This Do?
The primary purpose of these programs is to scan Fediverse instances
for content that is often deemed inappropriate or illegal. It is
another tool in the admin toolkit, alongside projects like
Fediblockhole and FediSeer.
- `scan-federated-instances`: Scans the **descriptions** of all
instances known to the local instance for inappropriate content or
themes using a large language model.
- `verify-scan`: Double-checks a CSV file generated by the scanner to
catch false positives and false negatives.
## Configuring the AI Model
The scanner relies on the llama-guard3 model (or any model that
produces the same response format) to determine whether an instance's
description is inappropriate. The `aichat` tool is used to invoke the
large language model.

Refer to the [aichat][1] documentation for more information.

**Currently, you must use llama-guard3.**
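For reference, a minimal `aichat` client entry pointing at a local
Ollama server might look like the fragment below. The client name,
host, and port are examples only; adapt them to your setup.

```
# ~/.config/aichat/config.yaml (illustrative fragment)
clients:
  - type: openai-compatible
    name: myollama
    api_base: http://localhost:11434/v1
```

With a client named `myollama`, the model name passed to these scripts
would be something like `myollama:llama-guard3:8b`.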
## Invoking the Commands
Instance Scanner:
- Instance URL: This should be the root URL of your Misskey instance.
- API Key: This is the `i` parameter included in API requests. Find
it in the browser console.
- Model Name: This is a model name from aichat. Something like
`myollama:llama-guard3:8b`. Refer to [aichat][1] documentation for more.
```
scan-federated-instances https://social.example.com/ "APIKEY" modelname
```
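Before kicking off a long scan, it can help to confirm the API key is
valid. The snippet below is a sketch and not part of the scripts: it
assumes the standard Misskey `/api/i` endpoint, which returns the
account that owns the token.

```
# Build the JSON body Misskey expects ({"i": "<token>"}) with jq,
# so the key never needs shell-escaping by hand.
build_payload() { jq -cn --arg i "$1" '{i: $i}'; }

# Hypothetical sanity check (adjust the URL to your instance):
# curl -s -X POST "https://social.example.com/api/i" \
#   -H 'Content-Type: application/json' \
#   -d "$(build_payload 'APIKEY')" | jq -r '.username'
build_payload "APIKEY"
```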
Scan Verifier:
- CSV file: The CSV generated by the instance scanner.
- Model Name: This is a model name from aichat. Something like
`myollama:llama-guard3:8b`. Refer to [aichat][1] documentation for more.
```
verify-scan scan-output.csv modelname
```
## What to Do with the Output
The `scan-output.csv` file will contain a list of instances that the
LLM deems to be promoting inappropriate, hateful, or illegal content.
From this point, what to do is up to the admin:
- Some will want to defederate completely from these instances.
- Some will want to silence them.
- Some will want to do nothing.
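Whichever policy you choose, the first step is usually pulling the
host names out of the CSV. A small sketch, assuming the host is the
first comma-separated column (check the actual file layout before
relying on this):

```
# Print the unique hosts from a scan CSV (field 1 assumed to be the host).
flagged_hosts() { cut -d, -f1 "$1" | sort -u; }

# e.g. flagged_hosts scan-output.csv > flagged-hosts.txt
```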
## How Does It Work?
The scanner currently only communicates with the local Misskey
instance, which means it puts no load on other servers (apart from a
single curl HTTP OPTIONS request used to check whether each remote
instance is up). The scanner uses the instance description found in
the Misskey API response.
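The liveness probe can be sketched roughly as follows; the timeout
value and exact curl flags here are assumptions, not the scripts'
actual settings.

```
# Succeeds if the host answers an HTTP OPTIONS request within 5 seconds.
is_alive() {
  curl -s -o /dev/null --max-time 5 -X OPTIONS "https://$1/"
}

is_alive "social.example.com" && echo "host is up"
```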
The descriptions of all reachable remote instances are fed into
`aichat` and run against the `llama-guard3` model. The model replies
with `safe`, or with `unsafe` plus a hazard category code, depending
on whether the text violates [its defined safety policies][2].
- In our case, we only care about content that would be considered
inappropriate or actually illegal, so hits on the S6, S7, and S8
safety codes are treated as `safe` by the scanner.
- Without this exception, nearly every personal instance would be
flagged as `unsafe` with code S7.
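The downgrade described above amounts to a small post-processing step
on the model's verdict. A sketch (the real scripts may parse the
output differently): llama-guard3 replies with `safe`, or with
`unsafe` followed by a category code on the next line.

```
# Map a raw llama-guard3 verdict onto this project's policy:
# S6 (specialized advice), S7 (privacy), and S8 (intellectual
# property) hits are treated as safe; any other unsafe category stands.
classify() {
  case "$1" in
    safe)            echo safe ;;
    *S6*|*S7*|*S8*)  echo safe ;;
    *)               echo unsafe ;;
  esac
}

classify $'unsafe\nS7'    # → safe (the personal-instance case)
classify $'unsafe\nS1'    # → unsafe
```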
## Dependencies
The following dependencies are required for running these programs:
- w3m (input sanitization)
- GNU parallel (executes `aichat` in parallel)
- sed (input sanitization)
- aichat (properly configured)
- curl (API calls)
- jq (reading API responses)
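As a rough illustration of why w3m and sed appear in this list:
instance descriptions may contain HTML and characters that interfere
with prompting, so they get flattened and stripped before reaching
`aichat`. The character set below is an example, not the scripts'
actual rules.

```
# Strip characters that could confuse shell quoting or the prompt.
strip_unsafe() { sed 's/["`$\\]//g'; }

# Full pipeline sketch (the w3m step needs w3m installed):
#   w3m -dump -T text/html <<<"$description" | strip_unsafe
printf '%s\n' 'Say "hi" `now` for $5' | strip_unsafe   # → Say hi now for 5
```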
## Known Issues
The scripts currently do not always exit cleanly. To terminate a run
early, use `kill` from another terminal.
## License
[AGPLv3 or later.][3]
[1]: https://github.com/sigoden/aichat
[2]: https://ollama.com/library/llama-guard3
[3]: ./LICENSE