# Misskey Safety Scan

A work-in-progress collection of utilities for analyzing content found
on Misskey and the wider Fediverse, designed to help instance
administrators plan how to enforce their own rules and policies.

Currently, this repository consists of two bash scripts that serve as
prototypes for a larger effort, which will be written in TypeScript.

## What Does This Do?

The primary purpose of these programs is to scan Fediverse instances
for content that is often deemed inappropriate or illegal. Like
Fediblockhole, FediSeer, and similar projects, it is another tool in
the admin's toolkit.

- `scan-federated-instances`: Scans the **descriptions** of all
  instances known to the local instance for inappropriate content or
  themes using a large language model.
- `verify-scan`: Double-checks a CSV file generated by the scanner to
  remove false positives and negatives.

## Configuring the AI Model

The scanner relies on the llama-guard3 model (or any model that
produces the same responses) to determine whether an instance's
description is inappropriate. The `aichat` tool is used to invoke the
large language model.

Refer to the [aichat][1] documentation for more information.

**Currently, you must use llama-guard3**.

## Invoking the Commands

Instance Scanner:
- Instance URL: The root URL of your Misskey instance.
- API Key: The `i` parameter included in API requests. You can find
  it in the browser console.
- Model Name: A model name from aichat, such as
  `myollama:llama-guard3:8b`. Refer to the [aichat][1] documentation for more.

```
scan-federated-instances https://social.example.com/ "APIKEY" modelname
```

Scan Verifier:
- CSV file: The CSV generated by the instance scanner.
- Model Name: A model name from aichat, such as
  `myollama:llama-guard3:8b`. Refer to the [aichat][1] documentation for more.

```
verify-scan scan-output.csv modelname
```

## What to Do with Output

The `scan-output.csv` file contains a list of instances that the
LLM deems to be promoting inappropriate, hateful, or illegal content.
What to do from there is up to the admin:

- Some will want to defederate completely from these instances.
- Some will want to silence them.
- Some will want to do nothing.

## How Does It Work?

The scanner currently communicates only with the local Misskey
instance, so it puts almost no load on other servers. The one
exception is a curl HTTP OPTIONS check that determines whether each
remote instance is up. The scanner uses the instance description found
in the Misskey API response.
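
The liveness probe described above could look roughly like the following sketch. The function name `is_alive`, the 10-second timeout, and the assumption that remote hosts answer over HTTPS at their root path are illustrative and are not taken from the scripts themselves.

```sh
#!/bin/sh
# Hypothetical helper: report a remote host as alive when an HTTP OPTIONS
# request to its root URL completes. The timeout value is an assumption.
is_alive() {
    host=$1
    # Without --fail, curl exits 0 for any HTTP response, so only
    # network-level failures (DNS errors, refused connections, timeouts)
    # count as "down".
    curl --silent --output /dev/null --max-time 10 \
        --request OPTIONS "https://${host}/"
}
```

For example, `is_alive social.example.com && echo up` prints `up` whenever the server answers at all, whatever the HTTP status code.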

The descriptions of all live remote instances are fed into `aichat`
and run against the `llama-guard3` model. The model outputs whether
it considers the text "safe," that is, whether the text violates
[its defined safety policies][2].

- In our case, we only care about things that would be considered
  inappropriate or actually illegal, so the S6, S7, and S8 safety
  codes (Specialized Advice, Privacy, and Intellectual Property) are
  treated as `safe` by the scanner.
- Otherwise, all the personal instances would be flagged as `unsafe`
  with code S7.
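
The filtering rule above can be sketched as a small shell function. This is not the repository's actual code: the name `effective_verdict` is hypothetical, and the sketch assumes the model replies with either `safe` or `unsafe` followed by a line of comma-separated codes such as `S1,S7`.

```sh
#!/bin/sh
# Hypothetical helper: reduce a raw llama-guard3 reply to "safe" or "unsafe"
# after ignoring the S6, S7, and S8 codes.
# Assumed reply format: "safe", or "unsafe" plus a second line of codes.
effective_verdict() {
    # Drop carriage returns and blank lines so the verdict is always line 1.
    lines=$(printf '%s\n' "$1" | tr -d '\r' | grep -v '^[[:space:]]*$' || true)
    verdict=$(printf '%s\n' "$lines" | sed -n '1p' | tr -d ' \t')
    if [ "$verdict" != "unsafe" ]; then
        # Anything other than an explicit "unsafe" is passed through as safe.
        echo "safe"
        return 0
    fi
    codes=$(printf '%s\n' "$lines" | sed -n '2p' | tr -d ' \t')
    if [ -z "$codes" ]; then
        # "unsafe" without any codes: stay conservative and keep the flag.
        echo "unsafe"
        return 0
    fi
    # Split the codes on commas and discard the ones the scanner ignores.
    kept=$(printf '%s\n' "$codes" | tr ',' '\n' | grep -v -x -E 'S6|S7|S8' || true)
    if [ -n "$kept" ]; then
        echo "unsafe"
    else
        echo "safe"
    fi
}
```

With this rule, a reply flagged only with S7 is downgraded to `safe`, while a reply flagged with both S1 and S7 stays `unsafe` because S1 is not on the ignore list.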

## Dependencies

The following dependencies are required for running these programs:
- w3m (input sanitization)
- GNU parallel (executes `aichat` in parallel)
- sed (input sanitization)
- aichat (properly configured)
- curl (API calls)
- jq (reading API responses)
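
A quick way to confirm that these tools are installed is a check like the sketch below. The helper name `missing_deps` is hypothetical, and the command names passed to it are assumptions (GNU parallel, for instance, is usually installed as `parallel`).

```sh
#!/bin/sh
# Hypothetical helper: print every command from the argument list that
# cannot be found on PATH, one per line.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

Running `missing_deps w3m parallel sed aichat curl jq` prints nothing when everything is available. Note that this only checks that `aichat` exists, not that it is properly configured.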

## Known Issues

There is currently a problem with the script not exiting correctly. To
terminate it early, use `kill` from another terminal.

## License

[AGPLv3 or later.][3]

[1]: https://github.com/sigoden/aichat
[2]: https://ollama.com/library/llama-guard3
[3]: ./LICENSE