Misskey Safety Scan
A work-in-progress collection of utilities for analyzing content found on Misskey and the wider Fediverse, designed to help instance administrators decide how to enforce their own rules and policies.
Currently, this repository consists of two Bash scripts that serve as prototypes for a larger effort, which will be written in TypeScript.
What Does This Do?
The primary purpose of these programs is to scan Fediverse instances for content that is often deemed inappropriate or illegal. They are another tool in the admin's toolkit, alongside projects like Fediblockhole and FediSeer.
- scan-federated-instances: Scans the descriptions of all remote instances known to the local instance for inappropriate content or themes using a large language model.
- verify-scan: Double-checks a CSV file generated by the scanner to remove false positives and negatives.
Configuring the AI Model
The scanner relies on the llama-guard3 model (or any model that produces the same responses) to determine whether an instance's description is inappropriate. The model is invoked through the aichat tool; refer to the aichat documentation for configuration details.
Currently, you must use llama-guard3.
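As a rough illustration of how a description reaches the model, the sketch below wraps a single aichat call in a shell function. It assumes aichat reads the prompt from standard input and selects the model with -m; the function name check_text is hypothetical and is not taken from the scripts.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): send one piece of text to a llama-guard3
# model through aichat and print the model's raw reply.
# Usage: check_text MODEL TEXT   (MODEL is an aichat model name)
check_text() {
  local model="$1"
  local text="$2"
  # Pass the text on stdin rather than as an argument, so a description
  # that starts with "-" cannot be mistaken for a command-line option.
  printf '%s\n' "$text" | aichat -m "$model"
}
```

For example, check_text myollama:llama-guard3:8b "A small instance for gardeners." prints whatever llama-guard3 replies for that text.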
Invoking the Commands
Instance Scanner:
- Instance URL: This should be the root URL of your Misskey instance.
- API Key: This is the "i" parameter included in API requests. You can find it in your browser's developer console.
- Model Name: An aichat model name, such as myollama:llama-guard3:8b. Refer to the aichat documentation for more.
scan-federated-instances https://social.example.com/ "APIKEY" modelname
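For illustration only, here is how the instance URL and API key might be used to list the remote instances the local server knows about. The sketch assumes Misskey's federation/instances endpoint accepts the "i" credential together with "limit" and "offset" paging fields, and that 100 is an accepted page size; fetch_instances_page is a hypothetical name, and the real script may build its request differently.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): fetch one page of remote instances from the
# local Misskey server. Assumes the federation/instances endpoint.
# Usage: fetch_instances_page INSTANCE_URL API_KEY [OFFSET]
fetch_instances_page() {
  local base_url="${1%/}"   # drop a trailing slash so we don't build "//api"
  local api_key="$2"
  local offset="${3:-0}"
  local body
  # JSON body: the "i" credential plus paging fields (page size 100 assumed).
  body=$(printf '{"i":"%s","limit":100,"offset":%d}' "$api_key" "$offset")
  curl -s -X POST "$base_url/api/federation/instances" \
    -H 'Content-Type: application/json' \
    -d "$body"
}
```

For example, fetch_instances_page https://social.example.com/ "APIKEY" 0 | jq -r '.[].host' would list the host names on the first page; repeating the call with offsets 100, 200, and so on pages through the rest until the response is an empty array.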
Scan Verifier:
- CSV file: The CSV generated by the instance scanner.
- Model Name: An aichat model name, such as myollama:llama-guard3:8b. Refer to the aichat documentation for more.
verify-scan scan-output.csv modelname
What to Do with the Output
The scan-output.csv file contains a list of instances that the LLM judges to be promoting inappropriate, hateful, or illegal content.
From this point, what to do is up to the admin:
- Some will want to defederate completely from these instances.
- Some will want to silence them.
- Some will want to do nothing.
How Does It Work?
The scanner communicates only with the local Misskey instance, so it does not put load on other servers, with one exception: a curl HTTP OPTIONS request checks whether each remote instance is up. The instance descriptions themselves come from the local Misskey API response.
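A minimal sketch of such a liveness check, assuming that any HTTP status returned to an OPTIONS request counts as "up". The function name is_alive and the 10-second timeout are illustrative choices, not the script's actual values.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): a host counts as "up" if it answers an
# HTTP OPTIONS request with any status code at all.
# Usage: is_alive HOST   (exit status 0 means up)
is_alive() {
  local code
  code=$(curl -s -o /dev/null -X OPTIONS --max-time 10 \
    -w '%{http_code}' "https://$1/")
  # curl writes 000 when no HTTP response arrived
  # (DNS failure, refused connection, timeout, TLS error).
  [ -n "$code" ] && [ "$code" != "000" ]
}
```

Treating even a 405 Method Not Allowed as "up" keeps the check cheap: the goal is only to skip dead instances, not to verify that they support OPTIONS.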
The descriptions of all live remote instances are fed into aichat and run against the llama-guard3 model. The model replies "safe" if it judges that the text does not violate its defined safety policies, or "unsafe" followed by the codes of the violated categories.
- In our case, we only care about content that is inappropriate or actually illegal, so the scanner treats the S6, S7, and S8 safety codes (specialized advice, privacy, and intellectual property) as safe.
- Otherwise, every personal instance would be flagged as unsafe with code S7.
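The filtering rule above can be sketched as a small shell function that reads a llama-guard3 reply (a "safe" or "unsafe" line, optionally followed by a comma-separated list of codes) and decides whether to flag it. The name classify_verdict and the "unknown" result for unexpected replies are assumptions made for this example, not behavior taken from the scripts.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read a llama-guard3 reply on stdin and print
# "safe", "unsafe" or "unknown", ignoring the S6, S7 and S8 categories.
classify_verdict() {
  local line verdict="" codes="" code
  # Take the first two non-blank lines: the verdict and the category codes.
  while IFS= read -r line || [ -n "$line" ]; do
    line=$(printf '%s' "$line" | tr -d '[:space:]')  # drop spaces, tabs, CRs
    [ -z "$line" ] && continue
    if [ -z "$verdict" ]; then
      verdict=$line
    else
      codes=$line
      break
    fi
  done

  case $verdict in
    safe) echo safe; return 0 ;;
    unsafe) ;;
    *) echo unknown; return 0 ;;   # unexpected output: leave it to a human
  esac

  # "unsafe" with no category codes stays unsafe.
  if [ -z "$codes" ]; then echo unsafe; return 0; fi

  local IFS=','
  for code in $codes; do
    case $code in
      S6|S7|S8) ;;                 # specialized advice, privacy, IP: ignored
      *) echo unsafe; return 0 ;;
    esac
  done
  echo safe
}
```

Piping the raw model output into classify_verdict turns a reply of "unsafe" plus "S7" into safe, while "unsafe" plus "S7,S10" stays unsafe because S10 is not in the ignored set.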
Dependencies
The following dependencies are required for running these programs:
- w3m (input sanitization)
- GNU parallel (runs aichat in parallel)
- sed (input sanitization)
- aichat (properly configured)
- curl (API calls)
- jq (reading API responses)
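Before running the scripts, a quick check like the following reports which of the listed commands are missing from your PATH. This is a sketch; missing_deps is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print each named command that is not on PATH,
# one per line. Prints nothing when everything is installed.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Run it as missing_deps w3m parallel sed aichat curl jq. Note that command -v only confirms that a command with that name exists: it cannot tell GNU parallel apart from the unrelated parallel in moreutils, and it cannot tell whether aichat is properly configured.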
Known Issues
There is currently a problem where the script does not exit correctly. To terminate it early, use kill from another terminal.