projectmoon d6ca675aed Initial commit 2024-10-17 16:57:47 +02:00

Misskey Safety Scan

A work-in-progress collection of utilities for analyzing content found on Misskey and the wider Fediverse, designed to help instance administrators make a plan of action on how to enforce their own rules and policies.

Currently, this repository consists of two Bash scripts that serve as prototypes for a larger effort, which will be written in TypeScript.

What Does This Do?

The primary purpose of these programs is to scan Fediverse instances for content that is often deemed inappropriate or illegal. It is another tool in the admin's toolkit, alongside Fediblockhole, FediSeer, and similar projects.

  • scan-federated-instances: Scans the descriptions of all instances known to the local instance for inappropriate content or themes using a large language model.
  • verify-scan: Double-checks a CSV file generated by the scanner to catch false positives and false negatives.

Configuring the AI Model

The scanner relies on the llama-guard3 model (or a model that produces responses in the same format) to determine whether an instance's description is inappropriate. The aichat tool is used to invoke the large language model.

Refer to the aichat documentation for more information.

Currently, you must use llama-guard3.

Invoking the Commands

Instance Scanner:

  • Instance URL: This should be the root URL of your Misskey instance.
  • API Key: This is the i parameter included in API requests. Find it in the browser console.
  • Model Name: This is a model name from aichat, such as myollama:llama-guard3:8b. Refer to the aichat documentation for more.

scan-federated-instances https://social.example.com/ "APIKEY" modelname

Scan Verifier:

  • CSV file: The CSV generated by the instance scanner.
  • Model Name: This is a model name from aichat, such as myollama:llama-guard3:8b. Refer to the aichat documentation for more.

verify-scan scan-output.csv modelname

What to Do with the Output

The scan-output.csv file will contain a list of instances that the LLM deems to be promoting inappropriate, hateful, or illegal content. From this point, what to do is up to the admin:

  • Some will want to defederate completely from these instances.
  • Some will want to silence them.
  • Some will want to do nothing.

How Does It Work?

The scanner currently communicates only with the local Misskey instance, so it puts almost no load on other servers. The one exception is a curl HTTP OPTIONS request, which checks whether each remote instance is up. The scanner uses the description of each instance as found in the Misskey API response.
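
The liveness check mentioned above could look roughly like the following. This is a minimal sketch, not the code from the actual script: the function name is_instance_alive, the use of HTTPS, and the 5-second timeout are all assumptions made for illustration.

```bash
# Hypothetical helper, not taken from the real scanner script.
# Succeeds (exit status 0) if the host answers an HTTP OPTIONS request
# within the time limit, and fails otherwise.
is_instance_alive() {
    host="$1"
    # --silent hides the progress meter, --output /dev/null discards the body,
    # and --max-time bounds the whole request (5 seconds is an assumed value).
    curl --silent --output /dev/null --max-time 5 \
        --request OPTIONS "https://${host}/"
}
```

Because curl is run without --fail, any HTTP response counts as "up", including error statuses. Only connection failures and timeouts make the function fail.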

The descriptions of all remote instances that are up are fed into aichat and run against the llama-guard3 model. The model reports whether it considers the text "safe", that is, whether the text violates any of its defined safety policies.

  • In our case, we only care about things that would be considered inappropriate or actually illegal, so the S6, S7, and S8 safety codes (specialized advice, privacy, and intellectual property in llama-guard3's taxonomy) are treated as safe by the scanner.
  • Otherwise, all the personal instances would be flagged as unsafe with code S7.
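
The handling of ignored safety codes can be sketched as follows. This is an illustration under stated assumptions rather than the scanner's real logic: the function name classify_verdict is hypothetical, and the sketch assumes the model's reply starts with "safe" or "unsafe" on the first line, with any violated codes comma-separated on the second line.

```bash
# Hypothetical helper; the real scanner's logic may differ.
# Takes a llama-guard3 style verdict and prints "safe" or "unsafe",
# treating the S6, S7, and S8 codes as safe.
classify_verdict() {
    verdict="$1"
    # First line is the overall status, with whitespace stripped.
    status=$(printf '%s\n' "$verdict" | sed -n '1p' | sed 's/[[:space:]]//g')
    if [ "$status" != "unsafe" ]; then
        echo "safe"
        return 0
    fi
    # Second line holds the comma-separated codes; put one code per line.
    codes=$(printf '%s\n' "$verdict" | sed -n '2p' | tr ',' '\n' \
        | sed 's/[[:space:]]//g')
    # Keep only the codes that are not on the ignore list.
    remaining=$(printf '%s\n' "$codes" | grep -v -x -E 'S6|S7|S8' || true)
    if [ -n "$codes" ] && [ -z "$remaining" ]; then
        echo "safe"      # every reported code is an ignored one
    else
        echo "unsafe"    # some other code was reported, or no code at all
    fi
}
```

For example, classify_verdict "$(printf 'unsafe\nS7')" prints safe, while a verdict listing S1,S7 prints unsafe. An "unsafe" reply with no codes is kept as unsafe, which is a deliberately cautious choice in this sketch.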

Dependencies

The following dependencies are required for running these programs:

  • w3m (input sanitization)
  • GNU parallel (executes aichat in parallel)
  • sed (input sanitization)
  • aichat (properly configured)
  • curl (API calls)
  • jq (reading API responses)
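
A quick way to confirm that these tools are installed is a pre-flight check like the one below. This is a sketch that is not part of the existing scripts: the function name missing_dependencies is hypothetical, and the check assumes GNU parallel is installed under the command name parallel. It only tests that each command is on the PATH, not that aichat is properly configured.

```bash
# Hypothetical pre-flight check; not part of the existing scripts.
# Prints the name of every given command that cannot be found on PATH.
missing_dependencies() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check the tools listed above and report any that are absent.
missing=$(missing_dependencies w3m parallel sed aichat curl jq)
if [ -n "$missing" ]; then
    # A real script would probably stop here instead of only warning.
    echo "Missing dependencies:" $missing >&2
fi
```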

Known Issues

There is currently a problem with the script not exiting correctly. To terminate it early, use kill from another terminal.

License

AGPLv3 or later.