# FediBlockHole

A tool for keeping a Mastodon instance blocklist synchronised with remote lists.

The broad design goal for FediBlockHole is to support pulling in a list of
blocklists from a set of trusted sources, merging them into a combined blocklist,
and then pushing that merged list to a set of managed instances.

It is inspired by the way PiHole maintains a set of blocklists of adtech
domains.

Mastodon admins can choose whom they trust to maintain quality lists and subscribe
to them, helping to distribute the load of maintaining blocklists among a
community of people. Control ultimately rests with the admins themselves, so they
can outsource as much, or as little, of the effort to others as they deem
appropriate.
## Features

### Blocklist Sources

- Read domain block lists from other instances via the Mastodon API.
  - Supports both public lists (no auth required) and 'admin' lists requiring
    authentication to an instance.
- Read domain block lists from arbitrary URLs, including local files.
  - Supports CSV and JSON format blocklists.
  - Supports RapidBlock CSV and JSON format blocklists.
### Blocklist Export/Push

- Push a merged blocklist to a set of Mastodon instances.
- Export per-source, unmerged block lists to local files, in CSV format.
- Export merged blocklists to local files, in CSV format.
- Control which fields are imported and exported.
### Flexible Configuration

- Provides (hopefully) sensible defaults to minimise first-time setup.
- Global and fine-grained configuration options are available for the complex
  situations that crop up sometimes.
## Installing

FediBlockHole is installable using `pip`:

```
python3 -m pip install fediblockhole
```

To install from source, clone the repo, `cd fediblockhole`, and run:

```
python3 -m pip install .
```

Installation adds a commandline tool: `fediblock-sync`.

Instance admins who want to use this tool for their instance will need to add an
Application at `https://<instance-domain>/settings/applications/` so they can
authorize the tool to create and update domain blocks with an OAuth token.

More on authorization by token below.
### Reading remote instance blocklists

If a remote instance makes its domain blocks public, you don't need
a token to read them.

If a remote instance only shows its domain blocks to local accounts,
you'll need to have a token with `read:blocks` authorization set up.
If you have an account on that instance, you can get a token by setting up a new
Application at `https://<instance-domain>/settings/applications/`.

To read admin blocks from a remote instance, you'll need to ask the instance
admin to add a new Application at
`https://<instance-domain>/settings/applications/` and then tell you the access
token.

The application needs the `admin:read:domain_blocks` OAuth scope, but
unfortunately this scope isn't available in the current application screen
(v4.0.2 of Mastodon at time of writing). There is a way to do it with scopes,
but it's really dangerous, so I'm not going to tell you what it is here.

A better way is to ask the instance admin to connect to the PostgreSQL database
and add the scope there, like this:

```
UPDATE oauth_access_tokens
SET scopes='admin:read:domain_blocks'
WHERE token='<your_app_token>';
```

When that's done, FediBlockHole should be able to use its token to read domain
blocks via the API.

Alternatively, you could ask the remote instance admin to set up FediBlockHole and
use it to dump out a CSV blocklist from their instance, and then put it somewhere
trusted parties can read it. Then you can define the blocklist as a URL source,
as explained below.
### Writing instance blocklists

Writing domain blocks into an instance requires both the `admin:read` and
`admin:write:domain_blocks` OAuth scopes. The `read` scope is used to read the
current list of domain blocks so we update ones that already exist, rather than
trying to add all new ones and clutter up the instance. It's also used to check
whether the instance has any accounts that follow accounts on a domain that is
about to get `suspend`ed; if so, the tool automatically drops the block severity
to `silence` level so people have time to migrate accounts before a full
defederation takes effect.

You can add the `admin:read` scope in the application admin screen. Please be
aware that this grants the application token full read access to all information
in the instance, so make sure you keep it secret. At the very least, remove
world-readable permission from any config file you put it in, e.g.:

```
chmod o-r <configfile>
```
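If you would rather check or fix this from Python, here is a minimal sketch using only the standard library. `remove_world_read` is a hypothetical helper, not part of FediBlockHole; it clears the same "read for others" permission bit that `chmod o-r` removes and leaves every other bit alone.

```python
import os
import stat


def remove_world_read(path: str) -> bool:
    """Clear the world-readable bit on `path`, like `chmod o-r <path>`.

    Returns True if the bit was set (and has now been removed),
    False if the file was already not world-readable.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if not mode & stat.S_IROTH:
        return False
    # Keep every other permission bit; drop only "read" for others.
    os.chmod(path, mode & ~stat.S_IROTH)
    return True
```

For example, a config file created with mode `0o644` ends up as `0o640` after one call, and a second call returns `False`.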
You can also grant the full `admin:write` scope to the application, but if you'd
prefer to keep things more tightly secured, you'll need to use SQL to set the
scopes in the database:

```
UPDATE oauth_access_tokens
SET scopes='admin:read admin:write:domain_blocks'
WHERE token='<your_app_token>';
```

When that's done, FediBlockHole should be able to use its token to authorise
adding or updating domain blocks via the API.
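Before pointing FediBlockHole at a token, you can sanity-check its space-separated `scopes` string (as stored in `oauth_access_tokens` by the SQL above) against what writing needs. This is a sketch under one assumption: that a broader scope such as `admin:read` covers narrower sub-scopes such as `admin:read:domain_blocks`, which is how Mastodon's scope hierarchy is generally described. `scope_covers` and `missing_scopes` are hypothetical helpers.

```python
def scope_covers(granted: str, required: str) -> bool:
    """True if one granted scope satisfies a required scope.

    Assumes the parent/child convention: 'admin:read' covers
    'admin:read:domain_blocks', but never 'admin:write:...'.
    """
    return required == granted or required.startswith(granted + ":")


def missing_scopes(scopes: str, required: list[str]) -> list[str]:
    """Return the required scopes not covered by a space-separated scope string."""
    granted = scopes.split()
    return [r for r in required if not any(scope_covers(g, r) for g in granted)]


# Scopes this section says are needed to write domain blocks.
WRITE_SCOPES = ["admin:read", "admin:write:domain_blocks"]

print(missing_scopes("admin:read admin:write:domain_blocks", WRITE_SCOPES))
# → []
print(missing_scopes("admin:read:domain_blocks", WRITE_SCOPES))
# → ['admin:read', 'admin:write:domain_blocks']
```

The second call shows why a read-only `admin:read:domain_blocks` token is enough for reading remote blocks but not for pushing them.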
## Using the tool

Run the tool like this:

```
fediblock-sync -c <configfile_path>
```

If you put the config file in `/etc/default/fediblockhole.conf.toml`, you don't
need to pass in the config file path.

For a list of possible configuration options, check the output of `--help`.

You can also read the heavily commented sample configuration file in the repo at
[etc/sample.fediblockhole.conf.toml](https://github.com/eigenmagic/fediblockhole/blob/main/etc/sample.fediblockhole.conf.toml).
## Configuring

Once you have your applications and tokens and scopes set up, create a
configuration file for FediBlockHole to use. You can put it anywhere and use the
`-c <configfile>` commandline parameter to tell FediBlockHole where it is.

Or you can use the default location of `/etc/default/fediblockhole.conf.toml`.

As the filename suggests, FediBlockHole uses TOML syntax.

There are 4 key sections:

1. `blocklist_url_sources`: A list of URLs to read blocklists from
1. `blocklist_instance_sources`: A list of Mastodon instances to read blocklists from via API
1. `blocklist_instance_destinations`: A list of Mastodon instances to write blocklists to via API
1. `allowlist_url_sources`: A list of URLs to read allowlists from

More detail on configuring the tool is provided below.
### URL sources

The URL sources setting is a list of URLs to fetch blocklists from.

Supported formats are currently:

- Comma-Separated Values (CSV)
- JSON
- RapidBlock CSV
- RapidBlock JSON

Blocklists must provide a `domain` field, and should provide a `severity` field.

`domain` is the domain name of the instance to be blocked/limited.

`severity` is the severity level of the block/limit. Supported values are: `noop`, `silence`, and `suspend`.

Optional fields that the tool understands are `public_comment`, `private_comment`, `reject_media`, `reject_reports`, and `obfuscate`.
#### CSV format

A CSV format blocklist must contain a header row with at least a `domain` and `severity` field.

Optional fields, as listed above, may also be included.
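The header-row rules can be illustrated with Python's standard `csv` module. This is a minimal sketch, not FediBlockHole's own parser: `parse_csv_blocklist` is a hypothetical name, and falling back to a caller-supplied `default_severity` when `severity` is empty or missing is an assumption of the sketch, since this README only says the field should be present.

```python
import csv
import io

SEVERITIES = {"noop", "silence", "suspend"}
OPTIONAL_FIELDS = {"public_comment", "private_comment",
                   "reject_media", "reject_reports", "obfuscate"}


def parse_csv_blocklist(text: str, default_severity: str = "suspend") -> list[dict]:
    """Parse a CSV blocklist with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "domain" not in reader.fieldnames:
        raise ValueError("CSV blocklist needs a header row with a 'domain' column")
    blocks = []
    for row in reader:
        # Assumption: an empty or absent severity falls back to the default.
        severity = (row.get("severity") or default_severity).strip()
        if severity not in SEVERITIES:
            raise ValueError(f"unsupported severity {severity!r} for {row['domain']}")
        block = {"domain": row["domain"].strip(), "severity": severity}
        # Keep only the optional fields listed above, as raw strings.
        block.update({k: v for k, v in row.items() if k in OPTIONAL_FIELDS})
        blocks.append(block)
    return blocks


sample = """domain,severity,public_comment
bad.example,suspend,spam
noisy.example,silence,
"""
for b in parse_csv_blocklist(sample):
    print(b)
```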
#### JSON format

JSON is also supported. It uses the same format as the JSON returned from the Mastodon API.

This is a list of dictionaries, with at minimum a `domain` field, and preferably
a `severity` field. The other optional fields are, well, optional.
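Because the JSON format mirrors the Mastodon API's list of domain-block objects, loading it is mostly `json.loads` plus picking out the fields the tool understands. A minimal sketch with a hypothetical helper name; API objects can carry extra keys (such as `id` or `created_at`), which this sketch simply drops.

```python
import json

KNOWN_FIELDS = ("domain", "severity", "public_comment", "private_comment",
                "reject_media", "reject_reports", "obfuscate")


def parse_json_blocklist(text: str) -> list[dict]:
    """Parse a JSON blocklist: a list of objects with at least a 'domain' key."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("JSON blocklist must be a list of objects")
    blocks = []
    for item in data:
        if "domain" not in item:
            raise ValueError(f"entry without a 'domain' field: {item!r}")
        # Drop keys the tool doesn't use (e.g. 'id', 'created_at').
        blocks.append({k: item[k] for k in KNOWN_FIELDS if k in item})
    return blocks


sample = '''[
  {"id": "1", "domain": "bad.example", "severity": "suspend",
   "reject_media": true, "public_comment": "spam"},
  {"domain": "meh.example"}
]'''
print(parse_json_blocklist(sample))
```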
#### RapidBlock CSV format

The RapidBlock CSV format has no header and a single field, so it's not
_strictly_ a CSV file as there are no commas separating values. It is basically
just a list of domains to block, separated by '\r\n'.

When using this format, the tool assumes the `severity` level is `suspend`.
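A RapidBlock CSV file can be turned into the same `domain`/`severity` records with a few lines of Python. This is a hypothetical helper, not FediBlockHole's implementation; skipping blank lines (such as the one left by a trailing `\r\n`) is an assumption of the sketch.

```python
def parse_rapidblock_csv(text: str) -> list[dict]:
    """Parse a RapidBlock CSV list: one domain per line, no header.

    Every entry gets severity 'suspend', matching the behaviour described above.
    """
    blocks = []
    # splitlines() handles the '\r\n' separators (and plain '\n' too).
    for line in text.splitlines():
        domain = line.strip()
        if domain:  # skip blank lines
            blocks.append({"domain": domain, "severity": "suspend"})
    return blocks


print(parse_rapidblock_csv("bad.example\r\nworse.example\r\n"))
# → [{'domain': 'bad.example', 'severity': 'suspend'}, {'domain': 'worse.example', 'severity': 'suspend'}]
```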
#### RapidBlock JSON format

The RapidBlock JSON format provides more detailed information about domain
blocks, but is still somewhat limited.

It has a single `isBlocked` flag indicating if a domain should be blocked or
not. There is no support for the `silence` block level.

There is no support for `reject_media`, `reject_reports`, or `obfuscate`.

All comments are public, by virtue of the public nature of RapidBlock.
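Here is a sketch of mapping RapidBlock JSON onto the tool's fields. The document shape used here (a top-level `blocks` object keyed by domain, whose values carry `isBlocked` and a `reason` string) is an assumption about the RapidBlock format, as is mapping `isBlocked: false` to `noop`; check a real RapidBlock file before relying on it. Because RapidBlock comments are public, the reason goes into `public_comment`.

```python
import json


def parse_rapidblock_json(text: str) -> list[dict]:
    """Map an (assumed-shape) RapidBlock JSON document onto domain/severity records."""
    doc = json.loads(text)
    blocks = []
    # Assumed shape: {"blocks": {"<domain>": {"isBlocked": bool, "reason": str}}}
    for domain, info in doc.get("blocks", {}).items():
        # Binary flag only: there is no way to express 'silence'.
        severity = "suspend" if info.get("isBlocked") else "noop"
        block = {"domain": domain, "severity": severity}
        if info.get("reason"):
            # RapidBlock comments are public, so they map to public_comment.
            block["public_comment"] = info["reason"]
        blocks.append(block)
    return blocks


sample = ('{"blocks": {"bad.example": {"isBlocked": true, "reason": "spam"}, '
          '"ok.example": {"isBlocked": false}}}')
print(parse_rapidblock_json(sample))
```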
### Instance sources

The tool can also read domain blocks from instances directly.

The configuration is a list of dictionaries of the form:

```
{ domain = '<domain_name>', token = '<BearerToken>', admin = false }
```

The `domain` is the fully-qualified domain name of the API host for an instance
you want to read domain blocks from.

The `token` is an optional OAuth token for the application that's configured in
the instance to allow you to read domain blocks, as discussed above.

`admin` is an optional field that tells the tool to use the more detailed admin
API endpoint for domain blocks, rather than the public API endpoint, which
doesn't provide as much detail. You will need a `token` that's been configured to
permit access to the admin domain_blocks scope, as detailed above.
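To make the `domain`/`token`/`admin` fields concrete, here is a sketch of the HTTP request an instance source roughly corresponds to. It assumes Mastodon's documented endpoints, `GET /api/v1/instance/domain_blocks` for the public list and `GET /api/v1/admin/domain_blocks` for the admin list, with the token sent as a Bearer token. It is not FediBlockHole's code, `build_domain_blocks_request` and `fetch_domain_blocks` are hypothetical names, and real responses may be paginated, which this sketch ignores.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def build_domain_blocks_request(domain: str, token: Optional[str] = None,
                                admin: bool = False) -> Request:
    """Build a GET request for an instance's domain blocks (not sent yet)."""
    path = "/api/v1/admin/domain_blocks" if admin else "/api/v1/instance/domain_blocks"
    headers = {"Accept": "application/json"}
    if token:
        # Public lists may not need a token; admin lists always do.
        headers["Authorization"] = f"Bearer {token}"
    elif admin:
        raise ValueError("admin = true needs a token with the admin domain_blocks scope")
    return Request(f"https://{domain}{path}", headers=headers)


def fetch_domain_blocks(domain: str, token: Optional[str] = None,
                        admin: bool = False) -> list:
    """Send the request and decode the JSON list (first page only)."""
    with urlopen(build_domain_blocks_request(domain, token, admin), timeout=30) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the URL and header logic easy to inspect without touching the network.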
### Instance destinations

The tool supports pushing a unified blocklist to multiple instances.

Configure the list of instances you want to push your blocklist to in the
`blocklist_instance_destinations` list. Each entry is of the form:

```
{ domain = '<domain_name>', token = '<BearerToken>', import_fields = ['public_comment'], max_severity = 'suspend', max_followed_severity = 'suspend' }
```

The fields `domain` and `token` are required.

The fields `import_fields`, `max_severity`, and `max_followed_severity` are optional.

The `domain` is the hostname of the instance you want to push to. The `token` is
an application token with the `admin:read` and `admin:write:domain_blocks`
authorization described in the section on writing instance blocklists above.

The optional `import_fields` setting allows you to restrict which fields are
imported into each destination instance. If you want to push the `reject_reports`
settings to one instance, but not to the others, you can use the `import_fields`
setting to do it. **Note:** The `domain` and `severity` fields are always imported.

The optional `max_severity` setting limits the maximum severity you will allow a
remote blocklist to set. This lets you import a list from a remote instance but
only at the `silence` level, even if that remote instance has a block at
`suspend` level. If not set, it defaults to `suspend`.

The optional `max_followed_severity` setting sets a per-instance limit on the
severity of a domain block if there are accounts on the instance that follow
accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
defaults to `silence`.

This setting exists to give people time to move off an instance that is about to
be defederated and bring their followers from your instance with them. Without
it, if a new `suspend` block appears in any of the blocklists you subscribe to (or
a block level increases from `silence` to `suspend`) and you're using the default
`max` mergeplan, the tool would immediately suspend the instance, cutting
everyone on the blocked instance off from their existing followers on your
instance, even if they move to a new instance. If you actually want that
outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
mergeplan.

Once the follow count drops to 0 on your instance, the tool will automatically
use the highest severity it finds again (if you're using the `max` mergeplan).
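The interaction between the merged severity, `max_severity`, and `max_followed_severity` boils down to "apply whichever caps are in force and keep the lowest result". Here is a sketch of that rule, using the severity levels listed under `mergeplan` and the defaults stated in this section. `effective_severity` is a hypothetical helper; the real tool may order these checks differently, and the follow count would come from the instance's admin API, which is not shown.

```python
LEVELS = {"noop": 0, "silence": 1, "suspend": 2}


def cap(severity: str, limit: str) -> str:
    """Return whichever of the two severities is lower."""
    return severity if LEVELS[severity] <= LEVELS[limit] else limit


def effective_severity(merged: str, follow_count: int,
                       max_severity: str = "suspend",
                       max_followed_severity: str = "silence") -> str:
    """Severity to push to one destination instance for one domain."""
    severity = cap(merged, max_severity)
    if follow_count > 0:
        # Local accounts still follow people there: hold back to give them time.
        severity = cap(severity, max_followed_severity)
    return severity


print(effective_severity("suspend", follow_count=3))                       # → silence
print(effective_severity("suspend", follow_count=0))                       # → suspend
print(effective_severity("suspend", 0, max_severity="silence"))            # → silence
print(effective_severity("suspend", 3, max_followed_severity="suspend"))   # → suspend
```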
### Allowlists

Sometimes you might want to completely ignore the blocklist definitions for
certain domains. That's what allowlists are for.

Allowlists remove any domain in the list from the merged list of blocks before
the merged list is saved out to a file or pushed to any instance.

Allowlists can be in any format supported by `blocklist_url_sources`, but all
fields other than `domain` are ignored.

You can also allow domains on the commandline by using the `-A` or `--allow`
flag and providing the domain name to allow. You can use the flag multiple
times to allow multiple domains.

It is probably wise to include your own instance domain in an allowlist so you
don't accidentally defederate from yourself.
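Conceptually, applying an allowlist is a filter over the merged blocklist. A minimal sketch, assuming the merged list is a dict keyed by domain (a hypothetical representation) and that domains are compared case-insensitively, which is also an assumption; `apply_allowlist` is not a FediBlockHole function.

```python
from typing import Iterable


def apply_allowlist(merged: dict[str, dict],
                    allowlists: Iterable[Iterable[dict]],
                    cli_allowed: Iterable[str] = ()) -> dict[str, dict]:
    """Drop every allowlisted domain from a merged blocklist keyed by domain."""
    allowed = {domain.lower() for domain in cli_allowed}  # e.g. from -A/--allow
    for allowlist in allowlists:
        # Only the 'domain' field of an allowlist entry matters.
        allowed.update(entry["domain"].lower() for entry in allowlist)
    return {d: block for d, block in merged.items() if d.lower() not in allowed}


merged = {
    "bad.example": {"domain": "bad.example", "severity": "suspend"},
    "my.instance": {"domain": "my.instance", "severity": "silence"},
    "friend.example": {"domain": "friend.example", "severity": "silence"},
}
allowlist = [{"domain": "friend.example", "severity": "suspend"}]  # severity ignored
print(sorted(apply_allowlist(merged, [allowlist], cli_allowed=["my.instance"])))
# → ['bad.example']
```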
## More advanced configuration

For a list of possible configuration options, check the output of `--help` and read the
sample configuration file in `etc/sample.fediblockhole.conf.toml`.

### save_intermediate

This option tells the tool to save the unmerged blocklists it fetches from
remote instances and URLs into separate files. This is handy for debugging, or
just to have a non-unified set of blocklist files.

It works with the `savedir` setting to control where the files are saved.

These are parsed blocklists, not the raw data, and so will be affected by `import_fields`.

The filename is based on the URL or domain used, so you can tell where each list came from.

### savedir

Sets where to save intermediate blocklist files. Defaults to `/tmp`.

### no_push_instance

Defaults to False.

When set, the tool won't actually try to push the unified blocklist to any
configured instances.

If you want to see what the tool would try to do, but not actually apply any
updates, use `--dryrun`.

### no_fetch_url

Skip the fetching of blocklists from any URLs that are configured.

### no_fetch_instance

Skip the fetching of blocklists from any remote instances that are configured.
### mergeplan

If two (or more) blocklists define blocks for the same domain, but they're
different, `mergeplan` tells the tool how to resolve the conflict.

`max` is the default. It uses the _highest_ severity block it finds as the one
that should be used in the unified blocklist.

`min` does the opposite. It uses the _lowest_ severity block it finds as the one
to use in the unified blocklist.

A full discussion of severities is beyond the scope of this README, but here is
a quick overview of how it works for this tool.

The severities are:

- **noop**, level 0: This is essentially an 'unblock', but you can include a
  comment.
- **silence**, level 1: A silence adds friction to federation with an instance.
- **suspend**, level 2: A full defederation with the instance.

With `mergeplan` set to `max`, _silence_ would take precedence over _noop_, and
_suspend_ would take precedence over both.

With `mergeplan` set to `min`, _silence_ would take precedence over _suspend_,
and _noop_ would take precedence over both.

You would want to use `max` to ensure that you always apply the block that your
harshest fellow admin thinks should happen.

You would want to use `min` to ensure that your blocks do what your most lenient
fellow admin thinks should happen.
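The `max` and `min` plans can be sketched as picking the highest or lowest level per domain, using the levels listed above. This is an illustration, not FediBlockHole's merge code: `merge_blocklists` is a hypothetical name, and it only resolves `severity`, ignoring how other fields (comments, `reject_media`, and so on) would be combined.

```python
LEVELS = {"noop": 0, "silence": 1, "suspend": 2}


def merge_blocklists(blocklists: list[list[dict]], mergeplan: str = "max") -> dict[str, str]:
    """Resolve conflicting severities per domain using the 'max' or 'min' plan."""
    if mergeplan not in ("max", "min"):
        raise ValueError(f"unknown mergeplan {mergeplan!r}")
    pick = max if mergeplan == "max" else min
    merged: dict[str, str] = {}
    for blocklist in blocklists:
        for block in blocklist:
            domain, severity = block["domain"], block["severity"]
            if domain in merged:
                # Compare by numeric level, not alphabetically.
                merged[domain] = pick(merged[domain], severity, key=LEVELS.__getitem__)
            else:
                merged[domain] = severity
    return merged


list_a = [{"domain": "x.example", "severity": "suspend"},
          {"domain": "y.example", "severity": "noop"}]
list_b = [{"domain": "x.example", "severity": "silence"},
          {"domain": "y.example", "severity": "silence"}]
print(merge_blocklists([list_a, list_b], "max"))  # → {'x.example': 'suspend', 'y.example': 'silence'}
print(merge_blocklists([list_a, list_b], "min"))  # → {'x.example': 'silence', 'y.example': 'noop'}
```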
### import_fields

`import_fields` controls which fields will be imported from remote
instances and URL blocklists, and which fields are pushed to instances from the
unified blocklist.

The fields `domain` and `severity` are always included, so only define extra
fields if you want them.

You can't export fields you haven't imported, so `export_fields` should be a
subset of `import_fields`, but you can run the tool multiple times. You could,
for example, include lots of fields for an initial import to build up a
comprehensive list for export, combined with the `--no-push-instances` option so
you don't actually apply the full list anywhere.

Then you could run the tool again with a different set of options, so you have
all the detail in a file but only push `public_comment` to instances.
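The field rules above (always keep `domain` and `severity`, keep other fields only if listed, and only export what was imported) can be sketched like this. `filter_fields` and `check_export_fields` are hypothetical helpers, not FediBlockHole functions.

```python
ALWAYS = ("domain", "severity")


def filter_fields(block: dict, fields: list[str]) -> dict:
    """Keep 'domain' and 'severity' plus any listed extra fields present in block."""
    keep = set(ALWAYS) | set(fields)
    return {k: v for k, v in block.items() if k in keep}


def check_export_fields(import_fields: list[str], export_fields: list[str]) -> None:
    """Raise if export_fields asks for a field that was never imported."""
    missing = set(export_fields) - set(import_fields) - set(ALWAYS)
    if missing:
        raise ValueError(f"cannot export fields that were not imported: {sorted(missing)}")


block = {"domain": "x.example", "severity": "silence",
         "public_comment": "spam", "private_comment": "internal note", "obfuscate": True}
print(filter_fields(block, ["public_comment"]))
# → {'domain': 'x.example', 'severity': 'silence', 'public_comment': 'spam'}
check_export_fields(["public_comment", "private_comment"], ["public_comment"])  # passes
```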
### export_fields

`export_fields` controls which fields will get saved to the unified blocklist
file, if you export one.

The fields `domain` and `severity` are always included, so only define extra
fields if you want them.
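Writing the unified list out with a chosen set of columns is straightforward with `csv.DictWriter`. A sketch, assuming the merged list is a list of dicts: `export_csv` is a hypothetical helper that writes `domain` and `severity` first, followed by the `export_fields` in the order given, which may not match the column order FediBlockHole itself uses.

```python
import csv
import io


def export_csv(blocks: list[dict], export_fields: list[str]) -> str:
    """Render blocks as CSV text with domain, severity and the chosen extra fields."""
    columns = ["domain", "severity"] + [f for f in export_fields
                                        if f not in ("domain", "severity")]
    out = io.StringIO()
    # extrasaction='ignore' silently drops any field not listed in export_fields;
    # fields missing from a block are written as empty strings.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(blocks)
    return out.getvalue()


blocks = [{"domain": "x.example", "severity": "suspend",
           "public_comment": "spam", "private_comment": "internal note"}]
print(export_csv(blocks, ["public_comment"]), end="")
# domain,severity,public_comment
# x.example,suspend,spam
```

To write a file instead of a string, pass a file opened with `open(path, "w", newline="")` to `csv.DictWriter` in place of the `StringIO`.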