Merge pull request #19 from eigenmagic/release-v0.4.0

Release v0.4.0
Justin Warren 2023-01-13 19:34:43 +11:00 committed by GitHub
commit 3d39ae7e87
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 178 additions and 39 deletions


@ -8,6 +8,44 @@ This project uses [Semantic Versioning] and generally follows the conventions of
Important planned changes not yet bundled up will be listed here.
## [0.4.0] - 2023-01-13
Substantial changes to better support multiple blocklist formats
### Added
- Added support for RapidBlock blocklists, both CSV and JSON formats. (327a44d)
- Added support for per-instance-source import_fields. (327a44d)
- Updated sample config to include new formats. (327a44d)
- A BlockSeverity of 'suspend' implies reject_media and reject_reports. (327a44d)
- Added ability to limit max severity per-URL source. (10011a5)
- Added boolean fields like 'reject_reports' to mergeplan handling. (66f0373)
- Added tests for boolean merge situations. (66f0373)
- Various other test cases added.
### Changed
- Refactored to add a DomainBlock object. (10011a5)
- Refactored to use a BlockParser structure. (10011a5)
- Improved method for checking if changes are needed. (10011a5)
- Refactored fetch from URLs and instances. (327a44d)
- Improved check_followed_severity() behaviour. (327a44d)
- Changed API delay to be in calls per hour. (327a44d)
- Improved comment merging. (0a6eec4)
- Clarified logic in apply_mergeplan() for boolean fields. (66f0373)
- Updated README documentation. (ee9625d)
- Aligned API call rate limit with server default. (55dad3f)
### Removed
- Removed redundant global vars. (327a44d)
### Fixed
- Fixed bug in severity change detection. (e0d40b5)
- Fixed DomainBlock.id usage during __iter__(). (a718af5)
## [0.3.0] - 2023-01-11
### Added

README.md

@ -2,19 +2,55 @@
A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
The broad design goal for FediBlockHole is to support pulling in a list of
blocklists from a set of trusted sources, merge them into a combined blocklist,
and then push that merged list to a set of managed instances.
Inspired by the way PiHole works for maintaining a set of blocklists of adtech
domains.
Mastodon admins can choose who they think maintain quality lists and subscribe
to them, helping to distribute the load for maintaining blocklists among a
community of people. Control ultimately rests with the admins themselves so they
can outsource as much, or as little, of the effort to others as they deem
appropriate.
## Features
### Blocklist Sources
- Read domain block lists from other instances via the Mastodon API.
- Supports both public lists (no auth required) and 'admin' lists requiring
authentication to an instance.
- Read domain block lists from arbitrary URLs, including local files.
- Supports CSV and JSON format blocklists.
- Supports RapidBlock CSV and JSON format blocklists.
### Blocklist Export/Push
- Push a merged blocklist to a set of Mastodon instances.
- Export per-source, unmerged block lists to local files, in CSV format.
- Export merged blocklists to local files, in CSV format.
- Read block lists from multiple remote instances
- Read block lists from multiple URLs, including local files
- Write a unified block list to a local CSV file
- Push unified blocklist updates to multiple remote instances
- Control import and export fields
### Flexible Configuration
- Provides (hopefully) sensible defaults to minimise first-time setup.
- Global and fine-grained configuration options available for those complex situations that crop up sometimes.
## Installing
Installs using `pip`.
Installable using `pip`.
Clone the repo and install from source like this:
```
python3 -m pip install fediblockhole
```
Install from source by cloning the repo, `cd fediblockhole` and run:
```
python3 -m pip install .
@ -22,11 +58,11 @@ python3 -m pip install .
Installation adds a commandline tool: `fediblock-sync`
Once things stabilise a bit more, I'll upload the package to PyPI.
Instance admins who want to use this tool for their instance will need to add an
Application at `https://<instance-domain>/settings/applications/` so they can
authorize the tool to create and update domain blocks with an OAuth token.
Instance admins who want to use this tool will need to add an Application at
`https://<instance-domain>/settings/applications/` so they can authorize the
tool to create and update domain blocks with an OAuth token.
More on authorization by token below.
### Reading remote instance blocklists
@ -57,8 +93,8 @@ UPDATE oauth_access_tokens
WHERE token='<your_app_token>';
```
When that's done, FediBlockHole should be able to use its token to authorise
adding or updating domain blocks via the API.
When that's done, FediBlockHole should be able to use its token to read domain
blocks via the API.
### Writing instance blocklists
@ -81,6 +117,22 @@ UPDATE oauth_access_tokens
When that's done, FediBlockHole should be able to use its token to authorise
adding or updating domain blocks via the API.
## Using the tool
Run the tool like this:
```
fediblock-sync -c <configfile_path>
```
If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't
need to pass in the config file path.
For a list of possible configuration options, check the `--help` output.
You can also read the heavily commented sample configuration file in the repo at
[etc/sample.fediblockhole.conf.toml](https://github.com/eigenmagic/fediblockhole/blob/main/etc/sample.fediblockhole.conf.toml).
## Configuring
Once you have your applications and tokens and scopes set up, create a
@ -93,17 +145,63 @@ As the filename suggests, FediBlockHole uses TOML syntax.
There are 3 key sections:
1. `blocklist_urls_sources`: A list of URLS to read CSV formatted blocklists from
1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API
1. `blocklist_urls_sources`: A list of URLs to read blocklists from
1. `blocklist_instance_sources`: A list of Mastodon instances to read blocklists from via API
1. `blocklist_instance_destinations`: A list of Mastodon instances to write blocklists to via API
More detail on configuring the tool is provided below.
### URL sources
The URL sources is a list of URLs to fetch a CSV formatted blocklist from.
The URL sources setting is a list of URLs to fetch blocklists from.
The required fields are `domain` and `severity`.
Supported formats are currently:
Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
- Comma-Separated Values (CSV)
- JSON
- RapidBlock CSV
- RapidBlock JSON
Blocklists must provide a `domain` field, and should provide a `severity` field.
`domain` is the domain name of the instance to be blocked/limited.
`severity` is the severity level of the block/limit. Supported values are: `noop`, `silence`, and `suspend`.
Optional fields that the tool understands are `public_comment`, `private_comment`, `reject_media`, `reject_reports`, and `obfuscate`.
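The field list above maps naturally onto a small record type. A minimal sketch, assuming a plain dataclass (the name `BlockSketch` is hypothetical, not the project's `DomainBlock`), that also applies the rule recorded in the 0.4.0 changelog that a `suspend` severity implies `reject_media` and `reject_reports`:

```python
# Sketch only: a hypothetical stand-in for a domain block, not FediBlockHole's DomainBlock.
from dataclasses import dataclass

SEVERITIES = ("noop", "silence", "suspend")


@dataclass
class BlockSketch:
    domain: str
    severity: str
    public_comment: str = ""
    private_comment: str = ""
    reject_media: bool = False
    reject_reports: bool = False
    obfuscate: bool = False

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        # A 'suspend' block also rejects media and reports from the domain.
        if self.severity == "suspend":
            self.reject_media = True
            self.reject_reports = True


b = BlockSketch("bad.example", "suspend")
print(b.reject_media, b.reject_reports)  # True True
```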
#### CSV format
A CSV format blocklist must contain a header row with at least a `domain` and `severity` field.
Optional fields, as listed above, may also be included.
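A minimal sketch of reading such a file with the standard `csv` module; the function name `parse_csv_blocklist` is hypothetical, and dropping empty optional values is a design choice of this sketch, not necessarily what the tool does:

```python
# Sketch only: parse a headered CSV blocklist into a list of dicts.
import csv
import io


def parse_csv_blocklist(text: str) -> list[dict]:
    """Parse a CSV blocklist whose header must include `domain` and `severity`."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"domain", "severity"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV header is missing: {sorted(missing)}")
    blocks = []
    for row in reader:
        # Keep only fields that actually have a value in this row.
        blocks.append({k: v for k, v in row.items() if v not in (None, "")})
    return blocks


sample = "domain,severity,public_comment\nbad.example,suspend,spam\nmeh.example,silence,\n"
print(parse_csv_blocklist(sample))
```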
#### JSON format
JSON is also supported. It uses the same format as the JSON returned from the Mastodon API.
This is a list of dictionaries, with at minimum a `domain` field, and preferably
a `severity` field. The other optional fields are, well, optional.
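A minimal sketch of loading that JSON list, assuming fields the tool doesn't understand (such as an API `id`) are simply dropped; the names `parse_json_blocklist` and `KNOWN_FIELDS` are hypothetical:

```python
# Sketch only: parse a Mastodon-API-style JSON blocklist.
import json

KNOWN_FIELDS = {"domain", "severity", "public_comment", "private_comment",
                "reject_media", "reject_reports", "obfuscate"}


def parse_json_blocklist(text: str) -> list[dict]:
    """Parse a JSON list of block objects; each must have a `domain`."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of domain blocks")
    blocks = []
    for item in data:
        if not isinstance(item, dict) or "domain" not in item:
            raise ValueError(f"entry without a domain: {item!r}")
        # Drop fields outside the known set, e.g. an `id` from the admin API.
        blocks.append({k: v for k, v in item.items() if k in KNOWN_FIELDS})
    return blocks


sample = '[{"id": "7", "domain": "bad.example", "severity": "suspend", "reject_media": true}]'
print(parse_json_blocklist(sample))
# [{'domain': 'bad.example', 'severity': 'suspend', 'reject_media': True}]
```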
#### RapidBlock CSV format
The RapidBlock CSV format has no header and a single field, so it's not
_strictly_ a CSV file as there are no commas separating values. It is basically
just a list of domains to block, separated by '\r\n'.
When using this format, the tool assumes the `severity` level is `suspend`.
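A minimal sketch of that parsing rule, with a hypothetical `parse_rapidblock_csv` helper; `str.splitlines()` handles both `\r\n` and bare `\n` line endings:

```python
# Sketch only: a headerless, one-domain-per-line list where every entry is a suspend.
def parse_rapidblock_csv(text: str) -> list[dict]:
    """Turn each non-empty line into a 'suspend' block for that domain."""
    blocks = []
    for line in text.splitlines():  # splits on '\r\n' as well as '\n'
        domain = line.strip()
        if domain:
            blocks.append({"domain": domain, "severity": "suspend"})
    return blocks


print(parse_rapidblock_csv("bad.example\r\nworse.example\r\n"))
# [{'domain': 'bad.example', 'severity': 'suspend'}, {'domain': 'worse.example', 'severity': 'suspend'}]
```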
#### RapidBlock JSON format
The RapidBlock JSON format provides more detailed information about domain
blocks, but is still somewhat limited.
It has a single `isBlocked` flag indicating if a domain should be blocked or
not. There is no support for the 'silence' block level.
There is no support for 'reject_media' or 'reject_reports' or 'obfuscate'.
All comments are public, by virtue of the public nature of RapidBlock.
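A minimal sketch of mapping such a file onto the tool's fields. The document shape assumed here (a top-level `blocks` object keyed by domain, each entry carrying `isBlocked` and an optional `reason`) is an assumption, as is treating a non-blocked entry as `noop`; check the RapidBlock specification before relying on either:

```python
# Sketch only: the RapidBlock JSON shape used here is assumed, not confirmed.
import json


def parse_rapidblock_json(text: str) -> list[dict]:
    """Map assumed RapidBlock entries to domain blocks."""
    data = json.loads(text)
    blocks = []
    for domain, info in data.get("blocks", {}).items():
        block = {
            "domain": domain,
            # No 'silence' level: blocked means suspend, otherwise a no-op (assumed).
            "severity": "suspend" if info.get("isBlocked") else "noop",
        }
        reason = info.get("reason")  # assumed key name
        if reason:
            block["public_comment"] = reason  # RapidBlock comments are public
        blocks.append(block)
    return blocks


sample = '{"blocks": {"bad.example": {"isBlocked": true, "reason": "spam"}, "ok.example": {"isBlocked": false}}}'
print(parse_rapidblock_json(sample))
```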
### Instance sources
@ -115,10 +213,10 @@ The configuration is a list of dictionaries of the form:
```
The `domain` is the fully-qualified domain name of the API host for an instance
you want to read or write domain blocks to/from.
you want to read domain blocks from.
The `token` is an optional OAuth token for the application that's configured in
the instance to allow you to read/write domain blocks, as discussed above.
the instance to allow you to read domain blocks, as discussed above.
`admin` is an optional field that tells the tool to use the more detailed admin
API endpoint for domain_blocks, rather than the more public API endpoint that
@ -133,42 +231,44 @@ Configure the list of instances you want to push your blocklist to in the
`blocklist_instance_destinations` list. Each entry is of the form:
```
{ domain = '<domain_name>', token = '<BearerToken>', max_followed_severity = 'silence' }
{ domain = '<domain_name>', token = '<BearerToken>', import_fields = ['public_comment'], max_severity = 'suspend', max_followed_severity = 'suspend' }
```
The fields `domain` and `token` are required. `max_followed_severity` is optional.
The fields `domain` and `token` are required.
The fields `max_severity`, `max_followed_severity` and `import_fields` are optional.
The `domain` is the hostname of the instance you want to push to. The `token` is
an application token with both `admin:read:domain_blocks` and
`admin:write:domain_blocks` authorization.
The optional `import_fields` setting allows you to restrict which fields are
imported from each instance. If you want to import the `reject_reports` settings
from one instance, but no others, you can use the `import_fields` setting to do
it. **Note:** The `domain` and `severity` fields are always imported.
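A minimal sketch of that filtering, with a hypothetical `apply_import_fields` helper; `domain` and `severity` are always kept, as the note above says:

```python
# Sketch only: restrict a remote block to the configured import_fields.
ALWAYS_IMPORTED = ("domain", "severity")


def apply_import_fields(block: dict, import_fields: list[str]) -> dict:
    """Keep `domain`, `severity`, and any fields named in import_fields."""
    wanted = set(ALWAYS_IMPORTED) | set(import_fields)
    return {k: v for k, v in block.items() if k in wanted}


remote = {"domain": "bad.example", "severity": "suspend",
          "reject_reports": True, "private_comment": "internal note"}
print(apply_import_fields(remote, ["reject_reports"]))
# {'domain': 'bad.example', 'severity': 'suspend', 'reject_reports': True}
```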
The optional `max_severity` setting limits the maximum severity you will allow a
remote blocklist to set. For example, you can import a remote instance's list but
cap every block at the `silence` level, even where that instance has a `suspend`
block. If not set, defaults to `suspend`.
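A minimal sketch of that cap, assuming severities are ordered `noop < silence < suspend`; the helper name `cap_severity` is hypothetical:

```python
# Sketch only: limit a remote severity to a configured maximum.
SEVERITY_ORDER = ["noop", "silence", "suspend"]


def cap_severity(severity: str, max_severity: str = "suspend") -> str:
    """Return whichever of the two severities is lower."""
    return min(severity, max_severity, key=SEVERITY_ORDER.index)


print(cap_severity("suspend", "silence"))  # silence
print(cap_severity("noop", "silence"))     # noop
print(cap_severity("suspend"))             # suspend
```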
The optional `max_followed_severity` setting sets a per-instance limit on the
severity of a domain_block if there are accounts on the instance that follow
accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
defaults to 'silence'.
defaults to `silence`.
This setting exists to give people time to move off an instance that is about to
be defederated and bring their followers from your instance with them. Without
it, if a new Suspend block appears in any of the blocklists you subscribe to (or
a block level increases from Silence to Suspend) and you're using the default
it, if a new `suspend` block appears in any of the blocklists you subscribe to (or
a block level increases from `silence` to `suspend`) and you're using the default
`max` mergeplan, the tool would immediately suspend the instance, cutting
everyone on the blocked instance off from their existing followers on your
instance, even if they move to a new instance. If you actually want that
outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
mergeplan.
Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
## Using the tool
Once you've configured the tool, run it like this:
```
fediblock-sync -c <configfile_path>
```
If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.
Once the follow count drops to 0 on your instance, the tool will automatically
use the highest severity it finds again (if you're using the `max` mergeplan).
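A minimal sketch of how the `max` mergeplan and `max_followed_severity` interact, assuming severities are compared by position in `noop < silence < suspend`. The helper names `merge_max` and `effective_severity` are hypothetical, not the tool's own functions:

```python
# Sketch only: merge source severities by max, then cap if local accounts follow the domain.
SEVERITY_ORDER = ["noop", "silence", "suspend"]


def merge_max(severities: list[str]) -> str:
    """The 'max' mergeplan: the most severe block from any source wins."""
    return max(severities, key=SEVERITY_ORDER.index)


def effective_severity(source_severities: list[str], follow_count: int,
                       max_followed_severity: str = "silence") -> str:
    merged = merge_max(source_severities)
    if follow_count > 0:
        # Followed domains are held at or below max_followed_severity.
        merged = min(merged, max_followed_severity, key=SEVERITY_ORDER.index)
    return merged


print(effective_severity(["silence", "suspend"], follow_count=3))  # silence
print(effective_severity(["silence", "suspend"], follow_count=0))  # suspend
```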
## More advanced configuration


@ -1,6 +1,6 @@
[project]
name = "fediblockhole"
version = "0.3.0"
version = "0.4.0"
description = "Federated blocklist management for Mastodon"
readme = "README.md"
license = {file = "LICENSE"}


@ -29,7 +29,8 @@ URL_BLOCKLIST_MAXSIZE = 1024 ** 3
REQUEST_TIMEOUT = 30
# Time to wait between instance API calls so we don't melt them
API_CALL_DELAY = 3600 / 300 # 300 API calls per hour
# The default Mastodon rate limit is 300 calls per 5 minutes
API_CALL_DELAY = 5 * 60 / 300 # 300 calls per 5 minutes
# We always import the domain and the severity
IMPORT_FIELDS = ['domain', 'severity']
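The new `API_CALL_DELAY` works out to one call per second: 300 calls spread over 5 × 60 = 300 seconds, whereas the previous value, `3600 / 300`, was 12 seconds. A minimal sketch of pacing calls with that delay, using a hypothetical `paced_calls` helper (not from the project) that takes an injectable `sleep` so it can be tested without waiting:

```python
# Sketch only: space out API calls by a fixed delay.
import time

RATE_LIMIT_CALLS = 300        # Mastodon default: 300 calls...
RATE_LIMIT_WINDOW = 5 * 60    # ...per 5 minutes (in seconds)
API_CALL_DELAY = RATE_LIMIT_WINDOW / RATE_LIMIT_CALLS  # 1.0 second


def paced_calls(calls, delay=API_CALL_DELAY, sleep=time.sleep):
    """Run each zero-argument callable in order, sleeping `delay` between calls."""
    results = []
    for i, call in enumerate(calls):
        if i:  # no wait before the first call
            sleep(delay)
        results.append(call())
    return results
```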
@ -416,13 +417,13 @@ def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
change_needed = oldblock.compare_fields(newblock, import_fields)
return change_needed
def update_known_block(token: str, host: str, blockdict: dict):
def update_known_block(token: str, host: str, block: DomainBlock):
"""Update an existing domain block with information in blockdict"""
api_path = "/api/v1/admin/domain_blocks/"
try:
id = blockdict['id']
blockdata = blockdict.copy()
id = block.id
blockdata = block._asdict()
del blockdata['id']
except KeyError:
import pdb


@ -215,7 +215,7 @@ class DomainBlock(object):
"""Be iterable"""
keys = self.fields
if self.id:
if getattr(self, 'id', False):
keys.append('id')
for k in keys:
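In the `__iter__` hunk above, `keys = self.fields` binds a second name to the same list rather than copying it, so if `fields` is a shared list, `keys.append('id')` would grow it on every iteration. A minimal sketch of the same pattern with a copy, using a hypothetical `BlockRecord` class (not the project's `DomainBlock`), plus a hypothetical `update_payload` helper that drops `id` from the request body the way `update_known_block()` does:

```python
# Sketch only: a hypothetical record type illustrating a non-mutating __iter__.
class BlockRecord:
    fields = ["domain", "severity", "public_comment"]  # shared, class-level list

    def __init__(self, domain, severity, public_comment="", id=None):
        self.domain = domain
        self.severity = severity
        self.public_comment = public_comment
        self.id = id

    def __iter__(self):
        keys = list(self.fields)  # copy, so the class-level list is never modified
        if getattr(self, "id", None):
            keys.append("id")
        yield from keys

    def _asdict(self):
        return {k: getattr(self, k) for k in self}


def update_payload(block: BlockRecord) -> dict:
    """Build an update body without the block's id."""
    data = block._asdict()
    data.pop("id", None)
    return data
```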