From ee9625d0753101c0a41fcc9bb5edc80d600476b7 Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Thu, 12 Jan 2023 10:41:01 +1100
Subject: [PATCH 1/6] Updated README documentation.

---
 README.md | 166 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 133 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index a11ac91..0820b06 100644
--- a/README.md
+++ b/README.md
@@ -2,19 +2,55 @@
 
 A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
 
+The broad design goal for FediBlockHole is to support pulling in a list of
+blocklists from a set of trusted sources, merge them into a combined blocklist,
+and then push that merged list to a set of managed instances.
+
+Inspired by the way PiHole works for maintaining a set of blocklists of adtech
+domains.
+
+Mastodon admins can choose who they think maintain quality lists and subscribe
+to them, helping to distribute the load for maintaining blocklists among a
+community of people. Control ultimately rests with the admins themselves so they
+can outsource as much, or as little, of the effort to others as they deem
+appropriate.
+
 ## Features
 
+### Blocklist Sources
+
+ - Read domain block lists from other instances via the Mastodon API.
+ - Supports both public lists (no auth required) and 'admin' lists requiring
+   authentication to an instance.
+ - Read domain block lists from arbitrary URLs, including local files.
+ - Supports CSV and JSON format blocklists
+ - Supports RapidBlock CSV and JSON format blocklists
+
+### Blocklist Export/Push
+
+ - Push a merged blocklist to a set of Mastodon instances.
+ - Export per-source, unmerged block lists to local files, in CSV format.
+ - Export merged blocklists to local files, in CSV format.
  - Read block lists from multiple remote instances
  - Read block lists from multiple URLs, including local files
  - Write a unified block list to a local CSV file
  - Push unified blocklist updates to multiple remote instances
  - Control import and export fields
 
+### Flexible Configuration
+
+ - Provides (hopefully) sensible defaults to minimise first-time setup.
+ - Global and fine-grained configuration options available for those complex situations that crop up sometimes.
+
 ## Installing
 
-Installs using `pip`.
+Installable using `pip`.
 
-Clone the repo and install from source like this:
+```
+python3 -m pip install fediblockhole
+```
+
+Install from source by cloning the repo, `cd fediblockhole` and run:
 
 ```
 python3 -m pip install .
@@ -22,11 +58,11 @@ python3 -m pip install .
 
 Installation adds a commandline tool: `fediblock-sync`
 
-Once things stablise a bit more, I'll upload the package to PyPI.
+Instance admins who want to use this tool for their instance will need to add an
+Application at `https:///settings/applications/` so they can
+authorize the tool to create and update domain blocks with an OAuth token.
 
-Instance admins who want to use this tool will need to add an Application at
-`https:///settings/applications/` so they can authorize the
-tool to create and update domain blocks with an OAuth token.
+More on authorization by token below.
 
 ### Reading remote instance blocklists
 
@@ -57,8 +93,8 @@ UPDATE oauth_access_tokens
 WHERE token='';
 ```
 
-When that's done, FediBlockHole should be able to use its token to authorise
-adding or updating domain blocks via the API.
+When that's done, FediBlockHole should be able to use its token to read domain
+blocks via the API.
 
 ### Writing instance blocklists
 
@@ -81,6 +117,22 @@ UPDATE oauth_access_tokens
 When that's done, FediBlockHole should be able to use its token to authorise
 adding or updating domain blocks via the API.
 
+## Using the tool
+
+Run the tool like this:
+
+```
+fediblock-sync -c 
+```
+
+If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't
+need to pass in the config file path.
+
+For a list of possible configuration options, check the `--help`.
+
+You can also read the heavily commented sample configuration file in the repo at
+[etc/sample.fediblockhole.conf.toml](https://github.com/eigenmagic/fediblockhole/blob/main/etc/sample.fediblockhole.conf.toml).
+
 ## Configuring
 
 Once you have your applications and tokens and scopes set up, create a
@@ -93,17 +145,63 @@ As the filename suggests, FediBlockHole uses TOML syntax.
 
 There are 3 key sections:
 
- 1. `blocklist_urls_sources`: A list of URLS to read CSV formatted blocklists from
- 1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
- 1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API
+ 1. `blocklist_urls_sources`: A list of URLs to read blocklists from
+ 1. `blocklist_instance_sources`: A list of Mastodon instances to read blocklists from via API
+ 1. `blocklist_instance_destinations`: A list of Mastodon instances to write blocklists to via API
+
+More detail on configuring the tool is provided below.
 
 ### URL sources
 
-The URL sources is a list of URLs to fetch a CSV formatted blocklist from.
+The URL sources is a list of URLs to fetch blocklists from.
 
-The required fields are `domain` and `severity`.
+Supported formats are currently:
 
-Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
+ - Comma-Separated Values (CSV)
+ - JSON
+ - RapidBlock CSV
+ - RapidBlock JSON
+
+Blocklists must provide a `domain` field, and should provide a `severity` field.
+
+`domain` is the domain name of the instance to be blocked/limited.
+
+`severity` is the severity level of the block/limit. Supported values are: `noop`, `silence`, and `suspend`.
+
+Optional fields that the tool understands are `public_comment`, `private_comment`, `reject_media`, `reject_reports`, and `obfuscate`.
+
+#### CSV format
+
+A CSV format blocklist must contain a header row with at least a `domain` and `severity` field.
+
+Optional fields, as listed above, may also be included.
+
+#### JSON format
+
+JSON is also supported. It uses the same format as the JSON returned from the Mastodon API.
+
+This is a list of dictionaries, with at minimum a `domain` field, and preferably
+a `severity` field. The other optional fields are, well, optional.
+
+#### RapidBlock CSV format
+
+The RapidBlock CSV format has no header and a single field, so it's not
+_strictly_ a CSV file as there are no commas separating values. It is basically
+just a list of domains to block, separated by '\r\n'.
+
+When using this format, the tool assumes the `severity` level is `suspend`.
+
+#### RapidBlock JSON format
+
+The RapidBlock JSON format provides more detailed information about domain
+blocks, but is still somewhat limited.
+
+It has a single `isBlocked` flag indicating if a domain should be blocked or
+not. There is no support for the 'silence' block level.
+
+There is no support for 'reject_media' or 'reject_reports' or 'obfuscate'.
+
+All comments are public, by virtue of the public nature of RapidBlock.
 
 ### Instance sources
 
@@ -115,10 +213,10 @@ The configuration is a list of dictionaries of the form:
 ```
 
 The `domain` is the fully-qualified domain name of the API host for an instance
-you want to read or write domain blocks to/from.
+you want to read domain blocks from.
 
 The `token` is an optional OAuth token for the application that's configured in
-the instance to allow you to read/write domain blocks, as discussed above.
+the instance to allow you to read domain blocks, as discussed above.
 
 `admin` is an optional field that tells the tool to use the more detailed admin
 API endpoint for domain_blocks, rather than the more public API endpoint that
@@ -133,42 +231,44 @@ Configure the list of instances you want to push your blocklist to in the
 `blocklist_instance_detinations` list. Each entry is of the form:
 
 ```
-{ domain = '', token = '', max_followed_severity = 'silence' }
+{ domain = '', token = '', import_fields = ['public_comment'], max_severity = 'suspend', max_followed_severity = 'suspend' }
 ```
 
-The fields `domain` and `token` are required. `max_followed_severity` is optional.
+The fields `domain` and `token` are required.
+
+The fields `max_followed_severity` and `import_fields` are optional.
 
 The `domain` is the hostname of the instance you want to push to. The `token` is
 an application token with both `admin:read:domain_blocks` and
 `admin:write:domain_blocks` authorization.
 
+The optional `import_fields` setting allows you to restrict which fields are
+imported from each instance. If you want to import the `reject_reports` settings
+from one instance, but no others, you can use the `import_fields` setting to do
+it. **Note:** The `domain` and `severity` fields are always imported.
+
+The optional `max_severity` setting limits the maximum severity you will allow a
+remote blocklist to set. This helps you import a list from a remote instance but
+only at the `silence` level, even if that remote instance has a block at
+`suspend` level. If not set, defaults to `suspend`.
+
 The optional `max_followed_severity` setting sets a per-instance limit on the
 severity of a domain_block if there are accounts on the instance that follow
 accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
-defaults to 'silence'.
+defaults to `silence`.
 
 This setting exists to give people time to move off an instance that is about to
 be defederated and bring their followers from your instance with them. Without
-it, if a new Suspend block appears in any of the blocklists you subscribe to (or
-a block level increases from Silence to Suspend) and you're using the default
+it, if a new `suspend` block appears in any of the blocklists you subscribe to (or
+a block level increases from `silence` to `suspend`) and you're using the default
 `max` mergeplan, the tool would immediately suspend the instance, cutting
 everyone on the blocked instance off from their existing followers on your
 instance, even if they move to a new instance. If you actually want that
 outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
 mergeplan.
 
-Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
-
-
-## Using the tool
-
-Once you've configured the tool, run it like this:
-
-```
-fediblock-sync -c 
-```
-
-If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.
+Once the follow count drops to 0 on your instance, the tool will automatically
+use the highest severity it finds again (if you're using the `max` mergeplan).
 
 ## More advanced configuration
 

From 9fb575bb2f07d5417ef1c4f48eadd69de46d36c8 Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Thu, 12 Jan 2023 15:49:41 +1100
Subject: [PATCH 2/6] Starting to prep for v0.4.0 release.

---
 CHANGELOG.md   | 35 +++++++++++++++++++++++++++++++++++
 pyproject.toml |  2 +-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 08c5369..a54a934 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,41 @@ This project uses [Semantic Versioning] and generally follows the conventions of
 
 Important planned changes not yet bundled up will be listed here.
 
+## [0.4.0] - 2023-01-12
+
+Substantial changes to better support multiple blocklist formats
+
+### Added
+
+- Added support for RapidBlock blocklists, both CSV and JSON formats. (327a44d)
+- Added support for per-instance-source import_fields. (327a44d)
+- Updated sample config to include new formats. (327a44d)
+- A BlockSeverity of 'suspend' implies reject_media and reject_reports. (327a44d)
+- Added ability to limit max severity per-URL source. (10011a5)
+- Added boolean fields like 'reject_reports' to mergeplan handling. (66f0373)
+- Added tests for boolean merge situations. (66f0373)
+- Various other test cases added.
+
+### Changed
+
+- Refactored to add a DomainBlock object. (10011a5)
+- Refactored to use a BlockParser structure. (10011a5)
+- Improved method for checking if changes are needed. (10011a5)
+- Refactored fetch from URLs and instances. (327a44d)
+- Improved check_followed_severity() behaviour. (327a44d)
+- Changed API delay to be in calls per hour. (327a44d)
+- Improved comment merging. (0a6eec4)
+- Clarified logic in apply_mergeplan() for boolean fields. (66f0373)
+- Updated README documentation. (ee9625d)
+
+### Removed
+
+- Removed redundant global vars. (327a44d)
+
+### Fixed
+
+- Fixed bug in severity change detection. (e0d40b5)
+
 ## [0.3.0] - 2023-01-11
 
 ### Added
diff --git a/pyproject.toml b/pyproject.toml
index b89e804..24f9aff 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "fediblockhole"
-version = "0.3.0"
+version = "0.4.0"
 description = "Federated blocklist management for Mastodon"
 readme = "README.md"
 license = {file = "LICENSE"}

From 55dad3fa32e52be4868afd46c7f96a78cb5f9737 Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Fri, 13 Jan 2023 17:12:23 +1100
Subject: [PATCH 3/6] Aligned API call rate limit with server default.
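
For context, the arithmetic behind this change can be sketched as below. The two delay expressions are taken from the diff; the `throttled_calls` helper is hypothetical, is not part of fediblockhole, and only illustrates how such a delay might be applied between API calls.

```python
import time

# Previous pacing from the diff: 300 calls spread over one hour
OLD_API_CALL_DELAY = 3600 / 300   # 12.0 seconds between calls
# New pacing from the diff: 300 calls spread over 5 minutes
API_CALL_DELAY = 5 * 60 / 300     # 1.0 second between calls


def throttled_calls(calls, delay=API_CALL_DELAY, sleep=time.sleep):
    """Run each zero-argument callable in order, pausing between calls.

    Hypothetical helper for illustration only. The `sleep` parameter is
    injectable so the pacing can be checked without actually waiting.
    """
    results = []
    for i, call in enumerate(calls):
        if i:
            # No pause is needed before the very first call
            sleep(delay)
        results.append(call())
    return results
```

With the new value, a run of 300 calls takes roughly 5 minutes of pacing rather than roughly an hour.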

---
 src/fediblockhole/__init__.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/fediblockhole/__init__.py b/src/fediblockhole/__init__.py
index a13f27c..9977dc9 100755
--- a/src/fediblockhole/__init__.py
+++ b/src/fediblockhole/__init__.py
@@ -29,7 +29,8 @@ URL_BLOCKLIST_MAXSIZE = 1024 ** 3
 REQUEST_TIMEOUT = 30
 
 # Time to wait between instance API calls to we don't melt them
-API_CALL_DELAY = 3600 / 300 # 300 API calls per hour
+# The default Mastodon rate limit is 300 calls per 5 minutes
+API_CALL_DELAY = 5 * 60 / 300 # 300 calls per 5 minutes
 
 # We always import the domain and the severity
 IMPORT_FIELDS = ['domain', 'severity']

From a718af5a0bbd9249c5bbb3485c11a6c7a20449ae Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Fri, 13 Jan 2023 17:30:41 +1100
Subject: [PATCH 4/6] Fix DomainBlock.id usage during __iter__()

---
 src/fediblockhole/const.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fediblockhole/const.py b/src/fediblockhole/const.py
index 4fbdc43..909d84d 100644
--- a/src/fediblockhole/const.py
+++ b/src/fediblockhole/const.py
@@ -215,7 +215,7 @@ class DomainBlock(object):
         """Be iterable"""
         keys = self.fields
 
-        if self.id:
+        if getattr(self, 'id', False):
             keys.append('id')
 
         for k in keys:

From 69c28f1a3ff2dcbb351f5a6740ab46b4d324ee22 Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Fri, 13 Jan 2023 17:31:50 +1100
Subject: [PATCH 5/6] add DomainBlock type hint to update_known_block(). Use ._asdict() to get info to pass to add block API call.
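
A minimal sketch of the calling pattern this change moves to, under stated assumptions: `DomainBlockSketch` and `build_update_payload` are hypothetical stand-ins, not the project's `DomainBlock` class or `update_known_block()` function. Only the optional `id` attribute, the `getattr(self, 'id', False)` guard and the `_asdict()` method name come from this patch series.

```python
class DomainBlockSketch:
    """Hypothetical stand-in for DomainBlock, for illustration only."""

    fields = ['domain', 'severity', 'public_comment']

    def __init__(self, domain, severity='suspend', public_comment='', id=None):
        self.domain = domain
        self.severity = severity
        self.public_comment = public_comment
        if id is not None:
            # Assumption: only blocks read back from an instance carry an id
            self.id = id

    def _asdict(self):
        """Return the block as a plain dict, including id only when present."""
        data = {k: getattr(self, k) for k in self.fields}
        # getattr() with a default avoids AttributeError when id was never set
        if getattr(self, 'id', False):
            data['id'] = self.id
        return data


def build_update_payload(block):
    """Split a block into (id, payload) ready to send to an update endpoint."""
    blockdata = block._asdict()
    # pop() raises KeyError when the block has no id, i.e. it is not yet known
    block_id = blockdata.pop('id')
    return block_id, blockdata
```

Passing the object rather than a raw dict keeps the `id` handling in one place instead of in every caller.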

---
 src/fediblockhole/__init__.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/fediblockhole/__init__.py b/src/fediblockhole/__init__.py
index 9977dc9..6cdf143 100755
--- a/src/fediblockhole/__init__.py
+++ b/src/fediblockhole/__init__.py
@@ -417,13 +417,13 @@ def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
     change_needed = oldblock.compare_fields(newblock, import_fields)
     return change_needed
 
-def update_known_block(token: str, host: str, blockdict: dict):
+def update_known_block(token: str, host: str, block: DomainBlock):
     """Update an existing domain block with information in blockdict"""
     api_path = "/api/v1/admin/domain_blocks/"
 
     try:
-        id = blockdict['id']
-        blockdata = blockdict.copy()
+        id = block.id
+        blockdata = block._asdict()
         del blockdata['id']
     except KeyError:
         import pdb

From 132367867350afbf9bc1534eac2a6ffe6bb4116f Mon Sep 17 00:00:00 2001
From: Justin Warren
Date: Fri, 13 Jan 2023 19:14:40 +1100
Subject: [PATCH 6/6] Add last few changes to changelog for release.

---
 CHANGELOG.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a54a934..3d15d08 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,7 @@ This project uses [Semantic Versioning] and generally follows the conventions of
 
 Important planned changes not yet bundled up will be listed here.
 
-## [0.4.0] - 2023-01-12
+## [0.4.0] - 2023-01-13
 
 Substantial changes to better support multiple blocklist formats
 
@@ -34,6 +34,7 @@ Substantial changes to better support multiple blocklist formats
 - Improved comment merging. (0a6eec4)
 - Clarified logic in apply_mergeplan() for boolean fields. (66f0373)
 - Updated README documentation. (ee9625d)
+- Aligned API call rate limit with server default. (55dad3f)
 
 ### Removed
 
@@ -42,6 +43,8 @@ Substantial changes to better support multiple blocklist formats
 ### Fixed
 
 - Fixed bug in severity change detection. (e0d40b5)
+- Fix DomainBlock.id usage during __iter__() (a718af5)
+-
 
 ## [0.3.0] - 2023-01-11
 