diff --git a/README.md b/README.md
index cbd9bd7..bc45298 100644
--- a/README.md
+++ b/README.md
@@ -13,17 +13,28 @@ A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
 ## Installing
 
 Instance admins who want to use this tool will need to add an Application at
-`https:///settings/applications/` they can authorise with an
-OAuth token. For each instance you connect to, add this token to the config file.
+`https:///settings/applications/` so they can authorize the
+tool to create and update domain blocks with an OAuth token.
 
 ### Reading remote instance blocklists
 
-To read admin blocks from a remote instance, you'll need to ask the instance admin to add a new Application at `https:///settings/applications/` and then tell you the access token.
+If a remote instance makes its domain blocks public, you don't need
+a token to read them.
 
-The application needs the `admin:read:domain_blocks` OAuth scope, but unfortunately this
-scope isn't available in the current application screen (v4.0.2 of Mastodon at
-time of writing). There is a way to do it with scopes, but it's really
-dangerous, so I'm not going to tell you what it is here.
+If a remote instance only shows its domain blocks to local accounts,
+you'll need to have a token with `read:blocks` authorization set up.
+If you have an account on that instance, you can get a token by setting up a new
+Application at `https:///settings/applications/`.
+
+To read admin blocks from a remote instance, you'll need to ask the instance
+admin to add a new Application at
+`https:///settings/applications/` and then tell you the access
+token.
+
+The application needs the `admin:read:domain_blocks` OAuth scope, but
+unfortunately this scope isn't available in the current application screen
+(v4.0.2 of Mastodon at time of writing). There is a way to do it with scopes,
+but it's really dangerous, so I'm not going to tell you what it is here.
 
 A better way is to ask the instance admin to connect to the PostgreSQL database and add the scope there, like this:
 
@@ -68,20 +79,74 @@ Or you can use the default location of `/etc/default/fediblockhole.conf.toml`.
 
 As the filename suggests, FediBlockHole uses TOML syntax.
 
-There are 2 key sections:
+There are 3 key sections:
+
+ 1. `blocklist_url_sources`: A list of URLs to read CSV formatted blocklists from
+ 1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
+ 1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API
 
- 1. `blocklist_instance_sources`: A list of instances to read blocklists from
- 1. `blocklist_instance_destinations`: A list of instances to write blocklists to
+### URL sources
 
-Each is a list of dictionaries of the form:
+`blocklist_url_sources` is a list of URLs to fetch CSV formatted blocklists from.
+
+The required CSV fields are `domain` and `severity`.
+
+Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
+
+### Instance sources
+
+The tool can also read domain blocks directly from instances via the API.
+
+The configuration is a list of dictionaries of the form:
 ```
-{ domain = '', token = '' }
+{ domain = '', token = '', admin = false }
 ```
 
 The `domain` is the fully-qualified domain name of the API host for an instance
-you want to read or write domain blocks to/from. The `BearerToken` is the OAuth
-token for the application that's configured in the instance to allow you to
-read/write domain blocks, as discussed above.
+you want to read or write domain blocks to/from.
+
+The `token` is an optional OAuth token for the application that's configured in
+the instance to allow you to read/write domain blocks, as discussed above.
+
+`admin` is an optional field that tells the tool to use the more detailed admin
+API endpoint for domain_blocks, rather than the public API endpoint, which
+doesn't provide as much detail. You will need a `token` that's been configured to
+permit access to the admin domain_blocks scope, as detailed above.
+
+### Instance destinations
+
+The tool supports pushing a unified blocklist to multiple instances.
+
+Configure the list of instances you want to push your blocklist to in the
+`blocklist_instance_destinations` list. Each entry is of the form:
+
+```
+{ domain = '', token = '', max_followed_severity = 'silence' }
+```
+
+The fields `domain` and `token` are required. `max_followed_severity` is optional.
+
+The `domain` is the hostname of the instance you want to push to. The `token` is
+an application token with both `admin:read:domain_blocks` and
+`admin:write:domain_blocks` authorization.
+
+The optional `max_followed_severity` setting sets a per-instance limit on the
+severity of a domain_block if there are accounts on the instance that follow
+accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
+defaults to 'silence'.
+
+This setting exists to give people time to move off an instance that is about to
+be defederated and bring their followers from your instance with them. Without
+it, if a new Suspend block appears in any of the blocklists you subscribe to (or
+a block level increases from Silence to Suspend) and you're using the default
+`max` mergeplan, the tool would immediately suspend the instance, cutting
+everyone on the blocked instance off from their existing followers on your
+instance, even if they move to a new instance. If you actually want that
+outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
+mergeplan.
+
+Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
+
 
 ## Using the tool
 
@@ -91,14 +156,14 @@ Once you've configured the tool, run it like this:
 fediblock_sync.py -c 
 ```
 
-If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass the config file path.
+If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.
 
 ## More advanced configuration
 
 For a list of possible configuration options, check the `--help` and read the
 sample configuration file in `etc/sample.fediblockhole.conf.toml`.
 
-### keep_intermediate
+### save_intermediate
 
 This option tells the tool to save the unmerged blocklists it fetches from
 remote instances and URLs into separate files. This is handy for debugging, or
diff --git a/bin/fediblock_sync.py b/bin/fediblock_sync.py
index e17efc9..5d222d0 100755
--- a/bin/fediblock_sync.py
+++ b/bin/fediblock_sync.py
@@ -108,7 +108,8 @@ def sync_blocklists(conf: dict):
     for dest in conf.blocklist_instance_destinations:
         domain = dest['domain']
         token = dest['token']
-        push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields)
+        max_followed_severity = dest.get('max_followed_severity', 'silence')
+        push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields, max_followed_severity)
 
 def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
     """Merge fetched remote blocklists into a bulk update
@@ -125,7 +126,7 @@ def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
             domain = newblock['domain']
-            # If the domain has two asterisks in it, it's obfuscated
+            # If the domain has an asterisk in it, it's obfuscated
             # and we can't really use it, so skip it and do the next one
-            if '**' in domain:
+            if '*' in domain:
                 log.debug(f"Domain '{domain}' is obfuscated. Skipping it.")
                 continue
 
@@ -177,7 +178,7 @@ def apply_mergeplan(oldblock: dict, newblock: dict, mergeplan: str='max') -> dic
             blockdata['severity'] = newblock['severity']
 
     # If obfuscate is set and is True for the domain in
-    # any blocklist then obfuscate is set to false.
+    # any blocklist then obfuscate is set to True.
     if newblock.get('obfuscate', False):
         blockdata['obfuscate'] = True
 
@@ -253,7 +254,7 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
                 url = urlstring.strip('<').rstrip('>')
 
     log.debug(f"Found {len(domain_blocks)} existing domain blocks.")
-    # Remove fields not in import list
+    # Remove fields not in import list.
     for row in domain_blocks:
         origrow = row.copy()
         for key in origrow:
@@ -274,18 +275,98 @@ def delete_block(token: str, host: str, id: int):
     )
     if response.status_code != 200:
         if response.status_code == 404:
-            log.warn(f"No such domain block: {id}")
+            log.warning(f"No such domain block: {id}")
             return
 
         raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
 
+def fetch_instance_follows(token: str, host: str, domain: str) -> int:
+    """Fetch the followers of the target domain at the instance
+
+    @param token: the Bearer authentication token for OAuth access
+    @param host: the instance API hostname/IP address
+    @param domain: the domain to search for followers of
+    @returns: int, number of local followers of remote instance accounts
+    """
+    api_path = "/api/v1/admin/measures"
+    url = f"https://{host}{api_path}"
+
+    key = 'instance_follows'
+
+    # This data structure only allows us to request a single domain
+    # at a time, which limits the load on the remote instance of each call
+    data = {
+        'keys': [
+            key
+        ],
+        key: { 'domain': domain },
+    }
+
+    # The Mastodon API only accepts JSON formatted POST data for measures
+    response = requests.post(url,
+        headers={
+            'Authorization': f"Bearer {token}",
+        },
+        json=data,
+    )
+    if response.status_code != 200:
+        if response.status_code == 403:
+            log.error(f"Cannot fetch follow information for {domain} from {host}: {response.content}")
+
+        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
+
+    # Get the total returned
+    follows = int(response.json()[0]['total'])
+    return follows
+
+def check_followed_severity(host: str, token: str, domain: str,
+    severity: str, max_followed_severity: str='silence'):
+    """Check an instance to see if it has followers of a to-be-blocked instance"""
+
+    # If the instance has accounts that follow people on the to-be-blocked domain,
+    # limit the maximum severity to the configured `max_followed_severity`.
+    follows = fetch_instance_follows(token, host, domain)
+    if follows > 0:
+        log.debug(f"Instance {host} has {follows} followers of accounts at {domain}.")
+        if SEVERITY[severity] > SEVERITY[max_followed_severity]:
+            log.warning(f"Instance {host} has {follows} followers of accounts at {domain}. Limiting block severity to {max_followed_severity}.")
+            return max_followed_severity
+    # No local followers, or the severity is already within the limit
+    return severity
+
+def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
+    """Compare block definitions to see if changes are needed"""
+    # Check if anything is actually different and needs updating
+    change_needed = []
+
+    for key in import_fields:
+        try:
+            oldval = oldblock[key]
+            newval = newblock[key]
+            log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
+
+            if oldval != newval:
+                log.debug("Difference detected. Change needed.")
+                change_needed.append(key)
+                break
+
+        except KeyError:
+            log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
+            continue
+
+    return change_needed
+
 def update_known_block(token: str, host: str, blockdict: dict):
     """Update an existing domain block with information in blockdict"""
     api_path = "/api/v1/admin/domain_blocks/"
 
-    id = blockdict['id']
-    blockdata = blockdict.copy()
-    del blockdata['id']
+    try:
+        id = blockdict['id']
+        blockdata = blockdict.copy()
+        del blockdata['id']
+    except KeyError:
+        log.error(f"Cannot update a domain block without an 'id': {blockdict}")
+        raise
 
     url = f"https://{host}{api_path}{id}"
 
@@ -308,12 +389,20 @@ def add_block(token: str, host: str, blockdata: dict):
         headers={'Authorization': f"Bearer {token}"},
         data=blockdata
     )
-    if response.status_code != 200:
-        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
+    if response.status_code == 422:
+        # A stricter block already exists. Probably for the base domain.
+        err = response.json()
+        log.warning(err['error'])
+    elif response.status_code != 200:
+
+        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
+
 
 def push_blocklist(token: str, host: str, blocklist: list[dict],
                     dryrun: bool=False,
-                    import_fields: list=['domain', 'severity']):
+                    import_fields: list=['domain', 'severity'],
+                    max_followed_severity='silence',
+                    ):
     """Push a blocklist to a remote instance.
 
     Merging the blocklist with the existing list the instance has,
@@ -326,47 +415,42 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
     """
     log.info(f"Pushing blocklist to host {host} ...")
     # Fetch the existing blocklist from the instance
-    # Force use of the admin API
+    # Force use of the admin API, and add 'id' to a copy of the field list
+    if 'id' not in import_fields:
+        import_fields = import_fields + ['id']
     serverblocks = fetch_instance_blocklist(host, token, True, import_fields)
 
     # Convert serverblocks to a dictionary keyed by domain name
     knownblocks = {row['domain']: row for row in serverblocks}
 
     for newblock in blocklist:
 
-        log.debug(f"applying newblock: {newblock}")
+        log.debug(f"Applying newblock: {newblock}")
         oldblock = knownblocks.get(newblock['domain'], None)
         if oldblock:
             log.debug(f"Block already exists for {newblock['domain']}, checking for differences...")
 
-            # Check if anything is actually different and needs updating
-            change_needed = False
-
-            for key in import_fields:
-                try:
-                    oldval = oldblock[key]
-                    newval = newblock[key]
-                    log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
-
-                    if oldval != newval:
-                        log.debug("Difference detected. Change needed.")
-                        change_needed = True
-                        break
-
-                except KeyError:
-                    log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
-                    continue
+            change_needed = is_change_needed(oldblock, newblock, import_fields)
 
             if change_needed:
-                log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
-                blockdata = oldblock.copy()
-                blockdata.update(newblock)
-                if not dryrun:
-                    update_known_block(token, host, blockdata)
-                    # add a pause here so we don't melt the instance
-                    time.sleep(1)
-                else:
-                    log.info("Dry run selected. Not applying changes.")
+                # Change might be needed, but let's see if the severity
+                # needs to change. If not, maybe no changes are needed?
+                newseverity = check_followed_severity(host, token, oldblock['domain'], newblock['severity'], max_followed_severity)
+                # Use a copy so a capped severity doesn't leak into other destinations
+                newblock = dict(newblock, severity=newseverity)
+                change_needed = is_change_needed(oldblock, newblock, import_fields)
+
+                # Change still needed?
+                if change_needed:
+                    log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
+                    blockdata = oldblock.copy()
+                    blockdata.update(newblock)
+                    if not dryrun:
+                        update_known_block(token, host, blockdata)
+                        # add a pause here so we don't melt the instance
+                        time.sleep(1)
+                    else:
+                        log.info("Dry run selected. Not applying changes.")
             else:
                 log.debug("No differences detected. Not updating.")
 
@@ -385,6 +469,9 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
                 'reject_reports': newblock.get('reject_reports', False),
                 'obfuscate': newblock.get('obfuscate', False),
             }
+
+            # Make sure the new block doesn't clobber a domain with followers
+            blockdata['severity'] = check_followed_severity(host, token, newblock['domain'], newblock['severity'], max_followed_severity)
             log.info(f"Adding new block for {blockdata['domain']}...")
             if not dryrun:
                 add_block(token, host, blockdata)
@@ -514,4 +601,4 @@ if __name__ == '__main__':
     args = augment_args(args)
 
     # Do the work of syncing
-    sync_blocklists(args)
\ No newline at end of file
+    sync_blocklists(args)
diff --git a/etc/sample.fediblockhole.conf.toml b/etc/sample.fediblockhole.conf.toml
index 0a3de7a..d39c999 100644
--- a/etc/sample.fediblockhole.conf.toml
+++ b/etc/sample.fediblockhole.conf.toml
@@ -17,11 +17,11 @@ blocklist_url_sources = [
 
 # List of instances to write blocklist to
 blocklist_instance_destinations = [
-  # { domain = 'eigenmagic.net', token = '' },
+  # { domain = 'eigenmagic.net', token = '', max_followed_severity = 'silence' },
 ]
 
 ## Store a local copy of the remote blocklists after we fetch them
-#keep_intermediate = true
+#save_intermediate = true
 
 ## Directory to store the local blocklist copies
 # savedir = '/tmp'
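
The capping rule that `check_followed_severity` applies can be checked in isolation. What follows is a minimal standalone sketch, not the tool's code. It assumes the severity ordering `noop < silence < suspend`, because the tool's own `SEVERITY` table is not part of this diff. It also takes the follow count as a plain argument instead of calling `/api/v1/admin/measures`. The names `cap_severity` and `follows_from_measures` are hypothetical, and the sample payload only mimics the shape the patch reads, `response.json()[0]['total']`.

```python
"""Standalone sketch of the follow-aware severity cap (assumptions noted)."""

# Assumed ordering; the tool's own SEVERITY table is not shown in this diff.
SEVERITY = {'noop': 0, 'silence': 1, 'suspend': 2}


def follows_from_measures(payload: list) -> int:
    """Extract the follow count from an admin measures response body.

    Hypothetical helper: assumes the shape the patch reads, a list whose
    first element has a 'total' field that int() can convert.
    """
    return int(payload[0]['total'])


def cap_severity(severity: str, follows: int,
                 max_followed_severity: str = 'silence') -> str:
    """Limit severity to max_followed_severity if local accounts follow the domain."""
    if follows > 0 and SEVERITY[severity] > SEVERITY[max_followed_severity]:
        return max_followed_severity
    # No local followers, or the requested severity is already within the limit
    return severity


if __name__ == '__main__':
    # Illustrative payload only, not captured from a real instance
    measures = [{'key': 'instance_follows', 'total': '3'}]
    follows = follows_from_measures(measures)
    print(cap_severity('suspend', follows))             # silence
    print(cap_severity('suspend', 0))                   # suspend
    print(cap_severity('silence', follows))             # silence
    print(cap_severity('suspend', follows, 'suspend'))  # suspend
```

The final `return severity` sits at function level, outside the `if follows > 0:` branch. This placement guarantees that a string is always returned. If that return lived only in an `else` attached to the follows check, a domain with local followers whose requested severity was already within the limit would come back as `None`.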