Add ability to set max severity level if an instance has followers of accounts on a to-be-blocked domain.

Refactored the change detection code. Fixed a bug in the config option for saving intermediate blocklists. Updated README documentation. Updated sample config. Addresses #5
2023-01-09 16:51:30 +11:00 · 55184210d4
parent a134870f14
commit 55184210d4
3 changed files with 212 additions and 60 deletions
--- a/README.md
+++ b/README.md
@@ -13,17 +13,28 @@ A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
 ## Installing

 Instance admins who want to use this tool will need to add an Application at
-`https://<instance-domain>/settings/applications/` they can authorise with an
-OAuth token. For each instance you connect to, add this token to the config file.
+`https://<instance-domain>/settings/applications/` so they can authorize the
+tool to create and update domain blocks with an OAuth token. 

 ### Reading remote instance blocklists

-To read admin blocks from a remote instance, you'll need to ask the instance admin to add a new Application at `https://<instance-domain>/settings/applications/` and then tell you the access token.
+If a remote instance makes its domain blocks public, you don't need
+a token to read them.

-The application needs the `admin:read:domain_blocks` OAuth scope, but unfortunately this
-scope isn't available in the current application screen (v4.0.2 of Mastodon at
-time of writing). There is a way to do it with scopes, but it's really
-dangerous, so I'm not going to tell you what it is here.
+If a remote instance only shows its domain blocks to local accounts
+you'll need to have a token with `read:blocks` authorization set up.
+If you have an account on that instance, you can get a token by setting up a new
+Application at `https://<instance-domain>/settings/applications/`.
+
+To read admin blocks from a remote instance, you'll need to ask the instance
+admin to add a new Application at
+`https://<instance-domain>/settings/applications/` and then tell you the access
+token.
+
+The application needs the `admin:read:domain_blocks` OAuth scope, but
+unfortunately this scope isn't available in the current application screen
+(v4.0.2 of Mastodon at time of writing). There is a way to do it with scopes,
+but it's really dangerous, so I'm not going to tell you what it is here.

 A better way is to ask the instance admin to connect to the PostgreSQL database
 and add the scope there, like this:
@@ -68,20 +79,74 @@ Or you can use the default location of `/etc/default/fediblockhole.conf.toml`.

 As the filename suggests, FediBlockHole uses TOML syntax.

-There are 2 key sections:
+There are 3 key sections:
 
- 1. `blocklist_instance_sources`: A list of instances to read blocklists from
- 1. `blocklist_instance_destinations`: A list of instances to write blocklists to
+ 1. `blocklist_url_sources`: A list of URLs to read CSV formatted blocklists from
+ 1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
+ 1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API

-Each is a list of dictionaries of the form:
+### URL sources
+
+`blocklist_url_sources` is a list of URLs to fetch CSV formatted blocklists from.
+
+The required fields are `domain` and `severity`.
+
+Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
+
+### Instance sources
+
+The tool can also read domain_blocks from instances directly.
+
+The configuration is a list of dictionaries of the form:
 ```
-{ domain = '<domain_name>', token = '<BearerToken>' }
+{ domain = '<domain_name>', token = '<BearerToken>', admin = false }
 ```

 The `domain` is the fully-qualified domain name of the API host for an instance
-you want to read or write domain blocks to/from. The `BearerToken` is the OAuth
-token for the application that's configured in the instance to allow you to
-read/write domain blocks, as discussed above.
+you want to read or write domain blocks to/from. 
+
+The `token` is an optional OAuth token for the application that's configured in
+the instance to allow you to read/write domain blocks, as discussed above.
+
+`admin` is an optional field that tells the tool to use the more detailed admin
+API endpoint for domain_blocks, rather than the more public API endpoint that
+doesn't provide as much detail. You will need a `token` that's been configured to
+permit access to the admin domain_blocks scope, as detailed above.
+
+### Instance destinations
+
+The tool supports pushing a unified blocklist to multiple instances.
+
+Configure the list of instances you want to push your blocklist to in the
+`blocklist_instance_destinations` list. Each entry is of the form:
+
+```
+{ domain = '<domain_name>', token = '<BearerToken>', max_followed_severity = 'silence' }
+```
+
+The fields `domain` and `token` are required. `max_followed_severity` is optional.
+
+The `domain` is the hostname of the instance you want to push to. The `token` is
+an application token with both `admin:read:domain_blocks` and
+`admin:write:domain_blocks` authorization.
+
+The optional `max_followed_severity` setting sets a per-instance limit on the
+severity of a domain_block if there are accounts on the instance that follow
+accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
+defaults to 'silence'.
+
+This setting exists to give people time to move off an instance that is about to
+be defederated and bring their followers from your instance with them. Without
+it, if a new Suspend block appears in any of the blocklists you subscribe to (or
+a block level increases from Silence to Suspend) and you're using the default
+`max` mergeplan, the tool would immediately suspend the instance, cutting
+everyone on the blocked instance off from their existing followers on your
+instance, even if they move to a new instance. If you actually want that
+outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
+mergeplan.
+
+Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
+

 ## Using the tool

@@ -91,14 +156,14 @@ Once you've configured the tool, run it like this:
 fediblock_sync.py -c <configfile_path>
 ```

-If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass the config file path.
+If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.

 ## More advanced configuration

 For a list of possible configuration options, check the `--help` and read the
 sample configuration file in `etc/sample.fediblockhole.conf.toml`.

-### keep_intermediate
+### save_intermediate

 This option tells the tool to save the unmerged blocklists it fetches from
 remote instances and URLs into separate files. This is handy for debugging, or
--- a/bin/fediblock_sync.py
+++ b/bin/fediblock_sync.py
@@ -108,7 +108,8 @@ def sync_blocklists(conf: dict):
        for dest in conf.blocklist_instance_destinations:
            domain = dest['domain']
            token = dest['token']
-            push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields)
+            max_followed_severity = dest.get('max_followed_severity', 'silence')
+            push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields, max_followed_severity)

 def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
    """Merge fetched remote blocklists into a bulk update
@@ -125,7 +126,7 @@ def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
            domain = newblock['domain']
            # If the domain has two asterisks in it, it's obfuscated
            # and we can't really use it, so skip it and do the next one
-            if '**' in domain:
+            if '*' in domain:
                log.debug(f"Domain '{domain}' is obfuscated. Skipping it.")
                continue

@@ -177,7 +178,7 @@ def apply_mergeplan(oldblock: dict, newblock: dict, mergeplan: str='max') -> dic
            blockdata['severity'] = newblock['severity']
        
        # If obfuscate is set and is True for the domain in
-        # any blocklist then obfuscate is set to false.
+        # any blocklist then obfuscate is set to True.
        if newblock.get('obfuscate', False):
            blockdata['obfuscate'] = True

@@ -253,7 +254,7 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
            url = urlstring.strip('<').rstrip('>')

    log.debug(f"Found {len(domain_blocks)} existing domain blocks.")
-    # Remove fields not in import list
+    # Remove fields not in import list.
    for row in domain_blocks:
        origrow = row.copy()
        for key in origrow:
@@ -274,18 +275,98 @@ def delete_block(token: str, host: str, id: int):
    )
    if response.status_code != 200:
        if response.status_code == 404:
-            log.warn(f"No such domain block: {id}")
+            log.warning(f"No such domain block: {id}")
            return

        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")

+def fetch_instance_follows(token: str, host: str, domain: str) -> int:
+    """Fetch the followers of the target domain at the instance
+
+    @param token: the Bearer authentication token for OAuth access
+    @param host: the instance API hostname/IP address
+    @param domain: the domain to search for followers of
+    @returns: int, number of local followers of remote instance accounts
+    """
+    api_path = "/api/v1/admin/measures"
+    url = f"https://{host}{api_path}"
+
+    key = 'instance_follows'
+
+    # This data structure only allows us to request a single domain
+    # at a time, which limits the load on the remote instance of each call
+    data = {
+        'keys': [
+            key
+            ],
+        key: { 'domain': domain },
+    }
+
+    # The Mastodon API only accepts JSON formatted POST data for measures
+    response = requests.post(url,
+        headers={
+            'Authorization': f"Bearer {token}",
+        },
+        json=data,
+    )
+    if response.status_code != 200:
+        if response.status_code == 403:
+            log.error(f"Cannot fetch follow information for {domain} from {host}: {response.content}")
+
+        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
+
+    # Get the total returned
+    follows = int(response.json()[0]['total'])
+    return follows
+
+def check_followed_severity(host: str, token: str, domain: str,
+    severity: str, max_followed_severity: str='silence'):
+    """Check an instance to see if it has followers of a to-be-blocked instance"""
+
+    # If the instance has accounts that follow people on the to-be-blocked domain,
+    # limit the maximum severity to the configured `max_followed_severity`.
+    follows = fetch_instance_follows(token, host, domain)
+    if follows > 0:
+        log.debug(f"Instance {host} has {follows} followers of accounts at {domain}.")
+        if SEVERITY[severity] > SEVERITY[max_followed_severity]:
+            log.warning(f"Instance {host} has {follows} followers of accounts at {domain}. Limiting block severity to {max_followed_severity}.")
+            return max_followed_severity
+    # No local followers, or the severity is already within the limit.
+    return severity
+
+def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
+    """Compare block definitions to see if changes are needed"""
+    # Check if anything is actually different and needs updating
+    change_needed = []
+
+    for key in import_fields:
+        try:
+            oldval = oldblock[key]
+            newval = newblock[key]
+            log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
+
+            if oldval != newval:
+                log.debug("Difference detected. Change needed.")
+                change_needed.append(key)
+                break
+
+        except KeyError:
+            log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
+            continue
+    
+    return change_needed
+
 def update_known_block(token: str, host: str, blockdict: dict):
    """Update an existing domain block with information in blockdict"""
    api_path = "/api/v1/admin/domain_blocks/"

-    id = blockdict['id']
-    blockdata = blockdict.copy()
-    del blockdata['id']
+    try:
+        id = blockdict['id']
+        blockdata = blockdict.copy()
+        del blockdata['id']
+    except KeyError:
+        log.error(f"Cannot update a domain block without an 'id': {blockdict}")
+        raise

    url = f"https://{host}{api_path}{id}"

@@ -308,12 +389,20 @@ def add_block(token: str, host: str, blockdata: dict):
        headers={'Authorization': f"Bearer {token}"},
        data=blockdata
    )
-    if response.status_code != 200:
+    if response.status_code == 422:
+        # A stricter block already exists. Probably for the base domain.
+        err = json.loads(response.content)
+        log.warning(err['error'])
+
+    elif response.status_code != 200:
+            
        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
           
 def push_blocklist(token: str, host: str, blocklist: list[dict],
                    dryrun: bool=False,
-                    import_fields: list=['domain', 'severity']):
+                    import_fields: list=['domain', 'severity'],
+                    max_followed_severity='silence',
+                    ):
    """Push a blocklist to a remote instance.
    
    Merging the blocklist with the existing list the instance has,
@@ -326,47 +415,42 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
    """
    log.info(f"Pushing blocklist to host {host} ...")
    # Fetch the existing blocklist from the instance
-    # Force use of the admin API
+    # Force use of the admin API, and add 'id' to a copy of the field list
+    if 'id' not in import_fields:
+        import_fields = import_fields + ['id']
    serverblocks = fetch_instance_blocklist(host, token, True, import_fields)

-    # Convert serverblocks to a dictionary keyed by domain name
+    # # Convert serverblocks to a dictionary keyed by domain name
    knownblocks = {row['domain']: row for row in serverblocks}

    for newblock in blocklist:

-        log.debug(f"applying newblock: {newblock}")
+        log.debug(f"Applying newblock: {newblock}")
        oldblock = knownblocks.get(newblock['domain'], None)
        if oldblock:
            log.debug(f"Block already exists for {newblock['domain']}, checking for differences...")

-            # Check if anything is actually different and needs updating
-            change_needed = False
-
-            for key in import_fields:
-                try:
-                    oldval = oldblock[key]
-                    newval = newblock[key]
-                    log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
-
-                    if oldval != newval:
-                        log.debug("Difference detected. Change needed.")
-                        change_needed = True
-                        break
-
-                except KeyError:
-                    log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
-                    continue
+            change_needed = is_change_needed(oldblock, newblock, import_fields)
            
            if change_needed:
-                log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
-                blockdata = oldblock.copy()
-                blockdata.update(newblock)
-                if not dryrun:
-                    update_known_block(token, host, blockdata)
-                    # add a pause here so we don't melt the instance
-                    time.sleep(1)
-                else:
-                    log.info("Dry run selected. Not applying changes.")
+                # Change might be needed, but let's see if the severity
+                # needs to change. If not, maybe no changes are needed?
+                newseverity = check_followed_severity(host, token, oldblock['domain'], newblock['severity'], max_followed_severity)
+                if newseverity != oldblock['severity']:
+                    newblock['severity'] = newseverity
+                    change_needed.append('severity')
+
+                # Change still needed?
+                if change_needed:
+                    log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
+                    blockdata = oldblock.copy()
+                    blockdata.update(newblock)
+                    if not dryrun:
+                        update_known_block(token, host, blockdata)
+                        # add a pause here so we don't melt the instance
+                        time.sleep(1)
+                    else:
+                        log.info("Dry run selected. Not applying changes.")

            else:
                log.debug("No differences detected. Not updating.")
@@ -385,6 +469,9 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
                'reject_reports': newblock.get('reject_reports', False),
                'obfuscate': newblock.get('obfuscate', False),
            }
+
+            # Make sure the new block doesn't clobber a domain with followers
+            blockdata['severity'] = check_followed_severity(host, token, newblock['domain'], newblock['severity'], max_followed_severity)
            log.info(f"Adding new block for {blockdata['domain']}...")
            if not dryrun:
                add_block(token, host, blockdata)
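
The severity cap introduced in this file can be illustrated in isolation. The sketch below is not the tool's code: `cap_severity` is a hypothetical helper, the follow count is passed in directly instead of being fetched from the API, and the `SEVERITY` ordering is an assumption, since the diff references a `SEVERITY` mapping without showing it.

```python
# Assumed ordering of block severities (the real SEVERITY mapping
# is not shown in this diff, so these values are illustrative).
SEVERITY = {'noop': 0, 'silence': 1, 'suspend': 2}

def cap_severity(severity: str, follows: int,
                 max_followed_severity: str = 'silence') -> str:
    """Return the severity to apply, given how many local accounts
    follow accounts on the domain that is about to be blocked."""
    # Only limit the severity when there are local followers and the
    # requested severity is stricter than the configured maximum.
    if follows > 0 and SEVERITY[severity] > SEVERITY[max_followed_severity]:
        return max_followed_severity
    # Otherwise the requested severity is used unchanged.
    return severity

if __name__ == '__main__':
    print(cap_severity('suspend', follows=3))             # capped
    print(cap_severity('suspend', follows=0))             # not capped
    print(cap_severity('suspend', follows=3,
                       max_followed_severity='suspend'))  # cap disabled
```

Under these assumptions, a `suspend` block is downgraded to `silence` while any local follows remain, and is applied in full once the follow count reaches 0 or when the limit itself is set to `suspend`.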
--- a/etc/sample.fediblockhole.conf.toml
+++ b/etc/sample.fediblockhole.conf.toml
@@ -17,11 +17,11 @@ blocklist_url_sources = [

 # List of instances to write blocklist to
 blocklist_instance_destinations = [
-  # { domain = 'eigenmagic.net', token = '<read_write_token>' },
+  # { domain = 'eigenmagic.net', token = '<read_write_token>', max_followed_severity = 'silence'},
 ]

 ## Store a local copy of the remote blocklists after we fetch them
-#keep_intermediate = true
+#save_intermediate = true

 ## Directory to store the local blocklist copies
 # savedir = '/tmp'
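
To show how the optional `max_followed_severity` key behaves when it is omitted from a destination entry, here is a minimal sketch. The dictionaries stand in for destination entries after the TOML config has been parsed; the domains are placeholders and `followed_severity_limit` is a hypothetical helper, not a function from the tool.

```python
# Placeholder destination entries, shaped like the parsed
# `blocklist_instance_destinations` list from the sample config.
destinations = [
    {'domain': 'example.org', 'token': '<read_write_token>'},
    {'domain': 'example.net', 'token': '<read_write_token>',
     'max_followed_severity': 'suspend'},
]

def followed_severity_limit(dest: dict) -> str:
    """Return the per-instance severity limit, defaulting to 'silence'
    when the entry does not set max_followed_severity."""
    return dest.get('max_followed_severity', 'silence')

# Map each destination domain to the limit that would apply to it.
limits = {dest['domain']: followed_severity_limit(dest) for dest in destinations}

if __name__ == '__main__':
    for domain, limit in limits.items():
        print(f"{domain}: {limit}")
```

This mirrors the `dest.get('max_followed_severity', 'silence')` lookup added to `sync_blocklists` in this commit.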