Merge pull request #9 from eigenmagic/gentleblock

Block instances 'gently' so people on them have time to escape.
2023-01-09 17:01:46 +11:00 · 2023-01-09 17:01:46 +11:00 · bb84f1e239
parent a134870f14 55184210d4
commit bb84f1e239
3 changed files with 212 additions and 60 deletions
--- a/README.md
+++ b/README.md
@@ -13,17 +13,28 @@ A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
 ## Installing
 Instance admins who want to use this tool will need to add an Application at
-`https://<instance-domain>/settings/applications/` they can authorise with an
+`https://<instance-domain>/settings/applications/` so they can authorize the
-OAuth token. For each instance you connect to, add this token to the config file.
+tool to create and update domain blocks with an OAuth token. 
 ### Reading remote instance blocklists
-To read admin blocks from a remote instance, you'll need to ask the instance admin to add a new Application at `https://<instance-domain>/settings/applications/` and then tell you the access token.
+If a remote instance makes its domain blocks public, you don't need
 a token to read them.
-The application needs the `admin:read:domain_blocks` OAuth scope, but unfortunately this
+If a remote instance only shows its domain blocks to local accounts
-scope isn't available in the current application screen (v4.0.2 of Mastodon at
+you'll need to have a token with `read:blocks` authorization set up.
-time of writing). There is a way to do it with scopes, but it's really
+If you have an account on that instance, you can get a token by setting up a new
-dangerous, so I'm not going to tell you what it is here.
+Application at `https://<instance-domain>/settings/applications/`.
 To read admin blocks from a remote instance, you'll need to ask the instance
 admin to add a new Application at
 `https://<instance-domain>/settings/applications/` and then tell you the access
 token.
 The application needs the `admin:read:domain_blocks` OAuth scope, but
 unfortunately this scope isn't available in the current application screen
 (v4.0.2 of Mastodon at time of writing). There is a way to do it with scopes,
 but it's really dangerous, so I'm not going to tell you what it is here.
 A better way is to ask the instance admin to connect to the PostgreSQL database
 and add the scope there, like this:
@@ -68,20 +79,74 @@ Or you can use the default location of `/etc/default/fediblockhole.conf.toml`.
 As the filename suggests, FediBlockHole uses TOML syntax.
-There are 2 key sections:
+There are 3 key sections:
 1. `blocklist_url_sources`: A list of URLs to read CSV-formatted blocklists from
 1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
 1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API
- 1. `blocklist_instance_sources`: A list of instances to read blocklists from
+### URL sources
 1. `blocklist_instance_destinations`: A list of instances to write blocklists to
-Each is a list of dictionaries of the form:
+The URL sources are a list of URLs to fetch CSV-formatted blocklists from.
 The required fields are `domain` and `severity`.
 Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
 ### Instance sources
 The tool can also read domain_blocks from instances directly.
 The configuration is a list of dictionaries of the form:
 ```
-{ domain = '<domain_name>', token = '<BearerToken>' }
+{ domain = '<domain_name>', token = '<BearerToken>', admin = false }
 ```
 The `domain` is the fully-qualified domain name of the API host for an instance
-you want to read or write domain blocks to/from. The `BearerToken` is the OAuth
+you want to read or write domain blocks to/from. 
-token for the application that's configured in the instance to allow you to
+
-read/write domain blocks, as discussed above.
+The `token` is an optional OAuth token for the application that's configured in
 the instance to allow you to read/write domain blocks, as discussed above.
 `admin` is an optional field that tells the tool to use the more detailed admin
 API endpoint for domain_blocks, rather than the more public API endpoint that
 doesn't provide as much detail. You will need a `token` that's been configured to
 permit access to the admin domain_blocks scope, as detailed above.
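
As a rough illustration of how the optional fields in a source entry behave, the sketch below fills in defaults for `token` and `admin`. `normalise_source` is a hypothetical helper, not part of the tool; the defaults are assumed to mirror the `token: str=None, admin: bool=False` parameters of `fetch_instance_blocklist`.

```python
def normalise_source(entry: dict) -> dict:
    """Return a copy of a source entry with optional fields defaulted.

    Hypothetical helper for illustration only.
    """
    if 'domain' not in entry:
        raise ValueError("A source entry needs a 'domain'")
    normalised = {
        'domain': entry['domain'],
        'token': entry.get('token'),        # optional: public lists need no token
        'admin': entry.get('admin', False), # optional: defaults to the public endpoint
    }
    # The admin endpoint needs authorisation, so a token is required with it
    if normalised['admin'] and normalised['token'] is None:
        raise ValueError("admin = true requires a token")
    return normalised


print(normalise_source({'domain': 'example.org'}))
```

An entry with only a `domain` is enough for an instance that publishes its domain blocks, while `admin = true` without a `token` is rejected early rather than failing at request time.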
 ### Instance destinations
 The tool supports pushing a unified blocklist to multiple instances.
 Configure the list of instances you want to push your blocklist to in the
 `blocklist_instance_destinations` list. Each entry is of the form:
 ```
 { domain = '<domain_name>', token = '<BearerToken>', max_followed_severity = 'silence' }
 ```
 The fields `domain` and `token` are required. `max_followed_severity` is optional.
 The `domain` is the hostname of the instance you want to push to. The `token` is
 an application token with both `admin:read:domain_blocks` and
 `admin:write:domain_blocks` authorization.
 The optional `max_followed_severity` setting sets a per-instance limit on the
 severity of a domain_block if there are accounts on the instance that follow
 accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
 defaults to 'silence'.
 This setting exists to give people time to move off an instance that is about to
 be defederated and bring their followers from your instance with them. Without
 it, if a new Suspend block appears in any of the blocklists you subscribe to (or
 a block level increases from Silence to Suspend) and you're using the default
 `max` mergeplan, the tool would immediately suspend the instance, cutting
 everyone on the blocked instance off from their existing followers on your
 instance, even if they move to a new instance. If you actually want that
 outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
 mergeplan.
 Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
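
The capping rule described above can be sketched as a small pure function. This is a minimal illustration, not the tool's implementation: `cap_severity` is a hypothetical name, and the `SEVERITY` ordering below is an assumption, since the tool's own table is not shown here.

```python
# Assumed ordering of severities, lowest to highest (illustrative only)
SEVERITY = {'noop': 0, 'silence': 1, 'suspend': 2}


def cap_severity(severity: str, follows: int,
                 max_followed_severity: str = 'silence') -> str:
    """Limit a block's severity while local accounts still follow the domain."""
    if follows > 0 and SEVERITY[severity] > SEVERITY[max_followed_severity]:
        return max_followed_severity
    # No followers, or the requested severity is already within the limit
    return severity


print(cap_severity('suspend', follows=3))             # silence
print(cap_severity('suspend', follows=0))             # suspend
print(cap_severity('suspend', follows=3,
                   max_followed_severity='suspend'))  # suspend
```

With the default limit, a Suspend block is held at Silence until the follow count reaches 0, after which the full severity applies again.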
 ## Using the tool
@@ -91,14 +156,14 @@ Once you've configured the tool, run it like this:
 fediblock_sync.py -c <configfile_path>
 ```
-If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass the config file path.
+If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.
 ## More advanced configuration
 For a list of possible configuration options, check the `--help` and read the
 sample configuration file in `etc/sample.fediblockhole.conf.toml`.
-### keep_intermediate
+### save_intermediate
 This option tells the tool to save the unmerged blocklists it fetches from
 remote instances and URLs into separate files. This is handy for debugging, or
--- a/bin/fediblock_sync.py
+++ b/bin/fediblock_sync.py
@@ -108,7 +108,8 @@ def sync_blocklists(conf: dict):
        for dest in conf.blocklist_instance_destinations:
            domain = dest['domain']
            token = dest['token']
-            push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields)
+            max_followed_severity = dest.get('max_followed_severity', 'silence')
            push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields, max_followed_severity)
 def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
    """Merge fetched remote blocklists into a bulk update
@@ -125,7 +126,7 @@ def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
            domain = newblock['domain']
            # If the domain has an asterisk in it, it's obfuscated
            # and we can't really use it, so skip it and do the next one
-            if '**' in domain:
+            if '*' in domain:
                log.debug(f"Domain '{domain}' is obfuscated. Skipping it.")
                continue
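
To illustrate why the check moved from `'**'` to `'*'`, here is a small sketch comparing the two tests on made-up domains. `usable_domains` is a hypothetical helper and the domain names are invented for the example.

```python
def usable_domains(domains: list, marker: str = '*') -> list:
    """Keep only domains that do not contain the obfuscation marker."""
    return [d for d in domains if marker not in d]


domains = ['example.org', 'ex*****.example', 'exampl*.example']

# Old check: a domain with a single asterisk slips through
print(usable_domains(domains, marker='**'))  # ['example.org', 'exampl*.example']

# New check: any asterisk marks the domain as obfuscated
print(usable_domains(domains, marker='*'))   # ['example.org']
```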
@@ -177,7 +178,7 @@ def apply_mergeplan(oldblock: dict, newblock: dict, mergeplan: str='max') -> dic
            blockdata['severity'] = newblock['severity']
        # If obfuscate is set and is True for the domain in
-        # any blocklist then obfuscate is set to false.
+        # any blocklist then obfuscate is set to True.
        if newblock.get('obfuscate', False):
            blockdata['obfuscate'] = True
@@ -253,7 +254,7 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
            url = urlstring.strip('<').rstrip('>')
    log.debug(f"Found {len(domain_blocks)} existing domain blocks.")
-    # Remove fields not in import list
+    # Remove fields not in import list.
    for row in domain_blocks:
        origrow = row.copy()
        for key in origrow:
@@ -274,18 +275,98 @@ def delete_block(token: str, host: str, id: int):
    )
    if response.status_code != 200:
        if response.status_code == 404:
-            log.warn(f"No such domain block: {id}")
+            log.warning(f"No such domain block: {id}")
            return
        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
 def fetch_instance_follows(token: str, host: str, domain: str) -> int:
    """Fetch the followers of the target domain at the instance
    @param token: the Bearer authentication token for OAuth access
    @param host: the instance API hostname/IP address
    @param domain: the domain to search for followers of
    @returns: int, number of local followers of remote instance accounts
    """
    api_path = "/api/v1/admin/measures"
    url = f"https://{host}{api_path}"
    key = 'instance_follows'
    # This data structure only allows us to request a single domain
    # at a time, which limits the load on the remote instance of each call
    data = {
        'keys': [
            key
            ],
        key: { 'domain': domain },
    }
    # The Mastodon API only accepts JSON formatted POST data for measures
    response = requests.post(url,
        headers={
            'Authorization': f"Bearer {token}",
        },
        json=data,
    )
    if response.status_code != 200:
        if response.status_code == 403:
            log.error(f"Cannot fetch follow information for {domain} from {host}: {response.content}")
        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
    # Get the total returned
    follows = int(response.json()[0]['total'])
    return follows
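
The request body and the parsing step in `fetch_instance_follows` can be looked at without touching the network. The sketch below rebuilds the same payload shape and reads the total from a made-up response. `build_measures_payload` and `parse_total` are hypothetical names, and the sample response contains only the field the function reads; the string total is an assumption suggested by the `int()` cast above.

```python
import json


def build_measures_payload(domain: str, key: str = 'instance_follows') -> dict:
    """Build the JSON body for a single-domain measures request."""
    return {
        'keys': [key],
        key: {'domain': domain},
    }


def parse_total(measures: list) -> int:
    """Read the total from the first measure in a response."""
    return int(measures[0]['total'])


payload = build_measures_payload('example.org')
print(json.dumps(payload))

# Invented response for illustration; a real one carries more fields
sample_response = [{'key': 'instance_follows', 'total': '12'}]
print(parse_total(sample_response))
```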
 def check_followed_severity(host: str, token: str, domain: str,
    severity: str, max_followed_severity: str='silence'):
    """Check an instance to see if it has followers of a to-be-blocked instance"""
    # If the instance has accounts that follow people on the to-be-blocked domain,
    # limit the maximum severity to the configured `max_followed_severity`.
    follows = fetch_instance_follows(token, host, domain)
    if follows > 0:
        log.debug(f"Instance {host} has {follows} followers of accounts at {domain}.")
        if SEVERITY[severity] > SEVERITY[max_followed_severity]:
            log.warning(f"Instance {host} has {follows} followers of accounts at {domain}. Limiting block severity to {max_followed_severity}.")
            return max_followed_severity
    # No local followers, or the severity is already within the limit
    return severity
 def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
    """Compare block definitions to see if changes are needed"""
    # Check if anything is actually different and needs updating
    change_needed = []
    for key in import_fields:
        try:
            oldval = oldblock[key]
            newval = newblock[key]
            log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
            if oldval != newval:
                log.debug("Difference detected. Change needed.")
                change_needed.append(key)
                break
        except KeyError:
            log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
            continue
    return change_needed
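
A compact, logging-free sketch of the same comparison shows two properties of `is_change_needed` that are easy to miss: keys missing from either block are skipped, and because of the `break` only the first differing key is reported. `first_changed_key` is a hypothetical stand-in written for this example, and the block contents are invented.

```python
def first_changed_key(oldblock: dict, newblock: dict, import_fields: list) -> list:
    """Return a one-item list with the first differing key, or an empty list."""
    for key in import_fields:
        # A key missing from either block cannot be compared, so skip it
        if key not in oldblock or key not in newblock:
            continue
        if oldblock[key] != newblock[key]:
            return [key]
    return []


old = {'id': 7, 'domain': 'example.org', 'severity': 'silence',
       'public_comment': 'spam'}
new = {'domain': 'example.org', 'severity': 'suspend',
       'public_comment': 'spam and harassment'}
fields = ['domain', 'severity', 'public_comment', 'id']

print(first_changed_key(old, new, fields))  # ['severity']
print(first_changed_key(old, old, fields))  # []
```

An empty list is falsy, which is why the result can be used directly in `if change_needed:` and later extended with `append('severity')`.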
 def update_known_block(token: str, host: str, blockdict: dict):
    """Update an existing domain block with information in blockdict"""
    api_path = "/api/v1/admin/domain_blocks/"
-    id = blockdict['id']
+    try:
-    blockdata = blockdict.copy()
+        id = blockdict['id']
-    del blockdata['id']
+        blockdata = blockdict.copy()
        del blockdata['id']
    except KeyError:
        raise ValueError(f"Cannot update a domain block without an 'id': {blockdict}")
    url = f"https://{host}{api_path}{id}"
@@ -308,12 +389,20 @@ def add_block(token: str, host: str, blockdata: dict):
        headers={'Authorization': f"Bearer {token}"},
        data=blockdata
    )
-    if response.status_code != 200:
+    if response.status_code == 422:
-        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
+        # A stricter block already exists. Probably for the base domain.
        err = json.loads(response.content)
        log.warning(err['error'])
    elif response.status_code != 200:
        raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
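
The status handling added to `add_block` can be sketched separately from the HTTP call. `check_add_block_response` is a hypothetical helper, and the error body below is invented for illustration; only the three-way split (422 logged, 200 accepted, anything else raised) comes from the change above.

```python
import json
import logging

log = logging.getLogger('add_block_sketch')


def check_add_block_response(status_code: int, content: bytes) -> bool:
    """Return True if the block was added, False if the server refused it with 422."""
    if status_code == 422:
        # A stricter block already exists, probably for the base domain
        log.warning(json.loads(content)['error'])
        return False
    if status_code != 200:
        raise ValueError(f"Something went wrong: {status_code}: {content}")
    return True


print(check_add_block_response(200, b'{}'))
print(check_add_block_response(422, b'{"error": "A stricter block already exists"}'))
```

Treating 422 as a warning lets a long push continue past redundant subdomain blocks instead of aborting on the first one.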
 def push_blocklist(token: str, host: str, blocklist: list[dict],
                    dryrun: bool=False,
-                    import_fields: list=['domain', 'severity']):
+                    import_fields: list=['domain', 'severity'],
                    max_followed_severity='silence',
                    ):
    """Push a blocklist to a remote instance.
    Merging the blocklist with the existing list the instance has,
@@ -326,47 +415,42 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
    """
    log.info(f"Pushing blocklist to host {host} ...")
    # Fetch the existing blocklist from the instance
-    # Force use of the admin API
+    # Force use of the admin API, and add 'id' to the list of fields
    if 'id' not in import_fields:
        import_fields.append('id')
    serverblocks = fetch_instance_blocklist(host, token, True, import_fields)
-    # Convert serverblocks to a dictionary keyed by domain name
+    # Convert serverblocks to a dictionary keyed by domain name
    knownblocks = {row['domain']: row for row in serverblocks}
    for newblock in blocklist:
-        log.debug(f"applying newblock: {newblock}")
+        log.debug(f"Applying newblock: {newblock}")
        oldblock = knownblocks.get(newblock['domain'], None)
        if oldblock:
            log.debug(f"Block already exists for {newblock['domain']}, checking for differences...")
-            # Check if anything is actually different and needs updating
-            change_needed = False
-            for key in import_fields:
-                try:
-                    oldval = oldblock[key]
-                    newval = newblock[key]
-                    log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
-                    if oldval != newval:
-                        log.debug("Difference detected. Change needed.")
-                        change_needed = True
-                        break
-                except KeyError:
-                    log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
-                    continue
+            change_needed = is_change_needed(oldblock, newblock, import_fields)
            if change_needed:
-                log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
+                # Change might be needed, but let's see if the severity
-                blockdata = oldblock.copy()
+                # needs to change. If not, maybe no changes are needed?
-                blockdata.update(newblock)
+                newseverity = check_followed_severity(host, token, oldblock['domain'], newblock['severity'], max_followed_severity)
-                if not dryrun:
+                if newseverity != oldblock['severity']:
-                    update_known_block(token, host, blockdata)
+                    newblock['severity'] = newseverity
-                    # add a pause here so we don't melt the instance
+                    change_needed.append('severity')
-                    time.sleep(1)
+
-                else:
+                # Change still needed?
-                    log.info("Dry run selected. Not applying changes.")
+                if change_needed:
                    log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
                    blockdata = oldblock.copy()
                    blockdata.update(newblock)
                    if not dryrun:
                        update_known_block(token, host, blockdata)
                        # add a pause here so we don't melt the instance
                        time.sleep(1)
                    else:
                        log.info("Dry run selected. Not applying changes.")
            else:
                log.debug("No differences detected. Not updating.")
@@ -385,6 +469,9 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
                'reject_reports': newblock.get('reject_reports', False),
                'obfuscate': newblock.get('obfuscate', False),
            }
            # Make sure the new block doesn't clobber a domain with followers
            blockdata['severity'] = check_followed_severity(host, token, newblock['domain'], newblock['severity'], max_followed_severity)
            log.info(f"Adding new block for {blockdata['domain']}...")
            if not dryrun:
                add_block(token, host, blockdata)
@@ -514,4 +601,4 @@ if __name__ == '__main__':
    args = augment_args(args)
    # Do the work of syncing
-    sync_blocklists(args)
+    sync_blocklists(args)
--- a/etc/sample.fediblockhole.conf.toml
+++ b/etc/sample.fediblockhole.conf.toml
@@ -17,11 +17,11 @@ blocklist_url_sources = [
 # List of instances to write blocklist to
 blocklist_instance_destinations = [
-  # { domain = 'eigenmagic.net', token = '<read_write_token>' },
+  # { domain = 'eigenmagic.net', token = '<read_write_token>', max_followed_severity = 'silence'},
 ]
 ## Store a local copy of the remote blocklists after we fetch them
-#keep_intermediate = true
+#save_intermediate = true
 ## Directory to store the local blocklist copies
 # savedir = '/tmp'