Add ability to set max severity level if an instance has followers of
accounts on a to-be-blocked domain.
Refactored the change detection code.
Fixed a bug in the config handling for saving intermediate blocklists.
Updated README documentation.
Updated sample config.

Addresses #5
Justin Warren 2023-01-09 16:51:30 +11:00
parent a134870f14
commit 55184210d4
3 changed files with 212 additions and 60 deletions

View File

@@ -13,17 +13,28 @@ A tool for keeping a Mastodon instance blocklist synchronised with remote lists.
## Installing
Instance admins who want to use this tool will need to add an Application at
`https://<instance-domain>/settings/applications/` they can authorise with an
OAuth token. For each instance you connect to, add this token to the config file.
`https://<instance-domain>/settings/applications/` so they can authorize the
tool to create and update domain blocks with an OAuth token.
### Reading remote instance blocklists
To read admin blocks from a remote instance, you'll need to ask the instance admin to add a new Application at `https://<instance-domain>/settings/applications/` and then tell you the access token.
If a remote instance makes its domain blocks public, you don't need
a token to read them.
The application needs the `admin:read:domain_blocks` OAuth scope, but unfortunately this
scope isn't available in the current application screen (v4.0.2 of Mastodon at
time of writing). There is a way to do it with scopes, but it's really
dangerous, so I'm not going to tell you what it is here.
If a remote instance only shows its domain blocks to local accounts,
you'll need to have a token with `read:blocks` authorization set up.
If you have an account on that instance, you can get a token by setting up a new
Application at `https://<instance-domain>/settings/applications/`.
To read admin blocks from a remote instance, you'll need to ask the instance
admin to add a new Application at
`https://<instance-domain>/settings/applications/` and then tell you the access
token.
The application needs the `admin:read:domain_blocks` OAuth scope, but
unfortunately this scope isn't available in the current application screen
(v4.0.2 of Mastodon at time of writing). There is a way to do it with scopes,
but it's really dangerous, so I'm not going to tell you what it is here.
A better way is to ask the instance admin to connect to the PostgreSQL database
and add the scope there, like this:
@@ -68,20 +79,74 @@ Or you can use the default location of `/etc/default/fediblockhole.conf.toml`.
As the filename suggests, FediBlockHole uses TOML syntax.
There are 2 key sections:
There are 3 key sections:
1. `blocklist_url_sources`: A list of URLs to read CSV-formatted blocklists from
1. `blocklist_instance_sources`: A list of instances to read blocklists from via API
1. `blocklist_instance_destinations`: A list of instances to write blocklists to via API
1. `blocklist_instance_sources`: A list of instances to read blocklists from
1. `blocklist_instance_destinations`: A list of instances to write blocklists to
### URL sources
Each is a list of dictionaries of the form:
The URL sources section is a list of URLs to fetch CSV-formatted blocklists from.
The required fields are `domain` and `severity`.
Optional fields that the tool understands are `public_comment`, `private_comment`, `obfuscate`, `reject_media` and `reject_reports`.
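As a sketch, assuming the config key takes plain URL strings as in the sample config, and that the CSV uses the field names above as its column headers, a fragment might look like this (the URLs are placeholders):
```
blocklist_url_sources = [
    # Placeholder URLs: any web-reachable CSV blocklist will do
    'https://example.org/exported_blocklist.csv',
    'https://blocklist.example.net/tier0.csv',
]
```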
### Instance sources
The tool can also read domain_blocks from instances directly.
The configuration is a list of dictionaries of the form:
```
{ domain = '<domain_name>', token = '<BearerToken>' }
{ domain = '<domain_name>', token = '<BearerToken>', admin = false }
```
The `domain` is the fully-qualified domain name of the API host for an instance
you want to read or write domain blocks to/from. The `BearerToken` is the OAuth
token for the application that's configured in the instance to allow you to
read/write domain blocks, as discussed above.
you want to read or write domain blocks to/from.
The `token` is an optional OAuth token for the application that's configured in
the instance to allow you to read/write domain blocks, as discussed above.
`admin` is an optional field that tells the tool to use the more detailed admin
API endpoint for domain_blocks, rather than the more public API endpoint that
doesn't provide as much detail. You will need a `token` that's been configured to
permit access to the admin domain_blocks scope, as detailed above.
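Putting the three access levels together, a sketch of a `blocklist_instance_sources` list might look like this (the domains and tokens are placeholders, not real instances):
```
blocklist_instance_sources = [
    # Public domain blocks: no token needed
    { domain = 'open.example.org' },
    # Blocks shown only to local accounts: token with read:blocks
    { domain = 'members.example.org', token = '<ReadBlocksToken>' },
    # Full admin detail: token with admin:read:domain_blocks
    { domain = 'friendly.example.org', token = '<AdminReadToken>', admin = true },
]
```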
### Instance destinations
The tool supports pushing a unified blocklist to multiple instances.
Configure the list of instances you want to push your blocklist to in the
`blocklist_instance_destinations` list. Each entry is of the form:
```
{ domain = '<domain_name>', token = '<BearerToken>', max_followed_severity = 'silence' }
```
The fields `domain` and `token` are required. `max_followed_severity` is optional.
The `domain` is the hostname of the instance you want to push to. The `token` is
an application token with both `admin:read:domain_blocks` and
`admin:write:domain_blocks` authorization.
The optional `max_followed_severity` setting sets a per-instance limit on the
severity of a domain_block if there are accounts on the instance that follow
accounts on the domain to be blocked. If `max_followed_severity` isn't set, it
defaults to 'silence'.
This setting exists to give people time to move off an instance that is about to
be defederated and bring their followers from your instance with them. Without
it, if a new Suspend block appears in any of the blocklists you subscribe to (or
a block level increases from Silence to Suspend) and you're using the default
`max` mergeplan, the tool would immediately suspend the instance, cutting
everyone on the blocked instance off from their existing followers on your
instance, even if they move to a new instance. If you actually want that
outcome, you can set `max_followed_severity = 'suspend'` and use the `max`
mergeplan.
Once the follow count drops to 0, the tool will automatically use the highest severity it finds again (if you're using the `max` mergeplan).
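As a sketch (hypothetical domains and tokens), a destinations list using the default cap alongside a stricter instance might look like this:
```
blocklist_instance_destinations = [
    # Cap blocks at 'silence' while local accounts still follow the domain (the default)
    { domain = 'my.example.org', token = '<ReadWriteToken>', max_followed_severity = 'silence' },
    # Apply suspends immediately, even if local accounts follow accounts there
    { domain = 'strict.example.org', token = '<OtherReadWriteToken>', max_followed_severity = 'suspend' },
]
```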
## Using the tool
@@ -91,14 +156,14 @@ Once you've configured the tool, run it like this:
fediblock_sync.py -c <configfile_path>
```
If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass the config file path.
If you put the config file in `/etc/default/fediblockhole.conf.toml` you don't need to pass in the config file path.
## More advanced configuration
For a list of possible configuration options, check the `--help` output and read the
sample configuration file in `etc/sample.fediblockhole.conf.toml`.
### keep_intermediate
### save_intermediate
This option tells the tool to save the unmerged blocklists it fetches from
remote instances and URLs into separate files. This is handy for debugging, or

View File

@@ -108,7 +108,8 @@ def sync_blocklists(conf: dict):
for dest in conf.blocklist_instance_destinations:
domain = dest['domain']
token = dest['token']
push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields)
max_followed_severity = dest.get('max_followed_severity', 'silence')
push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields, max_followed_severity)
def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
"""Merge fetched remote blocklists into a bulk update
@@ -125,7 +126,7 @@ def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
domain = newblock['domain']
# If the domain has two asterisks in it, it's obfuscated
# and we can't really use it, so skip it and do the next one
if '**' in domain:
if '*' in domain:
log.debug(f"Domain '{domain}' is obfuscated. Skipping it.")
continue
@@ -177,7 +178,7 @@ def apply_mergeplan(oldblock: dict, newblock: dict, mergeplan: str='max') -> dict:
blockdata['severity'] = newblock['severity']
# If obfuscate is set and is True for the domain in
# any blocklist then obfuscate is set to false.
# any blocklist then obfuscate is set to True.
if newblock.get('obfuscate', False):
blockdata['obfuscate'] = True
@@ -253,7 +254,7 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
url = urlstring.strip('<').rstrip('>')
log.debug(f"Found {len(domain_blocks)} existing domain blocks.")
# Remove fields not in import list
# Remove fields not in import list.
for row in domain_blocks:
origrow = row.copy()
for key in origrow:
@@ -274,18 +275,98 @@ def delete_block(token: str, host: str, id: int):
)
if response.status_code != 200:
if response.status_code == 404:
log.warn(f"No such domain block: {id}")
log.warning(f"No such domain block: {id}")
return
raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
def fetch_instance_follows(token: str, host: str, domain: str) -> int:
"""Fetch the followers of the target domain at the instance
@param token: the Bearer authentication token for OAuth access
@param host: the instance API hostname/IP address
@param domain: the domain to search for followers of
@returns: int, number of local followers of remote instance accounts
"""
api_path = "/api/v1/admin/measures"
url = f"https://{host}{api_path}"
key = 'instance_follows'
# This data structure only allows us to request a single domain
# at a time, which limits the load on the remote instance of each call
data = {
'keys': [
key
],
key: { 'domain': domain },
}
# The Mastodon API only accepts JSON formatted POST data for measures
response = requests.post(url,
headers={
'Authorization': f"Bearer {token}",
},
json=data,
)
if response.status_code != 200:
if response.status_code == 403:
log.error(f"Cannot fetch follow information for {domain} from {host}: {response.content}")
raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
# Get the total returned
follows = int(response.json()[0]['total'])
return follows
def check_followed_severity(host: str, token: str, domain: str,
severity: str, max_followed_severity: str='silence'):
"""Check an instance to see if it has followers of a to-be-blocked instance"""
# If the instance has accounts that follow people on the to-be-blocked domain,
# limit the maximum severity to the configured `max_followed_severity`.
follows = fetch_instance_follows(token, host, domain)
if follows > 0:
log.debug(f"Instance {host} has {follows} followers of accounts at {domain}.")
if SEVERITY[severity] > SEVERITY[max_followed_severity]:
log.warning(f"Instance {host} has {follows} followers of accounts at {domain}. Limiting block severity to {max_followed_severity}.")
return max_followed_severity
else:
return severity
def is_change_needed(oldblock: dict, newblock: dict, import_fields: list):
"""Compare block definitions to see if changes are needed"""
# Check if anything is actually different and needs updating
change_needed = []
for key in import_fields:
try:
oldval = oldblock[key]
newval = newblock[key]
log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
if oldval != newval:
log.debug("Difference detected. Change needed.")
change_needed.append(key)
break
except KeyError:
log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
continue
return change_needed
def update_known_block(token: str, host: str, blockdict: dict):
"""Update an existing domain block with information in blockdict"""
api_path = "/api/v1/admin/domain_blocks/"
id = blockdict['id']
blockdata = blockdict.copy()
del blockdata['id']
try:
id = blockdict['id']
blockdata = blockdict.copy()
del blockdata['id']
except KeyError:
import pdb
pdb.set_trace()
url = f"https://{host}{api_path}{id}"
@@ -308,12 +389,20 @@ def add_block(token: str, host: str, blockdata: dict):
headers={'Authorization': f"Bearer {token}"},
data=blockdata
)
if response.status_code != 200:
raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
if response.status_code == 422:
# A stricter block already exists. Probably for the base domain.
err = json.loads(response.content)
log.warning(err['error'])
elif response.status_code != 200:
raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
def push_blocklist(token: str, host: str, blocklist: list[dict],
dryrun: bool=False,
import_fields: list=['domain', 'severity']):
import_fields: list=['domain', 'severity'],
max_followed_severity='silence',
):
"""Push a blocklist to a remote instance.
Merging the blocklist with the existing list the instance has,
@@ -326,47 +415,42 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
"""
log.info(f"Pushing blocklist to host {host} ...")
# Fetch the existing blocklist from the instance
# Force use of the admin API
# Force use of the admin API, and add 'id' to the list of fields
if 'id' not in import_fields:
import_fields.append('id')
serverblocks = fetch_instance_blocklist(host, token, True, import_fields)
# Convert serverblocks to a dictionary keyed by domain name
# # Convert serverblocks to a dictionary keyed by domain name
knownblocks = {row['domain']: row for row in serverblocks}
for newblock in blocklist:
log.debug(f"applying newblock: {newblock}")
log.debug(f"Applying newblock: {newblock}")
oldblock = knownblocks.get(newblock['domain'], None)
if oldblock:
log.debug(f"Block already exists for {newblock['domain']}, checking for differences...")
# Check if anything is actually different and needs updating
change_needed = False
for key in import_fields:
try:
oldval = oldblock[key]
newval = newblock[key]
log.debug(f"Compare {key} '{oldval}' <> '{newval}'")
if oldval != newval:
log.debug("Difference detected. Change needed.")
change_needed = True
break
except KeyError:
log.debug(f"Key '{key}' missing from block definition so cannot compare. Continuing...")
continue
change_needed = is_change_needed(oldblock, newblock, import_fields)
if change_needed:
log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
blockdata = oldblock.copy()
blockdata.update(newblock)
if not dryrun:
update_known_block(token, host, blockdata)
# add a pause here so we don't melt the instance
time.sleep(1)
else:
log.info("Dry run selected. Not applying changes.")
# Change might be needed, but let's see if the severity
# needs to change. If not, maybe no changes are needed?
newseverity = check_followed_severity(host, token, oldblock['domain'], newblock['severity'], max_followed_severity)
if newseverity != oldblock['severity']:
newblock['severity'] = newseverity
change_needed.append('severity')
# Change still needed?
if change_needed:
log.info(f"Change detected. Updating domain block for {oldblock['domain']}")
blockdata = oldblock.copy()
blockdata.update(newblock)
if not dryrun:
update_known_block(token, host, blockdata)
# add a pause here so we don't melt the instance
time.sleep(1)
else:
log.info("Dry run selected. Not applying changes.")
else:
log.debug("No differences detected. Not updating.")
@@ -385,6 +469,9 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
'reject_reports': newblock.get('reject_reports', False),
'obfuscate': newblock.get('obfuscate', False),
}
# Make sure the new block doesn't clobber a domain with followers
blockdata['severity'] = check_followed_severity(host, token, newblock['domain'], newblock['severity'], max_followed_severity)
log.info(f"Adding new block for {blockdata['domain']}...")
if not dryrun:
add_block(token, host, blockdata)
@@ -514,4 +601,4 @@ if __name__ == '__main__':
args = augment_args(args)
# Do the work of syncing
sync_blocklists(args)
sync_blocklists(args)

View File

@@ -17,11 +17,11 @@ blocklist_url_sources = [
# List of instances to write blocklist to
blocklist_instance_destinations = [
# { domain = 'eigenmagic.net', token = '<read_write_token>' },
# { domain = 'eigenmagic.net', token = '<read_write_token>', max_followed_severity = 'silence'},
]
## Store a local copy of the remote blocklists after we fetch them
#keep_intermediate = true
#save_intermediate = true
## Directory to store the local blocklist copies
# savedir = '/tmp'