commit
525dc876e0
31
CHANGELOG.md
31
CHANGELOG.md
|
@ -6,7 +6,36 @@ This project uses [Semantic Versioning] and generally follows the conventions of
|
|||
|
||||
## [Unreleased]
|
||||
|
||||
- Planning to add allowlist thresholds as noted in #28
|
||||
## [v0.4.4] - 2023-07-09
|
||||
|
||||
### Added
|
||||
|
||||
- Added citation for creators of #Fediblock (a64875b)
|
||||
- Added parser for Mastodon 4.1 blocklist CSV format (9f95f14)
|
||||
- Added container support (76d5b61)
|
||||
|
||||
### Fixed
|
||||
|
||||
- Use __future__.annotations so type hints work with Python < 2.9 (8265639)
|
||||
- test util no longer tries to load default config file if conf tomldata is empty. (2da57b2)
|
||||
|
||||
## [v0.4.3] - 2023-02-13
|
||||
|
||||
### Added
|
||||
|
||||
- Added Mastodon public API parser type because #33 (9fe9342)
|
||||
- Added ability to set scheme when talking to instances (9fe9342)
|
||||
- Added tests of comment merging. (fb3a7ec)
|
||||
- Added blocklist thresholds. (bb1d89e)
|
||||
- Added logging to help debug threshold-based merging. (b67ff0c)
|
||||
- Added extra documentation on configuring thresholds. (6c72af8)
|
||||
- Updated documentation to reflect Mastodon v4.1.0 changes to the application scopes screen. (b92dd21)
|
||||
|
||||
### Changed
|
||||
|
||||
- Dropped minimum Python version to 3.6 (df3c16f)
|
||||
- Don't merge comments if new comment is empty. (b8aa11e)
|
||||
- Tweaked comment merging to pass tests. (fb3a7ec)
|
||||
|
||||
## [v0.4.2] - 2023-01-19
|
||||
|
||||
|
|
49
README.md
49
README.md
|
@ -6,15 +6,19 @@ The broad design goal for FediBlockHole is to support pulling in a list of
|
|||
blocklists from a set of trusted sources, merge them into a combined blocklist,
|
||||
and then push that merged list to a set of managed instances.
|
||||
|
||||
Inspired by the way PiHole works for maintaining a set of blocklists of adtech
|
||||
domains.
|
||||
|
||||
Mastodon admins can choose who they think maintain quality lists and subscribe
|
||||
to them, helping to distribute the load for maintaining blocklists among a
|
||||
community of people. Control ultimately rests with the admins themselves so they
|
||||
can outsource as much, or as little, of the effort to others as they deem
|
||||
appropriate.
|
||||
|
||||
Inspired by the way PiHole works for maintaining a set of blocklists of adtech
|
||||
domains. Builds on the work of
|
||||
[@CaribenxMarciaX@scholar.social](https://scholar.social/@CaribenxMarciaX) and
|
||||
[@gingerrroot@kitty.town](https://kitty.town/@gingerrroot) who started the
|
||||
#Fediblock hashtag and did a lot of advocacy around it, often at great personal
|
||||
cost.
|
||||
|
||||
## Features
|
||||
|
||||
### Blocklist Sources
|
||||
|
@ -41,6 +45,8 @@ appropriate.
|
|||
|
||||
- Provides (hopefully) sensible defaults to minimise first-time setup.
|
||||
- Global and fine-grained configuration options available for those complex situations that crop up sometimes.
|
||||
- Allowlists to override blocks in blocklists to ensure you never block instances you want to keep.
|
||||
- Blocklist thresholds if you want to only block when an instance shows up in multiple blocklists.
|
||||
|
||||
## Installing
|
||||
|
||||
|
@ -79,17 +85,16 @@ admin to add a new Application at
|
|||
`https://<instance-domain>/settings/applications/` and then tell you the access
|
||||
token.
|
||||
|
||||
The application needs the `admin:read:domain_blocks` OAuth scope, but
|
||||
unfortunately this scope isn't available in the current application screen
|
||||
(v4.0.2 of Mastodon at time of writing, but this has been fixed in the main
|
||||
branch).
|
||||
The application needs the `admin:read:domain_blocks` OAuth scope. You can allow
|
||||
full `admin:read` access, but be aware that this authorizes someone to read all
|
||||
the data in the instance. That's asking a lot of a remote instance admin who
|
||||
just wants to share domain_blocks with you.
|
||||
|
||||
You can allow full `admin:read` access, but be aware that this authorizes
|
||||
someone to read all the data in the instance. That's asking a lot of a remote
|
||||
instance admin who just wants to share domain_blocks with you.
|
||||
The `admin:read:domain_blocks` scope is available as of Mastodon v4.1.0, but for
|
||||
earlier versions admins will need to use the manual method described below.
|
||||
|
||||
For now, you can ask the instance admin to update the scope in the database
|
||||
directly like this:
|
||||
You can update the scope for your application in the database directly like
|
||||
this:
|
||||
|
||||
```
|
||||
UPDATE oauth_applications as app
|
||||
|
@ -134,8 +139,12 @@ chmod o-r <configfile>
|
|||
```
|
||||
|
||||
You can also grant full `admin:write` scope to the application, but if you'd
|
||||
prefer to keep things more tightly secured you'll need to use SQL to set the
|
||||
scopes in the database and then regenerate the token:
|
||||
prefer to keep things more tightly secured, limit the scope to
|
||||
`admin:read:domain_blocks`.
|
||||
|
||||
Again, this scope is only available in the application config screen as of
|
||||
Mastodon v4.1.0. If your instance is on an earlier version, you'll need to use
|
||||
SQL to set the scopes in the database and then regenerate the token:
|
||||
|
||||
```
|
||||
UPDATE oauth_applications as app
|
||||
|
@ -192,6 +201,7 @@ Supported formats are currently:
|
|||
|
||||
- Comma-Separated Values (CSV)
|
||||
- JSON
|
||||
- Mastodon v4.1 flavoured CSV
|
||||
- RapidBlock CSV
|
||||
- RapidBlock JSON
|
||||
|
||||
|
@ -209,6 +219,17 @@ A CSV format blocklist must contain a header row with at least a `domain` and `s
|
|||
|
||||
Optional fields, as listed about, may also be included.
|
||||
|
||||
#### Mastodon v4.1 CSV format
|
||||
|
||||
As of v4.1.0, Mastodon can export domain blocks as a CSV file. However, in their
|
||||
infinite wisdom, the Mastodon devs decided that field names should begin with a
|
||||
`#` character in the header, unlike the field names in the JSON output via the
|
||||
API… or in pretty much any other CSV file anywhere else.
|
||||
|
||||
Setting the format to `mastodon_csv` will strip off the `#` character when
|
||||
parsing and FediBlockHole can then use Mastodon v4.1 CSV blocklists like any
|
||||
other CSV formatted blocklist.
|
||||
|
||||
#### JSON format
|
||||
|
||||
JSON is also supported. It uses the same format as the JSON returned from the Mastodon API.
|
||||
|
|
|
@ -0,0 +1,34 @@
|
|||
# A helm chart's templates and default values can be packaged into a .tgz file.
|
||||
# When doing that, not everything should be bundled into the .tgz file. This
|
||||
# file describes what to not bundle.
|
||||
#
|
||||
# Manually added by us
|
||||
# --------------------
|
||||
#
|
||||
|
||||
# Boilerplate .helmignore from `helm create mastodon`
|
||||
# ---------------------------------------------------
|
||||
#
|
||||
# Patterns to ignore when building packages.
|
||||
# This supports shell glob matching, relative path matching, and
|
||||
# negation (prefixed with !). Only one pattern per line.
|
||||
.DS_Store
|
||||
# Common VCS dirs
|
||||
.git/
|
||||
.gitignore
|
||||
.bzr/
|
||||
.bzrignore
|
||||
.hg/
|
||||
.hgignore
|
||||
.svn/
|
||||
# Common backup files
|
||||
*.swp
|
||||
*.bak
|
||||
*.tmp
|
||||
*.orig
|
||||
*~
|
||||
# Various IDEs
|
||||
.project
|
||||
.idea/
|
||||
*.tmproj
|
||||
.vscode/
|
|
@ -0,0 +1,23 @@
|
|||
apiVersion: v2
|
||||
name: fediblockhole
|
||||
description: FediBlockHole is a tool for keeping a Mastodon instance blocklist synchronised with remote lists.
|
||||
|
||||
# A chart can be either an 'application' or a 'library' chart.
|
||||
#
|
||||
# Application charts are a collection of templates that can be packaged into versioned archives
|
||||
# to be deployed.
|
||||
#
|
||||
# Library charts provide useful utilities or functions for the chart developer. They're included as
|
||||
# a dependency of application charts to inject those utilities and functions into the rendering
|
||||
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
|
||||
type: application
|
||||
|
||||
# This is the chart version. This version number should be incremented each time you make changes
|
||||
# to the chart and its templates, including the app version.
|
||||
# Versions are expected to follow Semantic Versioning (https://semver.org/)
|
||||
version: 1.1.0
|
||||
|
||||
# This is the version number of the application being deployed. This version number should be
|
||||
# incremented each time you make changes to the application. Versions are not expected to
|
||||
# follow Semantic Versioning. They should reflect the version the application is using.
|
||||
appVersion: 0.4.2
|
|
@ -0,0 +1,67 @@
|
|||
# List of instances to read blocklists from.
|
||||
# If the instance makes its blocklist public, no authorization token is needed.
|
||||
# Otherwise, `token` is a Bearer token authorised to read domain_blocks.
|
||||
# If `admin` = True, use the more detailed admin API, which requires a token with a
|
||||
# higher level of authorization.
|
||||
# If `import_fields` are provided, only import these fields from the instance.
|
||||
# Overrides the global `import_fields` setting.
|
||||
blocklist_instance_sources = [
|
||||
# { domain = 'public.blocklist'}, # an instance with a public list of domain_blocks
|
||||
# { domain = 'jorts.horse', token = '<a_different_token>' }, # user accessible block list
|
||||
# { domain = 'eigenmagic.net', token = '<a_token_with_read_auth>', admin = true }, # admin access required
|
||||
]
|
||||
|
||||
# List of URLs to read csv blocklists from
|
||||
# Format tells the parser which format to use when parsing the blocklist
|
||||
# max_severity tells the parser to override any severities that are higher than this value
|
||||
# import_fields tells the parser to only import that set of fields from a specific source
|
||||
blocklist_url_sources = [
|
||||
# { url = 'file:///path/to/fediblockhole/samples/demo-blocklist-01.csv', format = 'csv' },
|
||||
{ url = 'https://raw.githubusercontent.com/eigenmagic/fediblockhole/main/samples/demo-blocklist-01.csv', format = 'csv' },
|
||||
|
||||
]
|
||||
|
||||
## These global allowlists override blocks from blocklists
|
||||
# These are the same format and structure as blocklists, but they take precedence
|
||||
allowlist_url_sources = [
|
||||
{ url = 'https://raw.githubusercontent.com/eigenmagic/fediblockhole/main/samples/demo-allowlist-01.csv', format = 'csv' },
|
||||
{ url = 'https://raw.githubusercontent.com/eigenmagic/fediblockhole/main/samples/demo-allowlist-02.csv', format = 'csv' },
|
||||
]
|
||||
|
||||
# List of instances to write blocklist to
|
||||
blocklist_instance_destinations = [
|
||||
# { domain = 'eigenmagic.net', token = '<read_write_token>', max_followed_severity = 'silence'},
|
||||
]
|
||||
|
||||
## Store a local copy of the remote blocklists after we fetch them
|
||||
#save_intermediate = true
|
||||
|
||||
## Directory to store the local blocklist copies
|
||||
# savedir = '/tmp'
|
||||
|
||||
## File to save the fully merged blocklist into
|
||||
# blocklist_savefile = '/tmp/merged_blocklist.csv'
|
||||
|
||||
## Don't push blocklist to instances, even if they're defined above
|
||||
# no_push_instance = false
|
||||
|
||||
## Don't fetch blocklists from URLs, even if they're defined above
|
||||
# no_fetch_url = false
|
||||
|
||||
## Don't fetch blocklists from instances, even if they're defined above
|
||||
# no_fetch_instance = false
|
||||
|
||||
## Set the mergeplan to use when dealing with overlaps between blocklists
|
||||
# The default 'max' mergeplan will use the harshest severity block found for a domain.
|
||||
# The 'min' mergeplan will use the lightest severity block found for a domain.
|
||||
# mergeplan = 'max'
|
||||
|
||||
## Set which fields we import
|
||||
## 'domain' and 'severity' are always imported, these are additional
|
||||
##
|
||||
import_fields = ['public_comment', 'reject_media', 'reject_reports', 'obfuscate']
|
||||
|
||||
## Set which fields we export
|
||||
## 'domain' and 'severity' are always exported, these are additional
|
||||
##
|
||||
export_fields = ['public_comment']
|
|
@ -0,0 +1,70 @@
|
|||
{{/* vim: set filetype=mustache: */}}
|
||||
{{/*
|
||||
Expand the name of the chart.
|
||||
*/}}
|
||||
{{- define "fediblockhole.name" -}}
|
||||
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create a default fully qualified app name.
|
||||
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
|
||||
If release name contains chart name it will be used as a full name.
|
||||
*/}}
|
||||
{{- define "fediblockhole.fullname" -}}
|
||||
{{- if .Values.fullnameOverride }}
|
||||
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- $name := default .Chart.Name .Values.nameOverride }}
|
||||
{{- if contains $name .Release.Name }}
|
||||
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create chart name and version as used by the chart label.
|
||||
*/}}
|
||||
{{- define "fediblockhole.chart" -}}
|
||||
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Common labels
|
||||
*/}}
|
||||
{{- define "fediblockhole.labels" -}}
|
||||
helm.sh/chart: {{ include "fediblockhole.chart" . }}
|
||||
{{ include "fediblockhole.selectorLabels" . }}
|
||||
{{- if .Chart.AppVersion }}
|
||||
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
|
||||
{{- end }}
|
||||
app.kubernetes.io/managed-by: {{ .Release.Service }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Selector labels
|
||||
*/}}
|
||||
{{- define "fediblockhole.selectorLabels" -}}
|
||||
app.kubernetes.io/name: {{ include "fediblockhole.name" . }}
|
||||
app.kubernetes.io/instance: {{ .Release.Name }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Rolling pod annotations
|
||||
*/}}
|
||||
{{- define "fediblockhole.rollingPodAnnotations" -}}
|
||||
rollme: {{ .Release.Revision | quote }}
|
||||
checksum/config-configmap: {{ include ( print $.Template.BasePath "/configmap-conf-toml.yaml" ) . | sha256sum | quote }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create the default conf file path and filename
|
||||
*/}}
|
||||
{{- define "fediblockhole.conf_file_path" -}}
|
||||
{{- default "/etc/default/" .Values.fediblockhole.conf_file.path }}
|
||||
{{- end }}
|
||||
{{- define "fediblockhole.conf_file_filename" -}}
|
||||
{{- default "fediblockhole.conf.toml" .Values.fediblockhole.conf_file.filename }}
|
||||
{{- end }}
|
|
@ -0,0 +1,8 @@
|
|||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: {{ include "fediblockhole.fullname" . }}-conf-toml
|
||||
labels:
|
||||
{{- include "fediblockhole.labels" . | nindent 4 }}
|
||||
data:
|
||||
{{ (.Files.Glob "fediblockhole.conf.toml").AsConfig | nindent 4 }}
|
|
@ -0,0 +1,68 @@
|
|||
{{ if .Values.fediblockhole.cron.sync.enabled -}}
|
||||
apiVersion: batch/v1
|
||||
kind: CronJob
|
||||
metadata:
|
||||
name: {{ include "fediblockhole.fullname" . }}-sync
|
||||
labels:
|
||||
{{- include "fediblockhole.labels" . | nindent 4 }}
|
||||
spec:
|
||||
schedule: {{ .Values.fediblockhole.cron.sync.schedule }}
|
||||
failedJobsHistoryLimit: {{ .Values.fediblockhole.cron.sync.failedJobsHistoryLimit }}
|
||||
successfulJobsHistoryLimit: {{ .Values.fediblockhole.cron.sync.successfulJobsHistoryLimit }}
|
||||
jobTemplate:
|
||||
spec:
|
||||
template:
|
||||
metadata:
|
||||
name: {{ include "fediblockhole.fullname" . }}-sync
|
||||
{{- with .Values.jobAnnotations }}
|
||||
annotations:
|
||||
{{- toYaml . | nindent 12 }}
|
||||
{{- end }}
|
||||
spec:
|
||||
restartPolicy: OnFailure
|
||||
containers:
|
||||
- name: {{ include "fediblockhole.fullname" . }}-sync
|
||||
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
|
||||
imagePullPolicy: {{ .Values.image.pullPolicy }}
|
||||
command:
|
||||
- fediblock-sync
|
||||
- -c
|
||||
- "{{- include "fediblockhole.conf_file_path" . -}}{{- include "fediblockhole.conf_file_filename" . -}}"
|
||||
volumeMounts:
|
||||
- name: config
|
||||
mountPath: "{{- include "fediblockhole.conf_file_path" . -}}{{- include "fediblockhole.conf_file_filename" . -}}"
|
||||
subPath: "{{- include "fediblockhole.conf_file_filename" . -}}"
|
||||
{{ if .Values.fediblockhole.allow_file.filename }}
|
||||
- name: allowfile
|
||||
mountPath: "{{- include "fediblockhole.conf_file_path" . -}}{{- .Values.fediblockhole.allow_file.filename -}}"
|
||||
subPath: "{{- .Values.fediblockhole.allow_file.filename -}}"
|
||||
{{ end }}
|
||||
{{ if .Values.fediblockhole.block_file.filename }}
|
||||
- name: blockfile
|
||||
mountPath: "{{- include "fediblockhole.conf_file_path" . -}}{{- .Values.fediblockhole.block_file.filename -}}"
|
||||
subPath: "{{- .Values.fediblockhole.block_file.filename -}}"
|
||||
{{ end }}
|
||||
volumes:
|
||||
- name: config
|
||||
configMap:
|
||||
name: {{ include "fediblockhole.fullname" . }}-conf-toml
|
||||
items:
|
||||
- key: {{ include "fediblockhole.conf_file_filename" . | quote }}
|
||||
path: {{ include "fediblockhole.conf_file_filename" . | quote }}
|
||||
{{ if .Values.fediblockhole.allow_file.filename }}
|
||||
- name: allowfile
|
||||
configMap:
|
||||
name: {{ include "fediblockhole.fullname" . }}-allow-csv
|
||||
items:
|
||||
- key: {{ .Values.fediblockhole.allow_file.filename | quote }}
|
||||
path: {{ .Values.fediblockhole.allow_file.filename | quote }}
|
||||
{{ end }}
|
||||
{{ if .Values.fediblockhole.block_file.filename }}
|
||||
- name: blockfile
|
||||
configMap:
|
||||
name: {{ include "fediblockhole.fullname" . }}-block-csv
|
||||
items:
|
||||
- key: {{ .Values.fediblockhole.block_file.filename | quote }}
|
||||
path: {{ .Values.fediblockhole.block_file.filename | quote }}
|
||||
{{ end }}
|
||||
{{- end }}
|
|
@ -0,0 +1,77 @@
|
|||
image:
|
||||
repository: ghcr.io/cunningpike/fediblockhole
|
||||
# https://github.com/cunningpike/fediblockhole/pkgs/container/fediblockhole/versions
|
||||
#
|
||||
# alternatively, use `latest` for the latest release or `edge` for the image
|
||||
# built from the most recent commit
|
||||
#
|
||||
# tag: latest
|
||||
tag: ""
|
||||
# use `Always` when using `latest` tag
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
fediblockhole:
|
||||
# location of the configuration file. Default is /etc/default/fediblockhole.conf.toml
|
||||
conf_file:
|
||||
path: ""
|
||||
filename: ""
|
||||
# Location of a local allowlist file. It is recommended that this file should at a
|
||||
# minimum contain the web_domain of your own instance.
|
||||
allow_file:
|
||||
# Optionally, set the name of the file. This should match the data key in the
|
||||
# associated ConfigMap
|
||||
filename: ""
|
||||
# Location of a local blocklist file.
|
||||
block_file:
|
||||
# Optionally, set the name of the file. This should match the data key in the
|
||||
# associated ConfigMap
|
||||
filename: ""
|
||||
cron:
|
||||
# -- run `fediblock-sync` every hour
|
||||
sync:
|
||||
# @ignored
|
||||
enabled: false
|
||||
# @ignored
|
||||
schedule: "0 * * * *"
|
||||
failedJobsHistoryLimit: 1
|
||||
successfulJobsHistoryLimit: 3
|
||||
|
||||
# if you manually change the UID/GID environment variables, ensure these values
|
||||
# match:
|
||||
podSecurityContext:
|
||||
runAsUser: 991
|
||||
runAsGroup: 991
|
||||
fsGroup: 991
|
||||
|
||||
# @ignored
|
||||
securityContext: {}
|
||||
|
||||
# -- Kubernetes manages pods for jobs and pods for deployments differently, so you might
|
||||
# need to apply different annotations to the two different sets of pods. The annotations
|
||||
# set with podAnnotations will be added to all deployment-managed pods.
|
||||
podAnnotations: {}
|
||||
|
||||
# -- The annotations set with jobAnnotations will be added to all job pods.
|
||||
jobAnnotations: {}
|
||||
|
||||
# -- Default resources for all Deployments and jobs unless overwritten
|
||||
resources: {}
|
||||
# We usually recommend not to specify default resources and to leave this as a conscious
|
||||
# choice for the user. This also increases chances charts run on environments with little
|
||||
# resources, such as Minikube. If you do want to specify resources, uncomment the following
|
||||
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
|
||||
# limits:
|
||||
# cpu: 100m
|
||||
# memory: 128Mi
|
||||
# requests:
|
||||
# cpu: 100m
|
||||
# memory: 128Mi
|
||||
|
||||
# @ignored
|
||||
nodeSelector: {}
|
||||
|
||||
# @ignored
|
||||
tolerations: []
|
||||
|
||||
# -- Affinity for all pods unless overwritten
|
||||
affinity: {}
|
|
@ -0,0 +1,6 @@
|
|||
Dockerfile
|
||||
#README.md
|
||||
*.pyc
|
||||
*.pyo
|
||||
*.pyd
|
||||
__pycache__
|
|
@ -0,0 +1,14 @@
|
|||
# Use the official lightweight Python image.
|
||||
# https://hub.docker.com/_/python
|
||||
FROM python:slim
|
||||
|
||||
# Copy local code to the container image.
|
||||
ENV APP_HOME /app
|
||||
WORKDIR $APP_HOME
|
||||
|
||||
# Install production dependencies.
|
||||
RUN pip install fediblockhole
|
||||
|
||||
USER 1001
|
||||
# Set the command on start to fediblock-sync.
|
||||
ENTRYPOINT ["fediblock-sync"]
|
|
@ -56,6 +56,24 @@ blocklist_instance_destinations = [
|
|||
# The 'min' mergeplan will use the lightest severity block found for a domain.
|
||||
# mergeplan = 'max'
|
||||
|
||||
## Optional threshold-based merging.
|
||||
# Only merge in domain blocks if the domain is mentioned in
|
||||
# at least `threshold` blocklists.
|
||||
# `merge_thresold` is an integer, with a default value of 0.
|
||||
# The `merge_threshold_type` can be `count` or `pct`.
|
||||
# If `count` type is selected, the threshold is reached when the domain
|
||||
# is mentioned in at least `merge_threshold` blocklists. The default value
|
||||
# of 0 means that every block in every list will be merged in.
|
||||
# If `pct` type is selected, `merge_threshold` is interpreted as a percentage,
|
||||
# i.e. if `merge_threshold` = 20, blocks will only be merged in if the domain
|
||||
# is present in at least 20% of blocklists.
|
||||
# Percentage calculated as number_of_mentions / total_number_of_blocklists.
|
||||
# The percentage method is more flexibile, but also more complicated, so take care
|
||||
# when using it.
|
||||
#
|
||||
# merge_threshold_type = 'count'
|
||||
# merge_threshold = 0
|
||||
|
||||
## Set which fields we import
|
||||
## 'domain' and 'severity' are always imported, these are additional
|
||||
##
|
||||
|
|
|
@ -1,10 +1,10 @@
|
|||
[project]
|
||||
name = "fediblockhole"
|
||||
version = "0.4.2"
|
||||
version = "0.4.4"
|
||||
description = "Federated blocklist management for Mastodon"
|
||||
readme = "README.md"
|
||||
license = {file = "LICENSE"}
|
||||
requires-python = ">=3.10"
|
||||
requires-python = ">=3.6"
|
||||
keywords = ["mastodon", "fediblock"]
|
||||
authors = [
|
||||
{name = "Justin Warren"}, {email = "justin@eigenmagic.com"}
|
||||
|
@ -17,6 +17,10 @@ classifiers = [
|
|||
"Natural Language :: English",
|
||||
"Programming Language :: Python :: 3",
|
||||
"Programming Language :: Python :: 3.10",
|
||||
"Programming Language :: Python :: 3.9",
|
||||
"Programming Language :: Python :: 3.8",
|
||||
"Programming Language :: Python :: 3.7",
|
||||
"Programming Language :: Python :: 3.6",
|
||||
]
|
||||
dependencies = [
|
||||
"requests",
|
||||
|
|
|
@ -1,3 +1,4 @@
|
|||
"domain","severity","private_comment","public_comment","reject_media","reject_reports","obfuscate"
|
||||
"eigenmagic.net","noop","Never block me","Only the domain field matters",False,False,False
|
||||
"example.org","noop","Never block me either","The severity is ignored as are all other fields",False,False,False
|
||||
"eigenmagic.net","noop","Never block me","Only the domain field matters for allowlists",False,False,False
|
||||
"example.org","noop","Never block me either","The severity is ignored in allowlists as are all other fields",False,False,False
|
||||
"demo01.example.org","noop","Never block me either","But you can use them to leave yourself or others notes on why the item is here",False,False,False
|
||||
|
|
|
|
@ -1,6 +1,6 @@
|
|||
"""A tool for managing federated Mastodon blocklists
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
import argparse
|
||||
import toml
|
||||
import csv
|
||||
|
@ -11,7 +11,7 @@ import os.path
|
|||
import sys
|
||||
import urllib.request as urlr
|
||||
|
||||
from .blocklist_parser import parse_blocklist
|
||||
from .blocklists import Blocklist, parse_blocklist
|
||||
from .const import DomainBlock, BlockSeverity
|
||||
|
||||
from importlib.metadata import version
|
||||
|
@ -59,19 +59,19 @@ def sync_blocklists(conf: argparse.Namespace):
|
|||
# Add extra export fields if defined in config
|
||||
export_fields.extend(conf.export_fields)
|
||||
|
||||
blocklists = {}
|
||||
blocklists = []
|
||||
# Fetch blocklists from URLs
|
||||
if not conf.no_fetch_url:
|
||||
blocklists = fetch_from_urls(blocklists, conf.blocklist_url_sources,
|
||||
import_fields, conf.save_intermediate, conf.savedir, export_fields)
|
||||
blocklists.extend(fetch_from_urls(conf.blocklist_url_sources,
|
||||
import_fields, conf.save_intermediate, conf.savedir, export_fields))
|
||||
|
||||
# Fetch blocklists from remote instances
|
||||
if not conf.no_fetch_instance:
|
||||
blocklists = fetch_from_instances(blocklists, conf.blocklist_instance_sources,
|
||||
import_fields, conf.save_intermediate, conf.savedir, export_fields)
|
||||
blocklists.extend(fetch_from_instances(conf.blocklist_instance_sources,
|
||||
import_fields, conf.save_intermediate, conf.savedir, export_fields))
|
||||
|
||||
# Merge blocklists into an update dict
|
||||
merged = merge_blocklists(blocklists, conf.mergeplan)
|
||||
merged = merge_blocklists(blocklists, conf.mergeplan, conf.merge_threshold, conf.merge_threshold_type)
|
||||
|
||||
# Remove items listed in allowlists, if any
|
||||
allowlists = fetch_allowlists(conf)
|
||||
|
@ -80,48 +80,48 @@ def sync_blocklists(conf: argparse.Namespace):
|
|||
# Save the final mergelist, if requested
|
||||
if conf.blocklist_savefile:
|
||||
log.info(f"Saving merged blocklist to {conf.blocklist_savefile}")
|
||||
save_blocklist_to_file(merged.values(), conf.blocklist_savefile, export_fields)
|
||||
save_blocklist_to_file(merged, conf.blocklist_savefile, export_fields)
|
||||
|
||||
# Push the blocklist to destination instances
|
||||
if not conf.no_push_instance:
|
||||
log.info("Pushing domain blocks to instances...")
|
||||
for dest in conf.blocklist_instance_destinations:
|
||||
domain = dest['domain']
|
||||
target = dest['domain']
|
||||
token = dest['token']
|
||||
scheme = dest.get('scheme', 'https')
|
||||
max_followed_severity = BlockSeverity(dest.get('max_followed_severity', 'silence'))
|
||||
push_blocklist(token, domain, merged.values(), conf.dryrun, import_fields, max_followed_severity, scheme)
|
||||
push_blocklist(token, target, merged, conf.dryrun, import_fields, max_followed_severity, scheme)
|
||||
|
||||
def apply_allowlists(merged: dict, conf: argparse.Namespace, allowlists: dict):
|
||||
def apply_allowlists(merged: Blocklist, conf: argparse.Namespace, allowlists: dict):
|
||||
"""Apply allowlists
|
||||
"""
|
||||
# Apply allows specified on the commandline
|
||||
for domain in conf.allow_domains:
|
||||
log.info(f"'{domain}' allowed by commandline, removing any blocks...")
|
||||
if domain in merged:
|
||||
del merged[domain]
|
||||
if domain in merged.blocks:
|
||||
del merged.blocks[domain]
|
||||
|
||||
# Apply allows from URLs lists
|
||||
log.info("Removing domains from URL allowlists...")
|
||||
for key, alist in allowlists.items():
|
||||
log.debug(f"Processing allows from '{key}'...")
|
||||
for allowed in alist:
|
||||
for alist in allowlists:
|
||||
log.debug(f"Processing allows from '{alist.origin}'...")
|
||||
for allowed in alist.blocks.values():
|
||||
domain = allowed.domain
|
||||
log.debug(f"Removing allowlisted domain '{domain}' from merged list.")
|
||||
if domain in merged:
|
||||
del merged[domain]
|
||||
if domain in merged.blocks:
|
||||
del merged.blocks[domain]
|
||||
|
||||
return merged
|
||||
|
||||
def fetch_allowlists(conf: argparse.Namespace) -> dict:
|
||||
def fetch_allowlists(conf: argparse.Namespace) -> Blocklist:
|
||||
"""
|
||||
"""
|
||||
if conf.allowlist_url_sources:
|
||||
allowlists = fetch_from_urls({}, conf.allowlist_url_sources, ALLOWLIST_IMPORT_FIELDS)
|
||||
allowlists = fetch_from_urls(conf.allowlist_url_sources, ALLOWLIST_IMPORT_FIELDS, conf.save_intermediate, conf.savedir)
|
||||
return allowlists
|
||||
return {}
|
||||
return Blocklist()
|
||||
|
||||
def fetch_from_urls(blocklists: dict, url_sources: dict,
|
||||
def fetch_from_urls(url_sources: dict,
|
||||
import_fields: list=IMPORT_FIELDS,
|
||||
save_intermediate: bool=False,
|
||||
savedir: str=None, export_fields: list=EXPORT_FIELDS) -> dict:
|
||||
|
@ -131,7 +131,7 @@ def fetch_from_urls(blocklists: dict, url_sources: dict,
|
|||
@returns: A dict of blocklists, same as input, but (possibly) modified
|
||||
"""
|
||||
log.info("Fetching domain blocks from URLs...")
|
||||
|
||||
blocklists = []
|
||||
for item in url_sources:
|
||||
url = item['url']
|
||||
# If import fields are provided, they override the global ones passed in
|
||||
|
@ -144,14 +144,14 @@ def fetch_from_urls(blocklists: dict, url_sources: dict,
|
|||
listformat = item.get('format', 'csv')
|
||||
with urlr.urlopen(url) as fp:
|
||||
rawdata = fp.read(URL_BLOCKLIST_MAXSIZE).decode('utf-8')
|
||||
blocklists[url] = parse_blocklist(rawdata, listformat, import_fields, max_severity)
|
||||
|
||||
if save_intermediate:
|
||||
save_intermediate_blocklist(blocklists[url], url, savedir, export_fields)
|
||||
bl = parse_blocklist(rawdata, url, listformat, import_fields, max_severity)
|
||||
blocklists.append(bl)
|
||||
if save_intermediate:
|
||||
save_intermediate_blocklist(bl, savedir, export_fields)
|
||||
|
||||
return blocklists
|
||||
|
||||
def fetch_from_instances(blocklists: dict, sources: dict,
|
||||
def fetch_from_instances(sources: dict,
|
||||
import_fields: list=IMPORT_FIELDS,
|
||||
save_intermediate: bool=False,
|
||||
savedir: str=None, export_fields: list=EXPORT_FIELDS) -> dict:
|
||||
|
@ -161,12 +161,13 @@ def fetch_from_instances(blocklists: dict, sources: dict,
|
|||
@returns: A dict of blocklists, same as input, but (possibly) modified
|
||||
"""
|
||||
log.info("Fetching domain blocks from instances...")
|
||||
blocklists = []
|
||||
for item in sources:
|
||||
domain = item['domain']
|
||||
admin = item.get('admin', False)
|
||||
token = item.get('token', None)
|
||||
scheme = item.get('scheme', 'https')
|
||||
itemsrc = f"{scheme}://{domain}/api"
|
||||
# itemsrc = f"{scheme}://{domain}/api"
|
||||
|
||||
# If import fields are provided, they override the global ones passed in
|
||||
source_import_fields = item.get('import_fields', None)
|
||||
|
@ -174,45 +175,69 @@ def fetch_from_instances(blocklists: dict, sources: dict,
|
|||
# Ensure we always use the default fields
|
||||
import_fields = IMPORT_FIELDS.extend(source_import_fields)
|
||||
|
||||
# Add the blocklist with the domain as the source key
|
||||
blocklists[itemsrc] = fetch_instance_blocklist(domain, token, admin, import_fields, scheme)
|
||||
bl = fetch_instance_blocklist(domain, token, admin, import_fields, scheme)
|
||||
blocklists.append(bl)
|
||||
if save_intermediate:
|
||||
save_intermediate_blocklist(blocklists[itemsrc], domain, savedir, export_fields)
|
||||
save_intermediate_blocklist(bl, savedir, export_fields)
|
||||
return blocklists
|
||||
|
||||
def merge_blocklists(blocklists: dict, mergeplan: str='max') -> dict:
|
||||
def merge_blocklists(blocklists: list[Blocklist], mergeplan: str='max',
|
||||
threshold: int=0,
|
||||
threshold_type: str='count') -> Blocklist:
|
||||
"""Merge fetched remote blocklists into a bulk update
|
||||
@param blocklists: A dict of lists of DomainBlocks, keyed by source.
|
||||
Each value is a list of DomainBlocks
|
||||
@param mergeplan: An optional method of merging overlapping block definitions
|
||||
'max' (the default) uses the highest severity block found
|
||||
'min' uses the lowest severity block found
|
||||
@param threshold: An integer used in the threshold mechanism.
|
||||
If a domain is not present in this number/pct or more of the blocklists,
|
||||
it will not get merged into the final list.
|
||||
@param threshold_type: choice of ['count', 'pct']
|
||||
If `count`, threshold is met if block is present in `threshold`
|
||||
or more blocklists.
|
||||
If `pct`, theshold is met if block is present in
|
||||
count_of_mentions / number_of_blocklists.
|
||||
@param returns: A dict of DomainBlocks keyed by domain
|
||||
"""
|
||||
merged = {}
|
||||
merged = Blocklist('fediblockhole.merge_blocklists')
|
||||
|
||||
for key, blist in blocklists.items():
|
||||
log.debug(f"processing blocklist from: {key} ...")
|
||||
for newblock in blist:
|
||||
domain = newblock.domain
|
||||
# If the domain has two asterisks in it, it's obfuscated
|
||||
# and we can't really use it, so skip it and do the next one
|
||||
if '*' in domain:
|
||||
log.debug(f"Domain '{domain}' is obfuscated. Skipping it.")
|
||||
num_blocklists = len(blocklists)
|
||||
|
||||
# Create a domain keyed list of blocks for each domain
|
||||
domain_blocks = {}
|
||||
|
||||
for bl in blocklists:
|
||||
for block in bl.values():
|
||||
if '*' in block.domain:
|
||||
log.debug(f"Domain '{block.domain}' is obfuscated. Skipping it.")
|
||||
continue
|
||||
|
||||
elif domain in merged:
|
||||
log.debug(f"Overlapping block for domain {domain}. Merging...")
|
||||
blockdata = apply_mergeplan(merged[domain], newblock, mergeplan)
|
||||
|
||||
elif block.domain in domain_blocks:
|
||||
domain_blocks[block.domain].append(block)
|
||||
else:
|
||||
# New block
|
||||
blockdata = newblock
|
||||
domain_blocks[block.domain] = [block,]
|
||||
|
||||
# Only merge items if `threshold` is met or exceeded
|
||||
for domain in domain_blocks:
|
||||
if threshold_type == 'count':
|
||||
domain_threshold_level = len(domain_blocks[domain])
|
||||
elif threshold_type == 'pct':
|
||||
domain_threshold_level = len(domain_blocks[domain]) / num_blocklists * 100
|
||||
# log.debug(f"domain threshold level: {domain_threshold_level}")
|
||||
else:
|
||||
raise ValueError(f"Unsupported threshold type '{threshold_type}'. Supported values are: 'count', 'pct'")
|
||||
|
||||
log.debug(f"Checking if {domain_threshold_level} >= {threshold} for {domain}")
|
||||
if domain_threshold_level >= threshold:
|
||||
# Add first block in the list to merged
|
||||
block = domain_blocks[domain][0]
|
||||
log.debug(f"Yes. Merging block: {block}")
|
||||
|
||||
# Merge the others with this record
|
||||
for newblock in domain_blocks[domain][1:]:
|
||||
block = apply_mergeplan(block, newblock, mergeplan)
|
||||
merged.blocks[block.domain] = block
|
||||
|
||||
# end if
|
||||
log.debug(f"blockdata is: {blockdata}")
|
||||
merged[domain] = blockdata
|
||||
# end for
|
||||
return merged
|
||||
|
||||
def apply_mergeplan(oldblock: DomainBlock, newblock: DomainBlock, mergeplan: str='max') -> dict:
|
||||
|
@ -239,10 +264,10 @@ def apply_mergeplan(oldblock: DomainBlock, newblock: DomainBlock, mergeplan: str
|
|||
# How do we override an earlier block definition?
|
||||
if mergeplan in ['max', None]:
|
||||
# Use the highest block level found (the default)
|
||||
log.debug(f"Using 'max' mergeplan.")
|
||||
# log.debug(f"Using 'max' mergeplan.")
|
||||
|
||||
if newblock.severity > oldblock.severity:
|
||||
log.debug(f"New block severity is higher. Using that.")
|
||||
# log.debug(f"New block severity is higher. Using that.")
|
||||
blockdata['severity'] = newblock.severity
|
||||
|
||||
# For 'reject_media', 'reject_reports', and 'obfuscate' if
|
||||
|
@ -271,7 +296,7 @@ def apply_mergeplan(oldblock: DomainBlock, newblock: DomainBlock, mergeplan: str
|
|||
else:
|
||||
raise NotImplementedError(f"Mergeplan '{mergeplan}' not implemented.")
|
||||
|
||||
log.debug(f"Block severity set to {blockdata['severity']}")
|
||||
# log.debug(f"Block severity set to {blockdata['severity']}")
|
||||
|
||||
return DomainBlock(**blockdata)
|
||||
|
||||
|
@ -357,17 +382,19 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
|
|||
|
||||
url = f"{scheme}://{host}{api_path}"
|
||||
|
||||
blocklist = []
|
||||
blockdata = []
|
||||
link = True
|
||||
|
||||
while link:
|
||||
response = requests.get(url, headers=headers, timeout=REQUEST_TIMEOUT)
|
||||
if response.status_code != 200:
|
||||
log.error(f"Cannot fetch remote blocklist: {response.content}")
|
||||
raise ValueError("Unable to fetch domain block list: %s", response)
|
||||
|
||||
blocklist.extend( parse_blocklist(response.content, parse_format, import_fields) )
|
||||
|
||||
# Each block of returned data is a JSON list of dicts
|
||||
# so we parse them and append them to the fetched list
|
||||
# of JSON data we need to parse.
|
||||
|
||||
blockdata.extend(json.loads(response.content.decode('utf-8')))
|
||||
# Parse the link header to find the next url to fetch
|
||||
# This is a weird and janky way of doing pagination but
|
||||
# hey nothing we can do about it we just have to deal
|
||||
|
@ -385,6 +412,8 @@ def fetch_instance_blocklist(host: str, token: str=None, admin: bool=False,
|
|||
urlstring, rel = next.split('; ')
|
||||
url = urlstring.strip('<').rstrip('>')
|
||||
|
||||
blocklist = parse_blocklist(blockdata, url, parse_format, import_fields)
|
||||
|
||||
return blocklist
|
||||
|
||||
def delete_block(token: str, host: str, id: int, scheme: str='https'):
|
||||
|
@ -474,13 +503,9 @@ def update_known_block(token: str, host: str, block: DomainBlock, scheme: str='h
|
|||
"""Update an existing domain block with information in blockdict"""
|
||||
api_path = "/api/v1/admin/domain_blocks/"
|
||||
|
||||
try:
|
||||
id = block.id
|
||||
blockdata = block._asdict()
|
||||
del blockdata['id']
|
||||
except KeyError:
|
||||
import pdb
|
||||
pdb.set_trace()
|
||||
id = block.id
|
||||
blockdata = block._asdict()
|
||||
del blockdata['id']
|
||||
|
||||
url = f"{scheme}://{host}{api_path}{id}"
|
||||
|
||||
|
@ -514,7 +539,7 @@ def add_block(token: str, host: str, blockdata: DomainBlock, scheme: str='https'
|
|||
|
||||
raise ValueError(f"Something went wrong: {response.status_code}: {response.content}")
|
||||
|
||||
def push_blocklist(token: str, host: str, blocklist: list[dict],
|
||||
def push_blocklist(token: str, host: str, blocklist: list[DomainBlock],
|
||||
dryrun: bool=False,
|
||||
import_fields: list=['domain', 'severity'],
|
||||
max_followed_severity:BlockSeverity=BlockSeverity('silence'),
|
||||
|
@ -522,8 +547,7 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
|
|||
):
|
||||
"""Push a blocklist to a remote instance.
|
||||
|
||||
Merging the blocklist with the existing list the instance has,
|
||||
updating existing entries if they exist.
|
||||
Updates existing entries if they exist, creates new blocks if they don't.
|
||||
|
||||
@param token: The Bearer token for OAUTH API authentication
|
||||
@param host: The instance host, FQDN or IP
|
||||
|
@ -538,15 +562,16 @@ def push_blocklist(token: str, host: str, blocklist: list[dict],
|
|||
serverblocks = fetch_instance_blocklist(host, token, True, import_fields, scheme)
|
||||
|
||||
# # Convert serverblocks to a dictionary keyed by domain name
|
||||
knownblocks = {row.domain: row for row in serverblocks}
|
||||
# knownblocks = {row.domain: row for row in serverblocks}
|
||||
|
||||
for newblock in blocklist:
|
||||
for newblock in blocklist.values():
|
||||
|
||||
log.debug(f"Processing block: {newblock}")
|
||||
oldblock = knownblocks.get(newblock.domain, None)
|
||||
if oldblock:
|
||||
if newblock.domain in serverblocks:
|
||||
log.debug(f"Block already exists for {newblock.domain}, checking for differences...")
|
||||
|
||||
oldblock = serverblocks[newblock.domain]
|
||||
|
||||
change_needed = is_change_needed(oldblock, newblock, import_fields)
|
||||
|
||||
# Is the severity changing?
|
||||
|
@ -605,15 +630,14 @@ def load_config(configfile: str):
|
|||
conf = toml.load(configfile)
|
||||
return conf
|
||||
|
||||
def save_intermediate_blocklist(
|
||||
blocklist: list[dict], source: str,
|
||||
filedir: str,
|
||||
def save_intermediate_blocklist(blocklist: Blocklist, filedir: str,
|
||||
export_fields: list=['domain','severity']):
|
||||
"""Save a local copy of a blocklist we've downloaded
|
||||
"""
|
||||
# Invent a filename based on the remote source
|
||||
# If the source was a URL, convert it to something less messy
|
||||
# If the source was a remote domain, just use the name of the domain
|
||||
source = blocklist.origin
|
||||
log.debug(f"Saving intermediate blocklist from {source}")
|
||||
source = source.replace('/','-')
|
||||
filename = f"{source}.csv"
|
||||
|
@ -621,7 +645,7 @@ def save_intermediate_blocklist(
|
|||
save_blocklist_to_file(blocklist, filepath, export_fields)
|
||||
|
||||
def save_blocklist_to_file(
|
||||
blocklist: list[DomainBlock],
|
||||
blocklist: Blocklist,
|
||||
filepath: str,
|
||||
export_fields: list=['domain','severity']):
|
||||
"""Save a blocklist we've downloaded from a remote source
|
||||
|
@ -631,18 +655,22 @@ def save_blocklist_to_file(
|
|||
@param export_fields: Which fields to include in the export.
|
||||
"""
|
||||
try:
|
||||
blocklist = sorted(blocklist, key=lambda x: x.domain)
|
||||
sorted_list = sorted(blocklist.blocks.items())
|
||||
except KeyError:
|
||||
log.error("Field 'domain' not found in blocklist.")
|
||||
log.debug(f"blocklist is: {blocklist}")
|
||||
log.debug(f"blocklist is: {sorted_list}")
|
||||
except AttributeError:
|
||||
log.error("Attribute error!")
|
||||
import pdb
|
||||
pdb.set_trace()
|
||||
|
||||
log.debug(f"export fields: {export_fields}")
|
||||
|
||||
with open(filepath, "w") as fp:
|
||||
writer = csv.DictWriter(fp, export_fields, extrasaction='ignore')
|
||||
writer.writeheader()
|
||||
for item in blocklist:
|
||||
writer.writerow(item._asdict())
|
||||
for key, value in sorted_list:
|
||||
writer.writerow(value)
|
||||
|
||||
def augment_args(args, tomldata: str=None):
|
||||
"""Augment commandline arguments with config file parameters
|
||||
|
@ -682,6 +710,12 @@ def augment_args(args, tomldata: str=None):
|
|||
if not args.mergeplan:
|
||||
args.mergeplan = conf.get('mergeplan', 'max')
|
||||
|
||||
if not args.merge_threshold:
|
||||
args.merge_threshold = conf.get('merge_threshold', 0)
|
||||
|
||||
if not args.merge_threshold_type:
|
||||
args.merge_threshold_type = conf.get('merge_threshold_type', 'count')
|
||||
|
||||
args.blocklist_url_sources = conf.get('blocklist_url_sources', [])
|
||||
args.blocklist_instance_sources = conf.get('blocklist_instance_sources', [])
|
||||
args.allowlist_url_sources = conf.get('allowlist_url_sources', [])
|
||||
|
@ -703,6 +737,8 @@ def setup_argparse():
|
|||
ap.add_argument('-S', '--save-intermediate', dest="save_intermediate", action='store_true', help="Save intermediate blocklists we fetch to local files.")
|
||||
ap.add_argument('-D', '--savedir', dest="savedir", help="Directory path to save intermediate lists.")
|
||||
ap.add_argument('-m', '--mergeplan', choices=['min', 'max'], help="Set mergeplan.")
|
||||
ap.add_argument('--merge-threshold', type=int, help="Merge threshold value")
|
||||
ap.add_argument('--merge-threshold-type', choices=['count', 'pct'], help="Type of merge threshold to use.")
|
||||
|
||||
ap.add_argument('-I', '--import-field', dest='import_fields', action='append', help="Extra blocklist fields to import.")
|
||||
ap.add_argument('-E', '--export-field', dest='export_fields', action='append', help="Extra blocklist fields to export.")
|
||||
|
|
|
@ -1,19 +1,48 @@
|
|||
"""Parse various blocklist data formats
|
||||
"""
|
||||
from typing import Iterable
|
||||
from .const import DomainBlock, BlockSeverity
|
||||
|
||||
from __future__ import annotations
|
||||
import csv
|
||||
import json
|
||||
from typing import Iterable
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from .const import DomainBlock, BlockSeverity
|
||||
|
||||
import logging
|
||||
log = logging.getLogger('fediblockhole')
|
||||
|
||||
@dataclass
|
||||
class Blocklist:
|
||||
""" A Blocklist object
|
||||
|
||||
A Blocklist is a list of DomainBlocks from an origin
|
||||
"""
|
||||
origin: str = None
|
||||
blocks: dict[str, DomainBlock] = field(default_factory=dict)
|
||||
|
||||
def __len__(self):
|
||||
return len(self.blocks)
|
||||
|
||||
def __class_getitem__(cls, item):
|
||||
return dict[str, DomainBlock]
|
||||
|
||||
def __getitem__(self, item):
|
||||
return self.blocks[item]
|
||||
|
||||
def __iter__(self):
|
||||
return self.blocks.__iter__()
|
||||
|
||||
def items(self):
|
||||
return self.blocks.items()
|
||||
|
||||
def values(self):
|
||||
return self.blocks.values()
|
||||
|
||||
class BlocklistParser(object):
|
||||
"""
|
||||
Base class for parsing blocklists
|
||||
"""
|
||||
preparse = False
|
||||
do_preparse = False
|
||||
|
||||
def __init__(self, import_fields: list=['domain', 'severity'],
|
||||
max_severity: str='suspend'):
|
||||
|
@ -30,17 +59,18 @@ class BlocklistParser(object):
|
|||
"""
|
||||
raise NotImplementedError
|
||||
|
||||
def parse_blocklist(self, blockdata) -> dict[DomainBlock]:
|
||||
def parse_blocklist(self, blockdata, origin:str=None) -> Blocklist:
|
||||
"""Parse an iterable of blocklist items
|
||||
@param blocklist: An Iterable of blocklist items
|
||||
@returns: A dict of DomainBlocks, keyed by domain
|
||||
"""
|
||||
if self.preparse:
|
||||
if self.do_preparse:
|
||||
blockdata = self.preparse(blockdata)
|
||||
|
||||
parsed_list = []
|
||||
parsed_list = Blocklist(origin)
|
||||
for blockitem in blockdata:
|
||||
parsed_list.append(self.parse_item(blockitem))
|
||||
block = self.parse_item(blockitem)
|
||||
parsed_list.blocks[block.domain] = block
|
||||
return parsed_list
|
||||
|
||||
def parse_item(self, blockitem) -> DomainBlock:
|
||||
|
@ -53,12 +83,13 @@ class BlocklistParser(object):
|
|||
|
||||
class BlocklistParserJSON(BlocklistParser):
|
||||
"""Parse a JSON formatted blocklist"""
|
||||
preparse = True
|
||||
do_preparse = True
|
||||
|
||||
def preparse(self, blockdata) -> Iterable:
|
||||
"""Parse the blockdata as JSON
|
||||
"""
|
||||
return json.loads(blockdata)
|
||||
"""Parse the blockdata as JSON if needed"""
|
||||
if type(blockdata) == type(''):
|
||||
return json.loads(blockdata)
|
||||
return blockdata
|
||||
|
||||
def parse_item(self, blockitem: dict) -> DomainBlock:
|
||||
# Remove fields we don't want to import
|
||||
|
@ -102,7 +133,7 @@ class BlocklistParserCSV(BlocklistParser):
|
|||
|
||||
The parser expects the CSV data to include a header with the field names.
|
||||
"""
|
||||
preparse = True
|
||||
do_preparse = True
|
||||
|
||||
def preparse(self, blockdata) -> Iterable:
|
||||
"""Use a csv.DictReader to create an iterable from the blockdata
|
||||
|
@ -130,6 +161,24 @@ class BlocklistParserCSV(BlocklistParser):
|
|||
block.severity = self.max_severity
|
||||
return block
|
||||
|
||||
class BlocklistParserMastodonCSV(BlocklistParserCSV):
|
||||
""" Parse Mastodon CSV formatted blocklists
|
||||
|
||||
The Mastodon v4.1.x domain block CSV export prefixes its
|
||||
field names with a '#' character because… reasons?
|
||||
"""
|
||||
do_preparse = True
|
||||
|
||||
def parse_item(self, blockitem: dict) -> DomainBlock:
|
||||
"""Build a new blockitem dict with new un-#ed keys
|
||||
"""
|
||||
newdict = {}
|
||||
for key in blockitem:
|
||||
newkey = key.lstrip('#')
|
||||
newdict[newkey] = blockitem[key]
|
||||
|
||||
return super().parse_item(newdict)
|
||||
|
||||
class RapidBlockParserCSV(BlocklistParserCSV):
|
||||
""" Parse RapidBlock CSV blocklists
|
||||
|
||||
|
@ -193,6 +242,7 @@ def str2bool(boolstring: str) -> bool:
|
|||
|
||||
FORMAT_PARSERS = {
|
||||
'csv': BlocklistParserCSV,
|
||||
'mastodon_csv': BlocklistParserMastodonCSV,
|
||||
'json': BlocklistParserJSON,
|
||||
'mastodon_api_public': BlocklistParserMastodonAPIPublic,
|
||||
'rapidblock.csv': RapidBlockParserCSV,
|
||||
|
@ -202,11 +252,13 @@ FORMAT_PARSERS = {
|
|||
# helper function to select the appropriate Parser
|
||||
def parse_blocklist(
|
||||
blockdata,
|
||||
origin,
|
||||
format="csv",
|
||||
import_fields: list=['domain', 'severity'],
|
||||
max_severity: str='suspend'):
|
||||
"""Parse a blocklist in the given format
|
||||
"""
|
||||
parser = FORMAT_PARSERS[format](import_fields, max_severity)
|
||||
log.debug(f"parsing {format} blocklist with import_fields: {import_fields}...")
|
||||
return parser.parse_blocklist(blockdata)
|
||||
|
||||
parser = FORMAT_PARSERS[format](import_fields, max_severity)
|
||||
return parser.parse_blocklist(blockdata, origin)
|
|
@ -1,5 +1,6 @@
|
|||
""" Constant objects used by FediBlockHole
|
||||
"""
|
||||
from __future__ import annotations
|
||||
import enum
|
||||
from typing import NamedTuple, Optional, TypedDict
|
||||
from dataclasses import dataclass
|
||||
|
|
|
@ -7,5 +7,6 @@ def shim_argparse(testargv: list=[], tomldata: str=None):
|
|||
"""
|
||||
ap = setup_argparse()
|
||||
args = ap.parse_args(testargv)
|
||||
args = augment_args(args, tomldata)
|
||||
if tomldata is not None:
|
||||
args = augment_args(args, tomldata)
|
||||
return args
|
|
@ -4,6 +4,7 @@ import pytest
|
|||
|
||||
from util import shim_argparse
|
||||
from fediblockhole.const import DomainBlock
|
||||
from fediblockhole.blocklists import Blocklist
|
||||
from fediblockhole import fetch_allowlists, apply_allowlists
|
||||
|
||||
def test_cmdline_allow_removes_domain():
|
||||
|
@ -11,17 +12,13 @@ def test_cmdline_allow_removes_domain():
|
|||
"""
|
||||
conf = shim_argparse(['-A', 'removeme.org'])
|
||||
|
||||
merged = {
|
||||
merged = Blocklist('test_allowlist.merged', {
|
||||
'example.org': DomainBlock('example.org'),
|
||||
'example2.org': DomainBlock('example2.org'),
|
||||
'removeme.org': DomainBlock('removeme.org'),
|
||||
'keepblockingme.org': DomainBlock('keepblockingme.org'),
|
||||
}
|
||||
})
|
||||
|
||||
# allowlists = {
|
||||
# 'testlist': [ DomainBlock('removeme.org', 'noop'), ]
|
||||
# }
|
||||
|
||||
merged = apply_allowlists(merged, conf, {})
|
||||
|
||||
with pytest.raises(KeyError):
|
||||
|
@ -32,16 +29,18 @@ def test_allowlist_removes_domain():
|
|||
"""
|
||||
conf = shim_argparse()
|
||||
|
||||
merged = {
|
||||
merged = Blocklist('test_allowlist.merged', {
|
||||
'example.org': DomainBlock('example.org'),
|
||||
'example2.org': DomainBlock('example2.org'),
|
||||
'removeme.org': DomainBlock('removeme.org'),
|
||||
'keepblockingme.org': DomainBlock('keepblockingme.org'),
|
||||
}
|
||||
})
|
||||
|
||||
allowlists = {
|
||||
'testlist': [ DomainBlock('removeme.org', 'noop'), ]
|
||||
}
|
||||
allowlists = [
|
||||
Blocklist('test_allowlist', {
|
||||
'removeme.org': DomainBlock('removeme.org', 'noop'),
|
||||
})
|
||||
]
|
||||
|
||||
merged = apply_allowlists(merged, conf, allowlists)
|
||||
|
||||
|
@ -53,19 +52,19 @@ def test_allowlist_removes_tld():
|
|||
"""
|
||||
conf = shim_argparse()
|
||||
|
||||
merged = {
|
||||
merged = Blocklist('test_allowlist.merged', {
|
||||
'.cf': DomainBlock('.cf'),
|
||||
'example.org': DomainBlock('example.org'),
|
||||
'.tk': DomainBlock('.tk'),
|
||||
'keepblockingme.org': DomainBlock('keepblockingme.org'),
|
||||
}
|
||||
})
|
||||
|
||||
allowlists = {
|
||||
'list1': [
|
||||
DomainBlock('.cf', 'noop'),
|
||||
DomainBlock('.tk', 'noop'),
|
||||
]
|
||||
}
|
||||
allowlists = [
|
||||
Blocklist('test_allowlist.list1', {
|
||||
'.cf': DomainBlock('.cf', 'noop'),
|
||||
'.tk': DomainBlock('.tk', 'noop'),
|
||||
})
|
||||
]
|
||||
|
||||
merged = apply_allowlists(merged, conf, allowlists)
|
||||
|
||||
|
|
|
@ -49,3 +49,33 @@ allowlist_url_sources = [ { url='file:///path/to/allowlist', format='csv'} ]
|
|||
'url': 'file:///path/to/allowlist',
|
||||
'format': 'csv',
|
||||
}]
|
||||
|
||||
def test_set_merge_thresold_default():
|
||||
tomldata = """
|
||||
"""
|
||||
args = shim_argparse([], tomldata)
|
||||
|
||||
assert args.mergeplan == 'max'
|
||||
assert args.merge_threshold_type == 'count'
|
||||
|
||||
def test_set_merge_thresold_count():
|
||||
tomldata = """# Add a merge threshold
|
||||
merge_threshold_type = 'count'
|
||||
merge_threshold = 2
|
||||
"""
|
||||
args = shim_argparse([], tomldata)
|
||||
|
||||
assert args.mergeplan == 'max'
|
||||
assert args.merge_threshold_type == 'count'
|
||||
assert args.merge_threshold == 2
|
||||
|
||||
def test_set_merge_thresold_pct():
|
||||
tomldata = """# Add a merge threshold
|
||||
merge_threshold_type = 'pct'
|
||||
merge_threshold = 35
|
||||
"""
|
||||
args = shim_argparse([], tomldata)
|
||||
|
||||
assert args.mergeplan == 'max'
|
||||
assert args.merge_threshold_type == 'pct'
|
||||
assert args.merge_threshold == 35
|
||||
|
|
|
@ -0,0 +1,153 @@
|
|||
"""Test merge with thresholds
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklists import Blocklist, parse_blocklist
|
||||
from fediblockhole import merge_blocklists, apply_mergeplan
|
||||
|
||||
from fediblockhole.const import SeverityLevel, DomainBlock
|
||||
|
||||
datafile01 = "data-suspends-01.csv"
|
||||
datafile02 = "data-silences-01.csv"
|
||||
datafile03 = "data-noop-01.csv"
|
||||
|
||||
import_fields = [
|
||||
'domain',
|
||||
'severity',
|
||||
'public_comment',
|
||||
'private_comment',
|
||||
'reject_media',
|
||||
'reject_reports',
|
||||
'obfuscate'
|
||||
]
|
||||
|
||||
def load_test_blocklist_data(datafiles):
|
||||
|
||||
blocklists = []
|
||||
|
||||
for df in datafiles:
|
||||
with open(df) as fp:
|
||||
data = fp.read()
|
||||
bl = parse_blocklist(data, df, 'csv', import_fields)
|
||||
blocklists.append(bl)
|
||||
|
||||
return blocklists
|
||||
|
||||
def test_mergeplan_count_2():
|
||||
"""Only merge a block if present in 2 or more lists
|
||||
"""
|
||||
|
||||
bl_1 = Blocklist('test01', {
|
||||
'onemention.example.org': DomainBlock('onemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_2 = Blocklist('test2', {
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_3 = Blocklist('test3', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
ml = merge_blocklists([bl_1, bl_2, bl_3], 'max', threshold=2)
|
||||
|
||||
assert 'onemention.example.org' not in ml
|
||||
assert 'twomention.example.org' in ml
|
||||
assert 'threemention.example.org' in ml
|
||||
|
||||
def test_mergeplan_count_3():
|
||||
"""Only merge a block if present in 3 or more lists
|
||||
"""
|
||||
|
||||
bl_1 = Blocklist('test01', {
|
||||
'onemention.example.org': DomainBlock('onemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_2 = Blocklist('test2', {
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_3 = Blocklist('test3', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
ml = merge_blocklists([bl_1, bl_2, bl_3], 'max', threshold=3)
|
||||
|
||||
assert 'onemention.example.org' not in ml
|
||||
assert 'twomention.example.org' not in ml
|
||||
assert 'threemention.example.org' in ml
|
||||
|
||||
def test_mergeplan_pct_30():
|
||||
"""Only merge a block if present in 2 or more lists
|
||||
"""
|
||||
|
||||
bl_1 = Blocklist('test01', {
|
||||
'onemention.example.org': DomainBlock('onemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
|
||||
})
|
||||
|
||||
bl_2 = Blocklist('test2', {
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_3 = Blocklist('test3', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_4 = Blocklist('test4', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
ml = merge_blocklists([bl_1, bl_2, bl_3, bl_4], 'max', threshold=30, threshold_type='pct')
|
||||
|
||||
assert 'onemention.example.org' not in ml
|
||||
assert 'twomention.example.org' in ml
|
||||
assert 'threemention.example.org' in ml
|
||||
assert 'fourmention.example.org' in ml
|
||||
|
||||
def test_mergeplan_pct_55():
|
||||
"""Only merge a block if present in 2 or more lists
|
||||
"""
|
||||
|
||||
bl_1 = Blocklist('test01', {
|
||||
'onemention.example.org': DomainBlock('onemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
|
||||
})
|
||||
|
||||
bl_2 = Blocklist('test2', {
|
||||
'twomention.example.org': DomainBlock('twomention.example.org', 'suspend', '', '', True, True, True),
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_3 = Blocklist('test3', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
bl_4 = Blocklist('test4', {
|
||||
'threemention.example.org': DomainBlock('threemention.example.org', 'suspend', '', '', True, True, True),
|
||||
'fourmention.example.org': DomainBlock('fourmention.example.org', 'suspend', '', '', True, True, True),
|
||||
})
|
||||
|
||||
ml = merge_blocklists([bl_1, bl_2, bl_3, bl_4], 'max', threshold=55, threshold_type='pct')
|
||||
|
||||
assert 'onemention.example.org' not in ml
|
||||
assert 'twomention.example.org' not in ml
|
||||
assert 'threemention.example.org' in ml
|
||||
assert 'fourmention.example.org' in ml
|
|
@ -1,7 +1,7 @@
|
|||
"""Various mergeplan tests
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklist_parser import parse_blocklist
|
||||
from fediblockhole.blocklists import parse_blocklist
|
||||
from fediblockhole import merge_blocklists, merge_comments, apply_mergeplan
|
||||
|
||||
from fediblockhole.const import SeverityLevel, DomainBlock
|
||||
|
@ -22,20 +22,19 @@ import_fields = [
|
|||
|
||||
def load_test_blocklist_data(datafiles):
|
||||
|
||||
blocklists = {}
|
||||
blocklists = []
|
||||
|
||||
for df in datafiles:
|
||||
with open(df) as fp:
|
||||
data = fp.read()
|
||||
bl = parse_blocklist(data, 'csv', import_fields)
|
||||
blocklists[df] = bl
|
||||
bl = parse_blocklist(data, df, 'csv', import_fields)
|
||||
blocklists.append(bl)
|
||||
|
||||
return blocklists
|
||||
|
||||
def test_mergeplan_max():
|
||||
"""Test 'max' mergeplan"""
|
||||
blocklists = load_test_blocklist_data([datafile01, datafile02])
|
||||
|
||||
bl = merge_blocklists(blocklists, 'max')
|
||||
assert len(bl) == 13
|
||||
|
||||
|
|
|
@ -1,22 +1,24 @@
|
|||
"""Tests of the CSV parsing
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklist_parser import BlocklistParserCSV, parse_blocklist
|
||||
from fediblockhole.const import DomainBlock, BlockSeverity, SeverityLevel
|
||||
from fediblockhole.blocklists import BlocklistParserCSV, parse_blocklist
|
||||
from fediblockhole.const import SeverityLevel
|
||||
|
||||
|
||||
def test_single_line():
|
||||
csvdata = "example.org"
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserCSV()
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
assert len(bl) == 0
|
||||
|
||||
def test_header_only():
|
||||
csvdata = "domain,severity,public_comment"
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserCSV()
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
assert len(bl) == 0
|
||||
|
||||
def test_2_blocks():
|
||||
|
@ -24,12 +26,13 @@ def test_2_blocks():
|
|||
example.org,silence
|
||||
example2.org,suspend
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserCSV()
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 2
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert 'example.org' in bl
|
||||
|
||||
def test_4_blocks():
|
||||
csvdata = """domain,severity,public_comment
|
||||
|
@ -38,20 +41,21 @@ example2.org,suspend,"test 2"
|
|||
example3.org,noop,"test 3"
|
||||
example4.org,suspend,"test 4"
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserCSV()
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 4
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert bl[1].domain == 'example2.org'
|
||||
assert bl[2].domain == 'example3.org'
|
||||
assert bl[3].domain == 'example4.org'
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl[0].severity.level == SeverityLevel.SILENCE
|
||||
assert bl[1].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[2].severity.level == SeverityLevel.NONE
|
||||
assert bl[3].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['example.org'].severity.level == SeverityLevel.SILENCE
|
||||
assert bl['example2.org'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['example3.org'].severity.level == SeverityLevel.NONE
|
||||
assert bl['example4.org'].severity.level == SeverityLevel.SUSPEND
|
||||
|
||||
def test_ignore_comments():
|
||||
csvdata = """domain,severity,public_comment,private_comment
|
||||
|
@ -60,18 +64,18 @@ example2.org,suspend,"test 2","ignote me also"
|
|||
example3.org,noop,"test 3","and me"
|
||||
example4.org,suspend,"test 4","also me"
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserCSV()
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 4
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert bl[1].domain == 'example2.org'
|
||||
assert bl[2].domain == 'example3.org'
|
||||
assert bl[3].domain == 'example4.org'
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl[0].public_comment == ''
|
||||
assert bl[0].private_comment == ''
|
||||
|
||||
assert bl[2].public_comment == ''
|
||||
assert bl[2].private_comment == ''
|
||||
assert bl['example.org'].public_comment == ''
|
||||
assert bl['example.org'].private_comment == ''
|
||||
assert bl['example3.org'].public_comment == ''
|
||||
assert bl['example4.org'].private_comment == ''
|
|
@ -0,0 +1,81 @@
|
|||
"""Tests of the CSV parsing
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklists import BlocklistParserMastodonCSV
|
||||
from fediblockhole.const import SeverityLevel
|
||||
|
||||
|
||||
def test_single_line():
|
||||
csvdata = "example.org"
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserMastodonCSV()
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
assert len(bl) == 0
|
||||
|
||||
def test_header_only():
|
||||
csvdata = "#domain,#severity,#public_comment"
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserMastodonCSV()
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
assert len(bl) == 0
|
||||
|
||||
def test_2_blocks():
|
||||
csvdata = """domain,severity
|
||||
example.org,silence
|
||||
example2.org,suspend
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserMastodonCSV()
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 2
|
||||
assert 'example.org' in bl
|
||||
|
||||
def test_4_blocks():
|
||||
csvdata = """domain,severity,public_comment
|
||||
example.org,silence,"test 1"
|
||||
example2.org,suspend,"test 2"
|
||||
example3.org,noop,"test 3"
|
||||
example4.org,suspend,"test 4"
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserMastodonCSV()
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 4
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl['example.org'].severity.level == SeverityLevel.SILENCE
|
||||
assert bl['example2.org'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['example3.org'].severity.level == SeverityLevel.NONE
|
||||
assert bl['example4.org'].severity.level == SeverityLevel.SUSPEND
|
||||
|
||||
def test_ignore_comments():
|
||||
csvdata = """domain,severity,public_comment,private_comment
|
||||
example.org,silence,"test 1","ignore me"
|
||||
example2.org,suspend,"test 2","ignote me also"
|
||||
example3.org,noop,"test 3","and me"
|
||||
example4.org,suspend,"test 4","also me"
|
||||
"""
|
||||
origin = "csvfile"
|
||||
|
||||
parser = BlocklistParserMastodonCSV()
|
||||
bl = parser.parse_blocklist(csvdata, origin)
|
||||
|
||||
assert len(bl) == 4
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl['example.org'].public_comment == ''
|
||||
assert bl['example.org'].private_comment == ''
|
||||
assert bl['example3.org'].public_comment == ''
|
||||
assert bl['example4.org'].private_comment == ''
|
|
@ -1,8 +1,8 @@
|
|||
"""Tests of the CSV parsing
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklist_parser import BlocklistParserJSON, parse_blocklist
|
||||
from fediblockhole.const import DomainBlock, BlockSeverity, SeverityLevel
|
||||
from fediblockhole.blocklists import BlocklistParserJSON, parse_blocklist
|
||||
from fediblockhole.const import SeverityLevel
|
||||
|
||||
datafile = 'data-mastodon.json'
|
||||
|
||||
|
@ -14,33 +14,32 @@ def test_json_parser():
|
|||
|
||||
data = load_data()
|
||||
parser = BlocklistParserJSON()
|
||||
bl = parser.parse_blocklist(data)
|
||||
bl = parser.parse_blocklist(data, 'test_json')
|
||||
|
||||
assert len(bl) == 10
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert bl[1].domain == 'example2.org'
|
||||
assert bl[2].domain == 'example3.org'
|
||||
assert bl[3].domain == 'example4.org'
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl[0].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[1].severity.level == SeverityLevel.SILENCE
|
||||
assert bl[2].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[3].severity.level == SeverityLevel.NONE
|
||||
assert bl['example.org'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['example2.org'].severity.level == SeverityLevel.SILENCE
|
||||
assert bl['example3.org'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['example4.org'].severity.level == SeverityLevel.NONE
|
||||
|
||||
def test_ignore_comments():
|
||||
|
||||
data = load_data()
|
||||
parser = BlocklistParserJSON()
|
||||
bl = parser.parse_blocklist(data)
|
||||
bl = parser.parse_blocklist(data, 'test_json')
|
||||
|
||||
assert len(bl) == 10
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert bl[1].domain == 'example2.org'
|
||||
assert bl[2].domain == 'example3.org'
|
||||
assert bl[3].domain == 'example4.org'
|
||||
assert 'example.org' in bl
|
||||
assert 'example2.org' in bl
|
||||
assert 'example3.org' in bl
|
||||
assert 'example4.org' in bl
|
||||
|
||||
assert bl[0].public_comment == ''
|
||||
assert bl[0].private_comment == ''
|
||||
|
||||
assert bl[2].public_comment == ''
|
||||
assert bl[2].private_comment == ''
|
||||
assert bl['example.org'].public_comment == ''
|
||||
assert bl['example.org'].private_comment == ''
|
||||
assert bl['example3.org'].public_comment == ''
|
||||
assert bl['example4.org'].private_comment == ''
|
|
@ -1,7 +1,7 @@
|
|||
"""Tests of the Rapidblock CSV parsing
|
||||
"""
|
||||
|
||||
from fediblockhole.blocklist_parser import RapidBlockParserCSV, parse_blocklist
|
||||
from fediblockhole.blocklists import RapidBlockParserCSV, parse_blocklist
|
||||
from fediblockhole.const import DomainBlock, BlockSeverity, SeverityLevel
|
||||
|
||||
csvdata = """example.org\r\nsubdomain.example.org\r\nanotherdomain.org\r\ndomain4.org\r\n"""
|
||||
|
@ -11,13 +11,13 @@ def test_basic_rapidblock():
|
|||
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
assert len(bl) == 4
|
||||
assert bl[0].domain == 'example.org'
|
||||
assert bl[1].domain == 'subdomain.example.org'
|
||||
assert bl[2].domain == 'anotherdomain.org'
|
||||
assert bl[3].domain == 'domain4.org'
|
||||
assert 'example.org' in bl
|
||||
assert 'subdomain.example.org' in bl
|
||||
assert 'anotherdomain.org' in bl
|
||||
assert 'domain4.org' in bl
|
||||
|
||||
def test_severity_is_suspend():
|
||||
bl = parser.parse_blocklist(csvdata)
|
||||
|
||||
for block in bl:
|
||||
for block in bl.values():
|
||||
assert block.severity.level == SeverityLevel.SUSPEND
|
|
@ -1,6 +1,6 @@
|
|||
"""Test parsing the RapidBlock JSON format
|
||||
"""
|
||||
from fediblockhole.blocklist_parser import parse_blocklist
|
||||
from fediblockhole.blocklists import parse_blocklist
|
||||
|
||||
from fediblockhole.const import SeverityLevel
|
||||
|
||||
|
@ -9,26 +9,26 @@ rapidblockjson = "data-rapidblock.json"
|
|||
def test_parse_rapidblock_json():
|
||||
with open(rapidblockjson) as fp:
|
||||
data = fp.read()
|
||||
bl = parse_blocklist(data, 'rapidblock.json')
|
||||
bl = parse_blocklist(data, 'pytest', 'rapidblock.json')
|
||||
|
||||
assert bl[0].domain == '101010.pl'
|
||||
assert bl[0].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[0].public_comment == ''
|
||||
assert '101010.pl' in bl
|
||||
assert bl['101010.pl'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['101010.pl'].public_comment == ''
|
||||
|
||||
assert bl[10].domain == 'berserker.town'
|
||||
assert bl[10].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[10].public_comment == ''
|
||||
assert bl[10].private_comment == ''
|
||||
assert 'berserker.town' in bl
|
||||
assert bl['berserker.town'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['berserker.town'].public_comment == ''
|
||||
assert bl['berserker.town'].private_comment == ''
|
||||
|
||||
def test_parse_with_comments():
|
||||
with open(rapidblockjson) as fp:
|
||||
data = fp.read()
|
||||
bl = parse_blocklist(data, 'rapidblock.json', ['domain', 'severity', 'public_comment', 'private_comment'])
|
||||
bl = parse_blocklist(data, 'pytest', 'rapidblock.json', ['domain', 'severity', 'public_comment', 'private_comment'])
|
||||
|
||||
assert bl[0].domain == '101010.pl'
|
||||
assert bl[0].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[0].public_comment == 'cryptomining javascript, white supremacy'
|
||||
assert '101010.pl' in bl
|
||||
assert bl['101010.pl'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['101010.pl'].public_comment == 'cryptomining javascript, white supremacy'
|
||||
|
||||
assert bl[10].domain == 'berserker.town'
|
||||
assert bl[10].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl[10].public_comment == 'freeze peach'
|
||||
assert 'berserker.town' in bl
|
||||
assert bl['berserker.town'].severity.level == SeverityLevel.SUSPEND
|
||||
assert bl['berserker.town'].public_comment == 'freeze peach'
|
Loading…
Reference in New Issue