open-webui-filters/README.md

# OpenWebUI Filters and Tools

_Mirrored at Github: https://github.com/ProjectMoon/open-webui-filters_

Documentation (HTML):
[https://agnos.is/projects/open-webui-filters/][docs-html]

Documentation (Gemini):
[gemini://agnos.is/projects/open-webui-filters/][docs-gemini]

My collection of OpenWebUI Filters and Tools.

So far:

 - **Checkpoint Summarization Filter:** A work-in-progress replacement
   for the narrative memory filter for more generalized use cases.
 - **GPU Scaling Filter:** Reduce number of GPU layers in use if Ollama
   crashes due to running out of VRAM. Deprecated.
 - **Output Sanitization Filter:** Remove words, phrases, or
   characters from the start of model replies.
 - **OpenStreetMap Tool:** Tool for querying OpenStreetMap to look up
   address details and nearby points of interest.
 - **Collapsible Thought Filter:** Hide LLM reasoning/thinking in a
   collapisble block.

## Checkpoint Summarization Filter

A new filter for managing context use by summarizing previous parts of
the chat as the conversation continues. Designed for both general
chats and narrative/roleplay use. Work in progress.

### Configuration

There are currently 4 settings:

 - **Summarizer Model:** The model used to summarize the conversation
   as the chat continues. This must be a base model.
 - **Large Context Summarizer Model:** If large context summarization
   is turned on, use this model for summarizing huge contexts.
 - **Summarize Large Contexts:** If enabled, the filter will attempt
   to load the entire context into the large summarizer model for
   creating an initial checkpoint of an existing conversation.
 - **Wiggle Room:** This is the amount of 'wiggle room' for estimating
   a context shift. This number is subtracted from `num_ctx` for the
   purposes of determining whether or not a context shift has
   occurred.

### Usage

In general, you should only need to specify the summarizer model and
enable the filter on the OpenWebUI models that you want it to work on.
Or even enable it globally. The filter works best when used from a new
conversation, but it does have the (currently limited) ability to deal
with existing conversations.

  - When the filter detects a context shift in the conversation, it
    will summarize the pre-existing context. After that, the summary
    is appended to the system prompt, and old messages before
    summarization are dropped.
  - When the filter detects the next context shift, this process is
    repeated, and a new summarization checkpoint is created. And so
    on.

If the filter is used in an existing conversation, it will summarize
on the first time that it detects a context shift in the conversation:

 - If there are enough messages that the conversation is considered
   "big," and large context summarization is **disabled**, all but the
   last 4 messages will be dropped to form the summary.
 - If the conversation is considered "big," and large context
   summarization is **enabled**, then the large context model will be
   loaded to do the summarization, and the **entire conversation**
   will be given to it.

#### User Commands

There are some basic commands the user can use to interact with the
filter in a conversation:

 - `!nuke`: Deletes all summary checkpoints in the chat, and the
   filter will attempt to summarize from scratch the next time it
   detects a context shift.

### Limitations

There are some limitations to be aware of:
 - If you enable large context summarization, you need to make sure
   your system is capable of loading and summarizing an entire
   conversation.
 - Handling of branching conversations and regenerated responses is
   currently rather messy. It will kind of work. There are some plans
   to improve this.
 - If large context summarization is disabled, pre-existing large
   conversations will only summarize the previous 4 messages when the
   first summarization is detected.
 - The filter only loads the most recent summary, and thus the AI
   might "forget" much older information.

## GPU Scaling Filter

_Deprecated. Use the setting in OpenWebUI chat controls._

This is a simple filter that reduces the number of GPU layers in use
by Ollama when it detects that Ollama has crashed (via empty response
coming in to OpenWebUI). Right now, the logic is very basic, just
using static numbers to reduce GPU layer counts. It doesn't take into
account the number of layers in models or dynamically monitor VRAM
use.

There are three settings:

 - **Initial Reduction:** Number of layers to immediately set when an
   Ollama crash is detected. Defaults to 20.
 - **Scaling Step:** Number of layers to reduce by on subsequent crashes
   (down to a minimum of 0, i.e. 100% CPU inference). Defaults to 5.
 - **Show Status:** Whether or not to inform the user that the
   conversation is running slower due to GPU layer downscaling.

## Output Sanitization Filter

This filter is intended for models that often output unwanted
characters or terms at the beginning of replies. I have noticed this
especially with Beyonder V3 and related models. They sometimes output
a `":"` or `"Name:"` in front of replies. For example, if system prompt is
`"You are Quinn, a helpful assistant."` the model will often reply with
`"Quinn:"` as its first word.

There is one setting:

 - **Terms:** List of terms or characters to remove. This is a list,
   and in the UI, each item should be separated by a comma.

For the above example, the setting textbox should have `:,Quinn:` in
it, to remove a single colon from the start of replies, and `Quinn:`
from the start of replies.

### Other Notes

Terms are removed in the order defined by the setting. The filter
loops through each term and attempts to remove it from the start of
the LLM's reply.

## OpenStreetMap Tool

_Recommended models: Llama 3.1+, Mistral Nemo, Mistral Small, Qwen 2.5._

A tool that can find certain points of interest (POIs) nearby a
requested address or place.

These are the current settings:
 - **User Agent:** The custom user agent to set for OSM and Overpass
   Turbo API requests.
 - **From Header:** The email address for the From header for OSM and
   Overpass API requests.
 - **Nominatim API URL:** URL of the API endpoint for Nominatim, the
   reverse geocoding (address lookup) service. Defaults to the public
   instance. This must be the root URL, for example:
   `https://nominatim.openstreetmap.org/`.
 - **Overpass Turbo API URL:** URL of the API endpoint for Overpass
   Turbo, for searching OpenStreetMap. Defaults to the public
   endpoint.
 - **Instruction Oriented Interpretation:** Controls the level of
   detail in the instructions for interpreting results given to the
   LLM. By default, it gives detailed instructions. Turn this setting
   off if results are inconsistent, wrong, or missing.
 - **Status Indicators:** If enabled, emit update events to the web
   UI, showing what the tool is doing and what search results it has
   found, or if it has encountered an error.
 - **ORS API Key:** Provide an API key for Open Route Service to
   calculate navigational routes to nearby places, to provide more
   accurate search results.
 - **ORS Instance:** By default, use the public Open Route Service
   instance. Can be changed to point to another ORS instance.

The tool **will not run** without the User Agent and From headers set.
This is because the public instance of the Nominatim API will block
you if you do not set these. Use of the public Nominatim instance is
governed by their [terms of use][nom-tou].

The default API services are suitable for applications with a low
volume of traffic (absolute max 1 API call per second). If you are
running a production service, you should set up your own Nominatim and
Overpass services with caching.

### How to enable 'Where is the closest X to my location?'

In order to have the OSM tool be able to answer questions like "where
is the nearest grocery store to me?", it needs access to your realtime
location. This can be accomplished with the following steps:
 - Enable user location in your user settings.
 - Create a model with a _system prompt_ that references the variable
   `{{USER_LOCATION}}`.
 - OpenWebUI will automatically substitute the GPS coordinates
   reported by the browser into the model's system prompt on every
   message.

# Collapsible Thought Filter

Hides model reasoning/thinking processes in a collapisble block in the
UI response, similar to OpenAI o1 replies in ChatGPT. Designed to be
used with [Reflection 70b](https://ollama.com/library/reflection) and
similar models.

Current settings:

 - Priority: what order to run this filter in.
 - Thought Title: the title of the collapsed thought block.
 - Thought Tag: The XML tag that contains the model's reasoning.
 - Output Tag: The XML tag that contains the model's final output.
 - Use Thoughts As Context: Whether or not to send LLM reasoning text
   as context. Disabled by default because it drastically increases
   token use.

**Note on XML tag settings:** These should be tags WITHOUT the `<>`.
If you customize the tag setting, make sure your setting is just
`mytag` and not `<mytag>` or anything else.

# License

<img src="./agplv3.png" alt="AGPLv3" />

All filters are licensed under [AGPL v3.0+][agpl]. The code is free
software, that you can run, redistribute, modify, study, and learn
from as you see fit, as long as you extend that same freedom to
others, in accordance with the terms of the AGPL. Make sure you are
aware how this might affect your OpenWebUI deployment, if you are
deploying OpenWebUI in a public environment!

Some filters may have code in them subject to other licenses. In those
cases, the licenses and what parts of the code they apply to are
detailed in that specific file.

[agpl]: https://www.gnu.org/licenses/agpl-3.0.en.html
[checkpoint-filter]: #checkpoint-summarization-filter
[nom-tou]: https://operations.osmfoundation.org/policies/nominatim/
[docs-html]: https://agnos.is/projects/open-webui-filters/
[docs-gemini]: gemini://agnos.is/projects/open-webui-filters/
OSM tool 2024-08-04 10:43:23 +00:00			`# OpenWebUI Filters and Tools`
Update readme. 2024-07-18 20:13:00 +00:00
Add github link 2024-07-18 20:18:27 +00:00			`_Mirrored at Github: https://github.com/ProjectMoon/open-webui-filters_`

add doc links on main site 2024-08-07 12:14:02 +00:00			`Documentation (HTML):`
			`[https://agnos.is/projects/open-webui-filters/][docs-html]`

			`Documentation (Gemini):`
			`[gemini://agnos.is/projects/open-webui-filters/][docs-gemini]`

OSM tool 2024-08-04 10:43:23 +00:00			`My collection of OpenWebUI Filters and Tools.`
Update readme. 2024-07-18 20:13:00 +00:00
			`So far:`

Checkpoint summarization filter. 2024-07-24 22:55:58 +00:00			`- Checkpoint Summarization Filter: A work-in-progress replacement`
			`for the narrative memory filter for more generalized use cases.`
Update readme. 2024-07-18 20:13:00 +00:00			`- GPU Scaling Filter: Reduce number of GPU layers in use if Ollama`
Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`crashes due to running out of VRAM. Deprecated.`
Readme update. 2024-07-22 18:55:29 +00:00			`- Output Sanitization Filter: Remove words, phrases, or`
			`characters from the start of model replies.`
OSM tool 2024-08-04 10:43:23 +00:00			`- OpenStreetMap Tool: Tool for querying OpenStreetMap to look up`
			`address details and nearby points of interest.`
Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`- Collapsible Thought Filter: Hide LLM reasoning/thinking in a`
			`collapisble block.`
Update readme. 2024-07-18 20:13:00 +00:00
fix typo in readme 2024-09-27 21:10:42 +00:00			`## Checkpoint Summarization Filter`
Checkpoint summarization filter. 2024-07-24 22:55:58 +00:00
			`A new filter for managing context use by summarizing previous parts of`
			`the chat as the conversation continues. Designed for both general`
			`chats and narrative/roleplay use. Work in progress.`

			`### Configuration`

			`There are currently 4 settings:`

			`- Summarizer Model: The model used to summarize the conversation`
			`as the chat continues. This must be a base model.`
			`- Large Context Summarizer Model: If large context summarization`
			`is turned on, use this model for summarizing huge contexts.`
			`- Summarize Large Contexts: If enabled, the filter will attempt`
			`to load the entire context into the large summarizer model for`
			`creating an initial checkpoint of an existing conversation.`
			`- Wiggle Room: This is the amount of 'wiggle room' for estimating`
			a context shift. This number is subtracted from `num_ctx` for the
			`purposes of determining whether or not a context shift has`
			`occurred.`

			`### Usage`

			`In general, you should only need to specify the summarizer model and`
			`enable the filter on the OpenWebUI models that you want it to work on.`
			`Or even enable it globally. The filter works best when used from a new`
			`conversation, but it does have the (currently limited) ability to deal`
			`with existing conversations.`

			`- When the filter detects a context shift in the conversation, it`
			`will summarize the pre-existing context. After that, the summary`
			`is appended to the system prompt, and old messages before`
			`summarization are dropped.`
			`- When the filter detects the next context shift, this process is`
			`repeated, and a new summarization checkpoint is created. And so`
			`on.`

			`If the filter is used in an existing conversation, it will summarize`
			`on the first time that it detects a context shift in the conversation:`

			`- If there are enough messages that the conversation is considered`
			`"big," and large context summarization is disabled, all but the`
			`last 4 messages will be dropped to form the summary.`
			`- If the conversation is considered "big," and large context`
			`summarization is enabled, then the large context model will be`
			`loaded to do the summarization, and the entire conversation`
			`will be given to it.`

			`#### User Commands`

			`There are some basic commands the user can use to interact with the`
			`filter in a conversation:`

			- `!nuke`: Deletes all summary checkpoints in the chat, and the
			`filter will attempt to summarize from scratch the next time it`
			`detects a context shift.`

			`### Limitations`

			`There are some limitations to be aware of:`
			`- If you enable large context summarization, you need to make sure`
			`your system is capable of loading and summarizing an entire`
			`conversation.`
			`- Handling of branching conversations and regenerated responses is`
			`currently rather messy. It will kind of work. There are some plans`
			`to improve this.`
			`- If large context summarization is disabled, pre-existing large`
			`conversations will only summarize the previous 4 messages when the`
			`first summarization is detected.`
			`- The filter only loads the most recent summary, and thus the AI`
			`might "forget" much older information.`

Update readme. 2024-07-18 20:13:00 +00:00			`## GPU Scaling Filter`

Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`_Deprecated. Use the setting in OpenWebUI chat controls._`

Update readme. 2024-07-18 20:13:00 +00:00			`This is a simple filter that reduces the number of GPU layers in use`
			`by Ollama when it detects that Ollama has crashed (via empty response`
			`coming in to OpenWebUI). Right now, the logic is very basic, just`
			`using static numbers to reduce GPU layer counts. It doesn't take into`
			`account the number of layers in models or dynamically monitor VRAM`
			`use.`

			`There are three settings:`
Readme update. 2024-07-22 18:55:29 +00:00
Update readme. 2024-07-18 20:13:00 +00:00			`- Initial Reduction: Number of layers to immediately set when an`
			`Ollama crash is detected. Defaults to 20.`
			`- Scaling Step: Number of layers to reduce by on subsequent crashes`
			`(down to a minimum of 0, i.e. 100% CPU inference). Defaults to 5.`
			`- Show Status: Whether or not to inform the user that the`
			`conversation is running slower due to GPU layer downscaling.`

Readme update. 2024-07-22 18:55:29 +00:00			`## Output Sanitization Filter`

			`This filter is intended for models that often output unwanted`
			`characters or terms at the beginning of replies. I have noticed this`
			`especially with Beyonder V3 and related models. They sometimes output`
			a `":"` or `"Name:"` in front of replies. For example, if system prompt is
			`"You are Quinn, a helpful assistant."` the model will often reply with
			`"Quinn:"` as its first word.

			`There is one setting:`

			`- Terms: List of terms or characters to remove. This is a list,`
			`and in the UI, each item should be separated by a comma.`

			For the above example, the setting textbox should have `:,Quinn:` in
			it, to remove a single colon from the start of replies, and `Quinn:`
			`from the start of replies.`

			`### Other Notes`

			`Terms are removed in the order defined by the setting. The filter`
			`loops through each term and attempts to remove it from the start of`
			`the LLM's reply.`

OSM tool 2024-08-04 10:43:23 +00:00			`## OpenStreetMap Tool`

OSM: Integrate OpenRouteService 2024-09-27 21:23:56 +00:00			`_Recommended models: Llama 3.1+, Mistral Nemo, Mistral Small, Qwen 2.5._`
Improve search results in OSM tool. 2024-08-05 07:48:44 +00:00
OSM tool 2024-08-04 10:43:23 +00:00			`A tool that can find certain points of interest (POIs) nearby a`
			`requested address or place.`

OSM: Integrate OpenRouteService 2024-09-27 21:23:56 +00:00			`These are the current settings:`
OSM tool 2024-08-04 10:43:23 +00:00			`- User Agent: The custom user agent to set for OSM and Overpass`
			`Turbo API requests.`
			`- From Header: The email address for the From header for OSM and`
			`Overpass API requests.`
			`- Nominatim API URL: URL of the API endpoint for Nominatim, the`
			`reverse geocoding (address lookup) service. Defaults to the public`
OSM: handle unnamed POIs, data caching. - Handle unnamed POIs by looking up nearby addresses from Nominatim. - Cache basically everything in a JSON file (very inefficiently too). - This helps adhere to the Nominatim TOU and reduces load on all public OSM services. - Breaking change for the Nominatim URL valve setting. - Send citations containing search results. 2024-09-30 16:01:25 +00:00			`instance. This must be the root URL, for example:`
			`https://nominatim.openstreetmap.org/`.
OSM tool 2024-08-04 10:43:23 +00:00			`- Overpass Turbo API URL: URL of the API endpoint for Overpass`
			`Turbo, for searching OpenStreetMap. Defaults to the public`
			`endpoint.`
OSM: better accuracy, bunch more searchable things. 2024-09-17 20:10:04 +00:00			`- Instruction Oriented Interpretation: Controls the level of`
			`detail in the instructions for interpreting results given to the`
			`LLM. By default, it gives detailed instructions. Turn this setting`
			`off if results are inconsistent, wrong, or missing.`
OSM: Event emitter to show updates. Add FAQ. 2024-09-23 20:12:17 +00:00			`- Status Indicators: If enabled, emit update events to the web`
			`UI, showing what the tool is doing and what search results it has`
			`found, or if it has encountered an error.`
OSM: Integrate OpenRouteService 2024-09-27 21:23:56 +00:00			`- ORS API Key: Provide an API key for Open Route Service to`
			`calculate navigational routes to nearby places, to provide more`
			`accurate search results.`
			`- ORS Instance: By default, use the public Open Route Service`
			`instance. Can be changed to point to another ORS instance.`
OSM tool 2024-08-04 10:43:23 +00:00
			`The tool will not run without the User Agent and From headers set.`
			`This is because the public instance of the Nominatim API will block`
			`you if you do not set these. Use of the public Nominatim instance is`
			`governed by their [terms of use][nom-tou].`

Improve search results in OSM tool. 2024-08-05 07:48:44 +00:00			`The default API services are suitable for applications with a low`
			`volume of traffic (absolute max 1 API call per second). If you are`
			`running a production service, you should set up your own Nominatim and`
			`Overpass services with caching.`

Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`### How to enable 'Where is the closest X to my location?'`
OSM: Event emitter to show updates. Add FAQ. 2024-09-23 20:12:17 +00:00
			`In order to have the OSM tool be able to answer questions like "where`
			`is the nearest grocery store to me?", it needs access to your realtime`
			`location. This can be accomplished with the following steps:`
			`- Enable user location in your user settings.`
			`- Create a model with a _system prompt_ that references the variable`
			`{{USER_LOCATION}}`.
			`- OpenWebUI will automatically substitute the GPS coordinates`
			`reported by the browser into the model's system prompt on every`
			`message.`

Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`# Collapsible Thought Filter`

			`Hides model reasoning/thinking processes in a collapisble block in the`
			`UI response, similar to OpenAI o1 replies in ChatGPT. Designed to be`
			`used with [Reflection 70b](https://ollama.com/library/reflection) and`
			`similar models.`

			`Current settings:`

			`- Priority: what order to run this filter in.`
			`- Thought Title: the title of the collapsed thought block.`
			`- Thought Tag: The XML tag that contains the model's reasoning.`
			`- Output Tag: The XML tag that contains the model's final output.`
			`- Use Thoughts As Context: Whether or not to send LLM reasoning text`
			`as context. Disabled by default because it drastically increases`
			`token use.`

			Note on XML tag settings: These should be tags WITHOUT the `<>`.
			`If you customize the tag setting, make sure your setting is just`
			`mytag` and not `<mytag>` or anything else.

Update readme. 2024-07-18 20:13:00 +00:00			`# License`
Basic super hacky memory filter. 2024-07-15 14:39:54 +00:00
Readme update. 2024-07-22 18:55:29 +00:00			`<img src="./agplv3.png" alt="AGPLv3" />`

			`All filters are licensed under [AGPL v3.0+][agpl]. The code is free`
			`software, that you can run, redistribute, modify, study, and learn`
			`from as you see fit, as long as you extend that same freedom to`
readme update 2024-07-22 19:00:16 +00:00			`others, in accordance with the terms of the AGPL. Make sure you are`
			`aware how this might affect your OpenWebUI deployment, if you are`
			`deploying OpenWebUI in a public environment!`
Readme update. 2024-07-22 18:55:29 +00:00
Collapsible Thought Filter 2024-10-07 20:12:20 +00:00			`Some filters may have code in them subject to other licenses. In those`
			`cases, the licenses and what parts of the code they apply to are`
			`detailed in that specific file.`

Readme update. 2024-07-22 18:55:29 +00:00			`[agpl]: https://www.gnu.org/licenses/agpl-3.0.en.html`
Checkpoint summarization filter. 2024-07-24 22:55:58 +00:00			`[checkpoint-filter]: #checkpoint-summarization-filter`
OSM tool 2024-08-04 10:43:23 +00:00			`[nom-tou]: https://operations.osmfoundation.org/policies/nominatim/`
add doc links on main site 2024-08-07 12:14:02 +00:00			`[docs-html]: https://agnos.is/projects/open-webui-filters/`
			`[docs-gemini]: gemini://agnos.is/projects/open-webui-filters/`