Collapsible Thought Filter

This commit is contained in:
projectmoon 2024-10-07 22:12:20 +02:00
parent 91a7de26fd
commit 06bedb65dd
2 changed files with 187 additions and 93 deletions

124
README.md
View File

@ -14,15 +14,14 @@ So far:
- **Checkpoint Summarization Filter:** A work-in-progress replacement
for the narrative memory filter for more generalized use cases.
- **Memory Filter:** A basic narrative memory filter intended for
long-form storytelling/roleplaying scenarios. Intended as a proof
of concept/springboard for more advanced narrative memory.
- **GPU Scaling Filter:** Reduce number of GPU layers in use if Ollama
crashes due to running out of VRAM.
crashes due to running out of VRAM. Deprecated.
- **Output Sanitization Filter:** Remove words, phrases, or
characters from the start of model replies.
- **OpenStreetMap Tool:** Tool for querying OpenStreetMap to look up
address details and nearby points of interest.
- **Collapsible Thought Filter:** Hide LLM reasoning/thinking in a
collapisble block.
## Checkpoint Summarization Filter
@ -97,96 +96,10 @@ There are some limitations to be aware of:
- The filter only loads the most recent summary, and thus the AI
might "forget" much older information.
## Memory Filter
__Superseded By: [Checkpoint Summarization Filter][checkpoint-filter]__
Super hacky, very basic automatic narrative memory filter for
OpenWebUI, that may or may not actually enhance narrative generation!
This is intended to be a springboard for a better, more comprehensive
filter that can coherently keep track(ish?) of plot and character
developments in long form story writing/roleplaying scenarios, where
context window length is limited (or ollama crashes on long context
length models despite having 40 GB of unused memory!).
### Configuration
The filter exposes two settings:
- **Summarization model:** This is the model used for extracting and
creating all of the narrative memory, and searching info. It must
be good at following instructions. I use Gemma 2.
- **It must be a base model.** If it's not, things will not work.
- If you don't set this, the filter will attempt to use the model
in the conversation. It must still be a base model.
- **Number of messages to retain:** Number of messages to retain for the
context. All messages before that are dropped in order to manage
context length.
Ideally, the summarization model is the same model you are using for
the storytelling. Otherwise you may have lots of model swap-outs.
The filter hooks in to OpenWebUI's RAG settings to generate embeddings
and query the vector database. The filter will use the same embedding
model and ChromaDB instance that's configured in the admin settings.
### Usage
Enable the filter on a model that you want to use to generate stories.
It is recommended, although not required, that this be the same model
as the summarizer model (above). If you have lots of VRAM or are very
patient, you can use different models.
User input is pre-processed to 'enrich' the narrative. Replies from
the language model are analyzed post-delivery to update the story's
knowlege repository.
You will see status indicators on LLM messages indicating what the
filter is doing.
Do not reply while the model is updating its knowledge base or funny
things might happen.
### Function
What does it do?
- When receiving user input, generate search queries for vector DB
based on user input + last model response.
- Search vector DB for theoretically relevant character and plot
information.
- Ask model to summarize results into coherent and more relevant
stuff.
- Inject results as <context>contextual info</context> for the model.
- After receiving model narrative reply, generate character and plot
info and stick them into the vector DB.
### Limitations and Known Issues
What does it not do?
- Handle conversational branching/regeneration. In fact, this will
pollute the knowledgebase with extra information!
- Bouncing around some ideas to fix this. Basically requires
building a "canonical" branching story path in the database?
- Proper context "chapter" summarization (planned to change).
- ~~Work properly when switching conversations due to OpenWebUI
limitations. The chat ID is not available on incoming requests for
some reason, so a janky workaround is used when processing LLM
responses.~~ Fixed! (but still in a very hacky way)
- Clear out information of old conversations or expire irrelevant
data.
Other things to do or improve:
- Set a minimum search score, to prevent useless stuff from coming up.
- Figure out how to expire or update information about characters and
events, instead of dumping it all into the vector DB.
- Improve multi-user handling. Should technically sort of work due to
messages having UUIDs, but is a bit messy. Only one collection is
used, so multiple users = concurrency issues?
- Block user input while updating the knowledgebase.
## GPU Scaling Filter
_Deprecated. Use the setting in OpenWebUI chat controls._
This is a simple filter that reduces the number of GPU layers in use
by Ollama when it detects that Ollama has crashed (via empty response
coming in to OpenWebUI). Right now, the logic is very basic, just
@ -269,7 +182,7 @@ volume of traffic (absolute max 1 API call per second). If you are
running a production service, you should set up your own Nominatim and
Overpass services with caching.
## How to enable 'Where is the closest X to my location?'
### How to enable 'Where is the closest X to my location?'
In order to have the OSM tool be able to answer questions like "where
is the nearest grocery store to me?", it needs access to your realtime
@ -281,6 +194,27 @@ location. This can be accomplished with the following steps:
reported by the browser into the model's system prompt on every
message.
# Collapsible Thought Filter
Hides model reasoning/thinking processes in a collapisble block in the
UI response, similar to OpenAI o1 replies in ChatGPT. Designed to be
used with [Reflection 70b](https://ollama.com/library/reflection) and
similar models.
Current settings:
- Priority: what order to run this filter in.
- Thought Title: the title of the collapsed thought block.
- Thought Tag: The XML tag that contains the model's reasoning.
- Output Tag: The XML tag that contains the model's final output.
- Use Thoughts As Context: Whether or not to send LLM reasoning text
as context. Disabled by default because it drastically increases
token use.
**Note on XML tag settings:** These should be tags WITHOUT the `<>`.
If you customize the tag setting, make sure your setting is just
`mytag` and not `<mytag>` or anything else.
# License
<img src="./agplv3.png" alt="AGPLv3" />
@ -292,6 +226,10 @@ others, in accordance with the terms of the AGPL. Make sure you are
aware how this might affect your OpenWebUI deployment, if you are
deploying OpenWebUI in a public environment!
Some filters may have code in them subject to other licenses. In those
cases, the licenses and what parts of the code they apply to are
detailed in that specific file.
[agpl]: https://www.gnu.org/licenses/agpl-3.0.en.html
[checkpoint-filter]: #checkpoint-summarization-filter
[nom-tou]: https://operations.osmfoundation.org/policies/nominatim/

156
thinking.py Normal file
View File

@ -0,0 +1,156 @@
"""
title: Collapsible Thought Filter
author: projectmoon
author_url: https://git.agnos.is/projectmoon/open-webui-filters
version: 0.1.0
license: AGPL-3.0+, MIT
required_open_webui_version: 0.3.32
"""
#########################################################
# OpenWebUI Filter that collapses model reasoning/thinking into a
# separate section in the reply.
# Based on the Add or Delete Text Filter by anfi.
# https://openwebui.com/f/anfi/add_or_delete_text
#
# Therefore, portions of this code are licensed under the MIT license.
# The modifications made for "thought enclosure" etc are licensed
# under the AGPL using the MIT's sublicensing clause.
#
# For those portions under the MIT license, the following applies:
#
# MIT License
#
# Copyright (c) 2024 anfi
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#########################################################
from typing import Optional, Dict, List
import re
from pydantic import BaseModel, Field
THOUGHT_ENCLOSURE = """
<details>
<summary>{{THOUGHT_TITLE}}</summary>
{{THOUGHTS}}
</details>
"""
DETAIL_DELETION_REGEX = r"</?details>[\s\S]*?</details>"
class Filter:
class Valves(BaseModel):
priority: int = Field(
default=0, description="Priority level for the filter operations."
)
thought_title: str = Field(
default="Thought Process",
description="Title for the collapsible reasoning section."
)
thought_tag: str = Field(
default="thinking",
description="The XML tag for model thinking output."
)
output_tag: str = Field(
default="output",
description="The XML tag for model final output."
)
use_thoughts_as_context: bool = Field(
default=False,
description=("Include previous thought processes as context for the AI. "
"Disabled by default.")
)
pass
def __init__(self):
self.valves = self.Valves()
def _create_thought_regex(self) -> str:
tag = self.valves.thought_tag
return f"<{tag}>(.*?)</{tag}>"
def _create_thought_tag_deletion_regex(self) -> str:
tag = self.valves.thought_tag
return "</?{{THINK}}>[\s\S]*?</{{THINK}}>".replace("{{THINK}}", tag)
def _create_output_tag_deletion_regex(self) -> str:
tag = self.valves.output_tag
return r"</?{{OUT}}>[\s\S]*?</{{OUT}}>".replace("{{OUT}}", tag)
def _enclose_thoughts(self, messages: List[Dict[str, str]]) -> None:
if not messages:
return
# collapsible thinking process section
thought_regex = self._create_thought_regex()
reply = messages[-1]["content"]
thoughts = re.findall(thought_regex, reply, re.DOTALL)
thoughts = "\n".join(thoughts).strip()
enclosure = THOUGHT_ENCLOSURE.replace("{{THOUGHT_TITLE}}", self.valves.thought_title)
enclosure = enclosure.replace("{{THOUGHTS}}", thoughts).strip()
# remove processed thinking and output tags.
# some models do not close output tags properly.
thought_tag_deletion_regex = self._create_thought_tag_deletion_regex()
output_tag_deletion_regex = self._create_output_tag_deletion_regex()
reply = re.sub(thought_tag_deletion_regex, "", reply, count=1)
reply = re.sub(output_tag_deletion_regex, "", reply, count=1)
reply = reply.replace(f"<{self.valves.output_tag}>", "", 1)
reply = reply.replace(f"</{self.valves.output_tag}>", "", 1)
# prevents empty thought process blocks when filter used with
# malformed LLM output.
if len(enclosure) > 0:
reply = f"{enclosure}\n{reply}"
messages[-1]["content"] = reply
def _handle_include_thoughts(self, messages: List[Dict[str, str]]) -> None:
"""Remove <details> tags from input, if configured to do so."""
# <details> tags are created by the outlet filter for display
# in OWUI.
if self.valves.use_thoughts_as_context:
return
for message in messages:
message["content"] = re.sub(
DETAIL_DELETION_REGEX, "", message["content"], count=1
)
def inlet(self, body: Dict[str, any], __user__: Optional[Dict[str, any]] = None) -> Dict[str, any]:
try:
original_messages: List[Dict[str, str]] = body.get("messages", [])
self._handle_include_thoughts(original_messages)
body["messages"] = original_messages
return body
except Exception as e:
print(e)
return body
def outlet(self, body: Dict[str, any], __user__: Optional[Dict[str, any]] = None) -> Dict[str, any]:
try:
original_messages: List[Dict[str, str]] = body.get("messages", [])
self._enclose_thoughts(original_messages)
body["messages"] = original_messages
return body
except Exception as e:
print(e)
return body