Collapsible Thought Filter
This commit is contained in:
parent
91a7de26fd
commit
06bedb65dd
124
README.md
124
README.md
|
@ -14,15 +14,14 @@ So far:
|
|||
|
||||
- **Checkpoint Summarization Filter:** A work-in-progress replacement
|
||||
for the narrative memory filter for more generalized use cases.
|
||||
- **Memory Filter:** A basic narrative memory filter intended for
|
||||
long-form storytelling/roleplaying scenarios. Intended as a proof
|
||||
of concept/springboard for more advanced narrative memory.
|
||||
- **GPU Scaling Filter:** Reduce number of GPU layers in use if Ollama
|
||||
crashes due to running out of VRAM.
|
||||
crashes due to running out of VRAM. Deprecated.
|
||||
- **Output Sanitization Filter:** Remove words, phrases, or
|
||||
characters from the start of model replies.
|
||||
- **OpenStreetMap Tool:** Tool for querying OpenStreetMap to look up
|
||||
address details and nearby points of interest.
|
||||
- **Collapsible Thought Filter:** Hide LLM reasoning/thinking in a
|
||||
collapisble block.
|
||||
|
||||
## Checkpoint Summarization Filter
|
||||
|
||||
|
@ -97,96 +96,10 @@ There are some limitations to be aware of:
|
|||
- The filter only loads the most recent summary, and thus the AI
|
||||
might "forget" much older information.
|
||||
|
||||
## Memory Filter
|
||||
|
||||
__Superseded By: [Checkpoint Summarization Filter][checkpoint-filter]__
|
||||
|
||||
Super hacky, very basic automatic narrative memory filter for
|
||||
OpenWebUI, that may or may not actually enhance narrative generation!
|
||||
|
||||
This is intended to be a springboard for a better, more comprehensive
|
||||
filter that can coherently keep track(ish?) of plot and character
|
||||
developments in long form story writing/roleplaying scenarios, where
|
||||
context window length is limited (or ollama crashes on long context
|
||||
length models despite having 40 GB of unused memory!).
|
||||
|
||||
### Configuration
|
||||
|
||||
The filter exposes two settings:
|
||||
|
||||
- **Summarization model:** This is the model used for extracting and
|
||||
creating all of the narrative memory, and searching info. It must
|
||||
be good at following instructions. I use Gemma 2.
|
||||
- **It must be a base model.** If it's not, things will not work.
|
||||
- If you don't set this, the filter will attempt to use the model
|
||||
in the conversation. It must still be a base model.
|
||||
- **Number of messages to retain:** Number of messages to retain for the
|
||||
context. All messages before that are dropped in order to manage
|
||||
context length.
|
||||
|
||||
Ideally, the summarization model is the same model you are using for
|
||||
the storytelling. Otherwise you may have lots of model swap-outs.
|
||||
|
||||
The filter hooks in to OpenWebUI's RAG settings to generate embeddings
|
||||
and query the vector database. The filter will use the same embedding
|
||||
model and ChromaDB instance that's configured in the admin settings.
|
||||
|
||||
### Usage
|
||||
|
||||
Enable the filter on a model that you want to use to generate stories.
|
||||
It is recommended, although not required, that this be the same model
|
||||
as the summarizer model (above). If you have lots of VRAM or are very
|
||||
patient, you can use different models.
|
||||
|
||||
User input is pre-processed to 'enrich' the narrative. Replies from
|
||||
the language model are analyzed post-delivery to update the story's
|
||||
knowlege repository.
|
||||
|
||||
You will see status indicators on LLM messages indicating what the
|
||||
filter is doing.
|
||||
|
||||
Do not reply while the model is updating its knowledge base or funny
|
||||
things might happen.
|
||||
|
||||
### Function
|
||||
|
||||
What does it do?
|
||||
- When receiving user input, generate search queries for vector DB
|
||||
based on user input + last model response.
|
||||
- Search vector DB for theoretically relevant character and plot
|
||||
information.
|
||||
- Ask model to summarize results into coherent and more relevant
|
||||
stuff.
|
||||
- Inject results as <context>contextual info</context> for the model.
|
||||
- After receiving model narrative reply, generate character and plot
|
||||
info and stick them into the vector DB.
|
||||
|
||||
### Limitations and Known Issues
|
||||
|
||||
What does it not do?
|
||||
- Handle conversational branching/regeneration. In fact, this will
|
||||
pollute the knowledgebase with extra information!
|
||||
- Bouncing around some ideas to fix this. Basically requires
|
||||
building a "canonical" branching story path in the database?
|
||||
- Proper context "chapter" summarization (planned to change).
|
||||
- ~~Work properly when switching conversations due to OpenWebUI
|
||||
limitations. The chat ID is not available on incoming requests for
|
||||
some reason, so a janky workaround is used when processing LLM
|
||||
responses.~~ Fixed! (but still in a very hacky way)
|
||||
- Clear out information of old conversations or expire irrelevant
|
||||
data.
|
||||
|
||||
Other things to do or improve:
|
||||
- Set a minimum search score, to prevent useless stuff from coming up.
|
||||
- Figure out how to expire or update information about characters and
|
||||
events, instead of dumping it all into the vector DB.
|
||||
- Improve multi-user handling. Should technically sort of work due to
|
||||
messages having UUIDs, but is a bit messy. Only one collection is
|
||||
used, so multiple users = concurrency issues?
|
||||
- Block user input while updating the knowledgebase.
|
||||
|
||||
## GPU Scaling Filter
|
||||
|
||||
_Deprecated. Use the setting in OpenWebUI chat controls._
|
||||
|
||||
This is a simple filter that reduces the number of GPU layers in use
|
||||
by Ollama when it detects that Ollama has crashed (via empty response
|
||||
coming in to OpenWebUI). Right now, the logic is very basic, just
|
||||
|
@ -269,7 +182,7 @@ volume of traffic (absolute max 1 API call per second). If you are
|
|||
running a production service, you should set up your own Nominatim and
|
||||
Overpass services with caching.
|
||||
|
||||
## How to enable 'Where is the closest X to my location?'
|
||||
### How to enable 'Where is the closest X to my location?'
|
||||
|
||||
In order to have the OSM tool be able to answer questions like "where
|
||||
is the nearest grocery store to me?", it needs access to your realtime
|
||||
|
@ -281,6 +194,27 @@ location. This can be accomplished with the following steps:
|
|||
reported by the browser into the model's system prompt on every
|
||||
message.
|
||||
|
||||
# Collapsible Thought Filter
|
||||
|
||||
Hides model reasoning/thinking processes in a collapisble block in the
|
||||
UI response, similar to OpenAI o1 replies in ChatGPT. Designed to be
|
||||
used with [Reflection 70b](https://ollama.com/library/reflection) and
|
||||
similar models.
|
||||
|
||||
Current settings:
|
||||
|
||||
- Priority: what order to run this filter in.
|
||||
- Thought Title: the title of the collapsed thought block.
|
||||
- Thought Tag: The XML tag that contains the model's reasoning.
|
||||
- Output Tag: The XML tag that contains the model's final output.
|
||||
- Use Thoughts As Context: Whether or not to send LLM reasoning text
|
||||
as context. Disabled by default because it drastically increases
|
||||
token use.
|
||||
|
||||
**Note on XML tag settings:** These should be tags WITHOUT the `<>`.
|
||||
If you customize the tag setting, make sure your setting is just
|
||||
`mytag` and not `<mytag>` or anything else.
|
||||
|
||||
# License
|
||||
|
||||
<img src="./agplv3.png" alt="AGPLv3" />
|
||||
|
@ -292,6 +226,10 @@ others, in accordance with the terms of the AGPL. Make sure you are
|
|||
aware how this might affect your OpenWebUI deployment, if you are
|
||||
deploying OpenWebUI in a public environment!
|
||||
|
||||
Some filters may have code in them subject to other licenses. In those
|
||||
cases, the licenses and what parts of the code they apply to are
|
||||
detailed in that specific file.
|
||||
|
||||
[agpl]: https://www.gnu.org/licenses/agpl-3.0.en.html
|
||||
[checkpoint-filter]: #checkpoint-summarization-filter
|
||||
[nom-tou]: https://operations.osmfoundation.org/policies/nominatim/
|
||||
|
|
|
@ -0,0 +1,156 @@
|
|||
"""
|
||||
title: Collapsible Thought Filter
|
||||
author: projectmoon
|
||||
author_url: https://git.agnos.is/projectmoon/open-webui-filters
|
||||
version: 0.1.0
|
||||
license: AGPL-3.0+, MIT
|
||||
required_open_webui_version: 0.3.32
|
||||
"""
|
||||
|
||||
#########################################################
|
||||
# OpenWebUI Filter that collapses model reasoning/thinking into a
|
||||
# separate section in the reply.
|
||||
|
||||
# Based on the Add or Delete Text Filter by anfi.
|
||||
# https://openwebui.com/f/anfi/add_or_delete_text
|
||||
#
|
||||
# Therefore, portions of this code are licensed under the MIT license.
|
||||
# The modifications made for "thought enclosure" etc are licensed
|
||||
# under the AGPL using the MIT's sublicensing clause.
|
||||
#
|
||||
# For those portions under the MIT license, the following applies:
|
||||
#
|
||||
# MIT License
|
||||
#
|
||||
# Copyright (c) 2024 anfi
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to deal
|
||||
# in the Software without restriction, including without limitation the rights
|
||||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
# copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in all
|
||||
# copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
# SOFTWARE.
|
||||
#########################################################
|
||||
|
||||
from typing import Optional, Dict, List
|
||||
import re
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
THOUGHT_ENCLOSURE = """
|
||||
<details>
|
||||
<summary>{{THOUGHT_TITLE}}</summary>
|
||||
{{THOUGHTS}}
|
||||
</details>
|
||||
"""
|
||||
|
||||
DETAIL_DELETION_REGEX = r"</?details>[\s\S]*?</details>"
|
||||
|
||||
class Filter:
|
||||
class Valves(BaseModel):
|
||||
priority: int = Field(
|
||||
default=0, description="Priority level for the filter operations."
|
||||
)
|
||||
thought_title: str = Field(
|
||||
default="Thought Process",
|
||||
description="Title for the collapsible reasoning section."
|
||||
)
|
||||
thought_tag: str = Field(
|
||||
default="thinking",
|
||||
description="The XML tag for model thinking output."
|
||||
)
|
||||
output_tag: str = Field(
|
||||
default="output",
|
||||
description="The XML tag for model final output."
|
||||
)
|
||||
use_thoughts_as_context: bool = Field(
|
||||
default=False,
|
||||
description=("Include previous thought processes as context for the AI. "
|
||||
"Disabled by default.")
|
||||
)
|
||||
pass
|
||||
|
||||
def __init__(self):
|
||||
self.valves = self.Valves()
|
||||
|
||||
def _create_thought_regex(self) -> str:
|
||||
tag = self.valves.thought_tag
|
||||
return f"<{tag}>(.*?)</{tag}>"
|
||||
|
||||
def _create_thought_tag_deletion_regex(self) -> str:
|
||||
tag = self.valves.thought_tag
|
||||
return "</?{{THINK}}>[\s\S]*?</{{THINK}}>".replace("{{THINK}}", tag)
|
||||
|
||||
def _create_output_tag_deletion_regex(self) -> str:
|
||||
tag = self.valves.output_tag
|
||||
return r"</?{{OUT}}>[\s\S]*?</{{OUT}}>".replace("{{OUT}}", tag)
|
||||
|
||||
def _enclose_thoughts(self, messages: List[Dict[str, str]]) -> None:
|
||||
if not messages:
|
||||
return
|
||||
|
||||
# collapsible thinking process section
|
||||
thought_regex = self._create_thought_regex()
|
||||
reply = messages[-1]["content"]
|
||||
thoughts = re.findall(thought_regex, reply, re.DOTALL)
|
||||
thoughts = "\n".join(thoughts).strip()
|
||||
enclosure = THOUGHT_ENCLOSURE.replace("{{THOUGHT_TITLE}}", self.valves.thought_title)
|
||||
enclosure = enclosure.replace("{{THOUGHTS}}", thoughts).strip()
|
||||
|
||||
# remove processed thinking and output tags.
|
||||
# some models do not close output tags properly.
|
||||
thought_tag_deletion_regex = self._create_thought_tag_deletion_regex()
|
||||
output_tag_deletion_regex = self._create_output_tag_deletion_regex()
|
||||
reply = re.sub(thought_tag_deletion_regex, "", reply, count=1)
|
||||
reply = re.sub(output_tag_deletion_regex, "", reply, count=1)
|
||||
reply = reply.replace(f"<{self.valves.output_tag}>", "", 1)
|
||||
reply = reply.replace(f"</{self.valves.output_tag}>", "", 1)
|
||||
|
||||
# prevents empty thought process blocks when filter used with
|
||||
# malformed LLM output.
|
||||
if len(enclosure) > 0:
|
||||
reply = f"{enclosure}\n{reply}"
|
||||
|
||||
messages[-1]["content"] = reply
|
||||
|
||||
def _handle_include_thoughts(self, messages: List[Dict[str, str]]) -> None:
|
||||
"""Remove <details> tags from input, if configured to do so."""
|
||||
# <details> tags are created by the outlet filter for display
|
||||
# in OWUI.
|
||||
if self.valves.use_thoughts_as_context:
|
||||
return
|
||||
|
||||
for message in messages:
|
||||
message["content"] = re.sub(
|
||||
DETAIL_DELETION_REGEX, "", message["content"], count=1
|
||||
)
|
||||
|
||||
def inlet(self, body: Dict[str, any], __user__: Optional[Dict[str, any]] = None) -> Dict[str, any]:
|
||||
try:
|
||||
original_messages: List[Dict[str, str]] = body.get("messages", [])
|
||||
self._handle_include_thoughts(original_messages)
|
||||
body["messages"] = original_messages
|
||||
return body
|
||||
except Exception as e:
|
||||
print(e)
|
||||
return body
|
||||
|
||||
def outlet(self, body: Dict[str, any], __user__: Optional[Dict[str, any]] = None) -> Dict[str, any]:
|
||||
try:
|
||||
original_messages: List[Dict[str, str]] = body.get("messages", [])
|
||||
self._enclose_thoughts(original_messages)
|
||||
body["messages"] = original_messages
|
||||
return body
|
||||
except Exception as e:
|
||||
print(e)
|
||||
return body
|
Loading…
Reference in New Issue