Knowledge-and-Memory-Management v0.0.2: Knowledge Collection & Memory Management
Knowledge Collection Module
The core addition in v0.0.2 is the Knowledge Collection module. It abstracts content ingestion into a unified pipeline with plugins for specific sources:
- Web scraping: HTML and RSS
- Video transcript extraction: via YouTube API or local file processing
- Article parsing: supporting PDF, EPUB, and Markdown
Each plugin normalizes content into a chunked, timestamped structure that is passed directly to memory storage; no intermediate files are written by default.
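To make the "chunked, timestamped structure" more concrete, here is a minimal sketch of what a normalized chunk could look like. The Chunk dataclass, its field names, the normalize helper, and the word-based chunk size are all assumptions made for illustration; the post does not document the module's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    """One normalized piece of ingested content (hypothetical schema)."""
    source: str      # plugin name, e.g. "web" or "video"
    index: int       # position of the chunk within the document
    content: str
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def normalize(source: str, text: str, max_words: int = 50) -> list[Chunk]:
    """Split text into word-bounded chunks that share one collection timestamp."""
    words = text.split()
    stamp = datetime.now(timezone.utc)
    return [
        Chunk(source, i // max_words, " ".join(words[i:i + max_words]), stamp)
        for i in range(0, len(words), max_words)
    ]


chunks = normalize("web", "word " * 120)
print(len(chunks), chunks[0].source, chunks[0].collected_at.isoformat())
```

Because the chunks are plain in-memory objects, they can be handed straight to a storage layer without touching disk, which matches the "no intermediate files" behavior described above.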
Memory Management
Memory Management in v0.0.2 uses a vector-based index with optional persistent backends (SQLite, PostgreSQL, or Redis). Ingested knowledge is automatically embedded using a configurable model (default: all-MiniLM-L6-v2) and stored with metadata tags.
The system supports:
- Automatic deduplication via content hashing
- Hybrid retrieval mechanism combining vector similarity with keyword filters
- New forget API for explicit removal of entries by ID or age, enabling control over memory capacity
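The three behaviors in this list can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the module's implementation: ToyMemory, Entry, cosine, and every method signature here are hypothetical, the "keyword filter" is modeled as a requirement that an entry carry all requested tags, and vectors are passed in by hand instead of coming from an embedding model.

```python
import hashlib
import math
import time
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str
    content: str
    vector: list[float]
    tags: set[str]
    created: float


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemory:
    """In-memory stand-in for the listed behaviors (not the real index)."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def add(self, content, vector, tags, now=None) -> bool:
        # Deduplicate: the SHA-256 of the content doubles as the entry ID.
        entry_id = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if entry_id in self.entries:
            return False
        created = time.time() if now is None else now
        self.entries[entry_id] = Entry(entry_id, content, vector, set(tags), created)
        return True

    def query(self, vector, top_k=3, tags=None):
        # Hybrid retrieval: filter on tags first, then rank by cosine similarity.
        wanted = set(tags or [])
        candidates = [e for e in self.entries.values() if wanted <= e.tags]
        ranked = sorted(candidates, key=lambda e: cosine(vector, e.vector), reverse=True)
        return ranked[:top_k]

    def forget(self, entry_id=None, older_than=None, now=None) -> int:
        # Remove one entry by ID, or every entry older than `older_than` seconds.
        now = time.time() if now is None else now
        doomed = [
            e.entry_id for e in self.entries.values()
            if e.entry_id == entry_id
            or (older_than is not None and now - e.created > older_than)
        ]
        for key in doomed:
            del self.entries[key]
        return len(doomed)


memory = ToyMemory()
first = memory.add("alpha report", [1.0, 0.0], ["web"], now=0)
duplicate = memory.add("alpha report", [1.0, 0.0], ["web"], now=5)  # same hash, skipped
memory.add("beta talk", [0.0, 1.0], ["video"], now=100)
top = memory.query([0.9, 0.1], top_k=1, tags=["web"])
removed = memory.forget(older_than=50, now=120)
```

Filtering before ranking keeps the similarity computation limited to entries that already satisfy the filter; a real index would more likely push both steps into the storage backend.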
Infrastructure: $AGENT_HOME Transition
The transition to $AGENT_HOME is the most impactful infrastructure change. Previously, the module hardcoded paths like /home/user/.km or C:\Users\.km. Now, all data directories (index files, plugin caches, config) are resolved at runtime relative to the KM_ROOT environment variable, which defaults to $AGENT_HOME/km. This simplifies containerized deployments and multi-user setups: each agent instance with its own $AGENT_HOME gets a separate, isolated directory.
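The resolution order described here (KM_ROOT first, then $AGENT_HOME/km) can be sketched in a few lines. The function names, the subdirectory names, and the decision to raise an error when neither variable is set are assumptions for illustration; the post does not say how the module handles that last case.

```python
import os
from pathlib import Path


def resolve_km_root(env=None) -> Path:
    """Return KM_ROOT if set, otherwise $AGENT_HOME/km."""
    env = os.environ if env is None else env
    if env.get("KM_ROOT"):
        return Path(env["KM_ROOT"])
    if env.get("AGENT_HOME"):
        return Path(env["AGENT_HOME"]) / "km"
    # Assumption: fail loudly rather than fall back to a hardcoded path.
    raise RuntimeError("Set KM_ROOT or AGENT_HOME")


def data_dirs(root: Path) -> dict[str, Path]:
    """Derive the data directories from the root (subdirectory names are hypothetical)."""
    return {name: root / name for name in ("index", "plugin_cache", "config")}


# Two agents with different homes end up with disjoint directories.
root_a = resolve_km_root({"AGENT_HOME": "/srv/agent-a"})
root_b = resolve_km_root({"AGENT_HOME": "/srv/agent-b"})
override = resolve_km_root({"AGENT_HOME": "/srv/agent-a", "KM_ROOT": "/data/km"})
print(data_dirs(root_a)["index"], data_dirs(root_b)["index"], override)
```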
Basic Workflow Example
The following code example demonstrates a basic workflow in v0.0.2: configuring an agent, collecting content from two sources, and querying memory.
from knowledge_memory import AgentMemory, KnowledgeCollector
import os
# Read the agent home from $AGENT_HOME, falling back to /tmp/agent for this example
agent_home = os.environ.get("AGENT_HOME", "/tmp/agent")
km = AgentMemory(home=agent_home)
# Initialize collector with source-specific options
collector = KnowledgeCollector(memory=km)
collector.add_source("web", url="https://example.com/report", selector="article")
collector.add_source("video", url="https://youtube.com/watch?v=abc123", language="en")
# Run ingestion (extracts, chunks, and stores in memory)
collector.run()
# Query memory with vector + keyword filter
results = km.query("latest findings from report", top_k=3, tags=["web", "article"])
for r in results:
    print(f"[{r.metadata['source']}] {r.content[:100]}...")
Note that add_source accepts plugin-specific parameters (e.g., selector for HTML, language for video). The collector handles all retries and error logging internally.
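For readers curious what "retries and error logging" might involve, the following is a generic retry-with-backoff sketch. It is not the collector's actual code: run_with_retries, its parameters, and the exponential delay schedule are assumptions, and the flaky function simply simulates a source that fails twice before succeeding.

```python
import logging
import time

logger = logging.getLogger("collector_sketch")


def run_with_retries(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, logging each failure and backing off."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


calls = {"n": 0}
delays = []


def flaky():
    """Simulated source that fails twice, then returns content."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
result = run_with_retries(flaky, sleep=delays.append)
print(result, calls["n"], delays)
```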
Migration Guide for Developers
For developers migrating from earlier versions, the main API changes are:
- AgentMemory replaces the old MemoryStore class
- All file paths must now be relative to $AGENT_HOME/km. If you were using absolute paths in custom plugins, update them to use the agent_home parameter
- The knowledge collection plugins are separate PyPI extras (km[web], km[video], km[articles]); install what you need
Potential Gotchas in v0.0.2
- Video collection requires yt-dlp and ffmpeg binaries in PATH
- Article plugin uses pandoc for EPUB conversion; if absent, it falls back to plain text extraction
- Memory index upgrades are not automatic between minor versions; run km-migrate index after upgrading
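The first two gotchas can be caught before ingestion starts with a short preflight check. This sketch uses only the standard library's shutil.which; the missing_binaries helper and the REQUIRED/OPTIONAL groupings are hypothetical and not part of the package.

```python
import shutil

# Binaries named in the gotchas above, grouped by plugin.
REQUIRED = {"video": ["yt-dlp", "ffmpeg"]}
OPTIONAL = {"articles": ["pandoc"]}  # absence only triggers the plain-text fallback


def missing_binaries(groups, which=shutil.which):
    """Map each plugin name to the binaries that cannot be found on PATH."""
    report = {}
    for plugin, binaries in groups.items():
        absent = [name for name in binaries if which(name) is None]
        if absent:
            report[plugin] = absent
    return report


for plugin, names in missing_binaries(REQUIRED).items():
    print(f"{plugin}: missing required binaries: {', '.join(names)}")
for plugin, names in missing_binaries(OPTIONAL).items():
    print(f"{plugin}: {', '.join(names)} not found, expect plain-text fallback")
```

The which parameter exists only so the check can be exercised with a fake lookup; in normal use the default shutil.which is enough.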
Roadmap
Looking ahead, the v0.1.0 roadmap includes multi-agent shared memory and temporal decay for entries. For now, v0.0.2 provides a solid foundation for applications that need to ingest web content, maintain a growing knowledge base, and retrieve it efficiently. The $AGENT_HOME shift ensures that this works equally well in a Docker container, on a Raspberry Pi, or in a cloud function.
Try it out: pip install knowledge-memory[web,video] and set AGENT_HOME to your working directory. The examples in the /plugins folder show how to extend the collector for custom content sources.