
Knowledge-and-Memory-Management v0.0.2: Knowledge Collection & Memory Management

Knowledge Collection Module

The core addition in v0.0.2 is the Knowledge Collection module. It abstracts content ingestion into a unified pipeline with plugins for specific sources:

  • Web scraping: HTML and RSS
  • Video transcript extraction: via YouTube API or local file processing
  • Article parsing: supporting PDF, EPUB, and Markdown

Each plugin normalizes content into a chunked, timestamped structure that is passed directly to memory storage; no intermediate files are written by default.
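
To make the "chunked, timestamped structure" concrete, here is a minimal sketch of what a normalized chunk could look like. The Chunk dataclass, its field names, and the chunk_text helper are assumptions for illustration, not the module's actual types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    """Hypothetical normalized unit a plugin might emit (not the library's type)."""
    source: str    # e.g. "web", "video", "article"
    content: str   # the chunk text
    position: int  # index of the chunk within its document
    # Timestamp assigned at collection time, stored as timezone-aware UTC.
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def chunk_text(text: str, source: str, max_chars: int = 200) -> list[Chunk]:
    """Greedily pack whitespace-separated words into chunks of up to max_chars.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return [Chunk(source=source, content=p, position=i) for i, p in enumerate(pieces)]
```

Calling chunk_text(page_text, "web") would return a list of Chunk objects that can be handed to storage in order, which matches the "no intermediate files" flow described above.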

Memory Management

Memory Management in v0.0.2 uses a vector-based index with optional persistent backends (SQLite, PostgreSQL, or Redis). Ingested knowledge is automatically embedded using a configurable model (default: all-MiniLM-L6-v2) and stored with metadata tags.

The system supports:

  • Automatic deduplication via content hashing
  • Hybrid retrieval mechanism combining vector similarity with keyword filters
  • New forget API for explicit removal of entries by ID or age, enabling control over memory capacity
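
The three features above can be sketched together in a toy in-memory store. This is an illustration of the ideas only: ToyMemory, its method signatures, the SHA-256 hash, and the "entry must carry all requested tags" filter rule are assumptions, not the behavior of AgentMemory.

```python
import hashlib
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemory:
    """Illustrative store; NOT the library's AgentMemory."""

    def __init__(self):
        self.entries = {}  # content hash -> entry dict

    def add(self, content, vector, tags, created_at=None):
        # Deduplication: identical content always hashes to the same key.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key in self.entries:
            return key, False  # duplicate, nothing stored
        self.entries[key] = {
            "content": content,
            "vector": vector,
            "tags": set(tags),
            "created_at": time.time() if created_at is None else created_at,
        }
        return key, True

    def query(self, vector, top_k=3, tags=None):
        # Hybrid retrieval: filter by tags first, then rank by similarity.
        wanted = set(tags or [])
        candidates = [e for e in self.entries.values() if wanted <= e["tags"]]
        candidates.sort(key=lambda e: cosine(vector, e["vector"]), reverse=True)
        return candidates[:top_k]

    def forget(self, entry_id=None, older_than=None, now=None):
        # Remove one entry by ID, and/or every entry older than `older_than` seconds.
        now = time.time() if now is None else now
        doomed = [
            key for key, e in self.entries.items()
            if key == entry_id
            or (older_than is not None and now - e["created_at"] > older_than)
        ]
        for key in doomed:
            del self.entries[key]
        return len(doomed)
```

Filtering before ranking keeps the similarity computation limited to entries that already match the keyword constraint; a real index would likely push both steps into the backend.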

Infrastructure: $AGENT_HOME Transition

The transition to $AGENT_HOME is the most impactful infrastructure change. Previously, the module hardcoded paths like /home/user/.km or C:\Users\.km. Now, all data directories (index files, plugin caches, config) are resolved at runtime relative to the KM_ROOT environment variable, which defaults to $AGENT_HOME/km. This simplifies containerized deployments and multi-user setups: each agent instance that has its own $AGENT_HOME uses a separate, isolated directory.
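
The lookup order described here can be sketched in a few lines. The function names, the subdirectory names, and the choice to raise an error when neither variable is set are assumptions; the post does not say what the module does in that case.

```python
import os
from pathlib import Path


def resolve_km_root(env=None):
    """Return the data root: KM_ROOT if set, otherwise $AGENT_HOME/km."""
    env = os.environ if env is None else env
    if env.get("KM_ROOT"):
        return Path(env["KM_ROOT"])
    if env.get("AGENT_HOME"):
        return Path(env["AGENT_HOME"]) / "km"
    # Assumption: fail loudly instead of guessing a location.
    raise RuntimeError("Set KM_ROOT or AGENT_HOME")


def data_dirs(root):
    """Map hypothetical directory names to paths under the resolved root."""
    return {name: root / name for name in ("index", "plugin_cache", "config")}
```

Passing the environment as a parameter keeps the resolution testable without mutating os.environ, for example resolve_km_root({"AGENT_HOME": "/srv/agent"}).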

Basic Workflow Example

The following code example demonstrates a basic workflow in v0.0.2: configuring an agent, collecting content from two sources, and querying memory.

from knowledge_memory import AgentMemory, KnowledgeCollector
import os

# Read the agent home from $AGENT_HOME, falling back to /tmp/agent if it is unset
agent_home = os.environ.get("AGENT_HOME", "/tmp/agent")
km = AgentMemory(home=agent_home)

# Initialize collector with source-specific options
collector = KnowledgeCollector(memory=km)
collector.add_source("web", url="https://example.com/report", selector="article")
collector.add_source("video", url="https://youtube.com/watch?v=abc123", language="en")

# Run ingestion (extracts, chunks, and stores in memory)
collector.run()

# Query memory with vector + keyword filter
results = km.query("latest findings from report", top_k=3, tags=["web", "article"])
for r in results:
    print(f"[{r.metadata['source']}] {r.content[:100]}...")

Note that add_source accepts plugin-specific parameters (e.g., selector for HTML, language for video). The collector handles all retries and error logging internally.

Migration Guide for Developers

For developers migrating from earlier versions, the main API changes are:

  • AgentMemory replaces the old MemoryStore class
  • All file paths must now be relative to $AGENT_HOME/km. If you were using absolute paths in custom plugins, update them to use the agent_home parameter
  • The knowledge collection plugins are separate PyPI extras (km[web], km[video], km[articles]); install only the ones you need
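
For custom plugins that stored absolute paths, the path change might be handled with a small helper like the one below. It is a sketch under assumptions: rebase_path is a hypothetical name, and the legacy root is whatever directory your plugin used before.

```python
from pathlib import Path


def rebase_path(old_path, legacy_root, agent_home):
    """Move a path from under legacy_root to the same spot under agent_home/km.

    Raises ValueError if old_path is not located inside legacy_root.
    """
    relative = Path(old_path).relative_to(legacy_root)
    return Path(agent_home) / "km" / relative
```

For example, rebase_path("/home/user/.km/cache/feed.xml", "/home/user/.km", "/srv/agent") yields a path ending in km/cache/feed.xml under /srv/agent. The helper only computes the new location; moving the files is left to the caller.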

Potential Gotchas in v0.0.2

  • Video collection requires yt-dlp and ffmpeg binaries in PATH
  • Article plugin uses pandoc for EPUB conversion; if absent, it falls back to plain text extraction
  • Memory index upgrades are not automatic between minor versions; run km-migrate index after upgrading
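
A quick preflight check can catch the missing-binary cases before ingestion starts. This sketch uses only the standard library's shutil.which; the missing_binaries helper is a hypothetical name and is not part of the package.

```python
import shutil


def missing_binaries(names, which=shutil.which):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # yt-dlp and ffmpeg are needed for video; pandoc is optional for EPUB.
    missing = missing_binaries(["yt-dlp", "ffmpeg", "pandoc"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools found.")
```

The `which` parameter exists so the check can be exercised in tests without depending on what is installed on the machine.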

Roadmap

Looking ahead, the v0.1.0 roadmap includes multi-agent shared memory and temporal decay for entries. For now, v0.0.2 provides a solid foundation for applications that need to ingest web content, maintain a growing knowledge base, and retrieve it efficiently. The $AGENT_HOME shift ensures that this works equally well in a Docker container, on a Raspberry Pi, or in a cloud function.

Try it out: pip install knowledge-memory[web,video] and set AGENT_HOME to your working directory. The examples in the /plugins folder show how to extend the collector for custom content sources.
