DEV Community
BoxAgnts Tool System (3) — The Complete Chain of Tool Registration and Hot Reloading
Tool registration sounds like a lightweight module: scan directories, read files, fill a hash table. Doing it correctly and reliably, however, means handling encoding detection, text parsing, race conditions, and startup performance, none of which is obvious at first glance. This article traces the complete chain from a `.wasm` file to an AI-callable tool, breaking down each step.

## The Problems Registration Must Solve

Let's be clear about what this module needs to accomplish. Once a `.wasm` file is placed in the extensions directory, the system needs to know:

- What its name is
- What parameters it has, their types, and whether each is required
- What permission level it belongs to
- What its functional description is (for AI model call decisions)
- What its keywords are (for AI model search)

The traditional approach is to have the developer provide a JSON Schema file alongside the `.wasm`. This approach has synchronization problems: the Schema says a parameter is `string` but the code treats it as `number`; the Schema wasn't updated after the tool gained new parameters; the Schema has errors, yet the tool registers successfully and then fails on every execution. On top of that, the developer has to learn BoxAgnts' Schema format just to prepare the file.

BoxAgnts changes the Schema source from "manually written" to "tool self-described": it executes the WASM tool directly, passes `--help`, and parses the help text it prints. Tool developers only need to follow standard CLI conventions, defining parameters with any language's argument parsing library (Rust's clap, Go's cobra, Python's argparse), and BoxAgnts extracts everything automatically.

## Encoding Detection

The first technical detail comes after reading stdout. A WASM tool's `--help` output is a byte stream, not a string, so the encoding must be detected before decoding. If you blindly assume UTF-8, output from tools encoded in GBK or Shift-JIS will fail to parse.
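To see why the blind UTF-8 assumption breaks, here is a minimal sketch using only the Rust standard library. The helper `decode_assuming_utf8` and the constant `SHIFT_JIS_NIHONGO` are illustrative names, not part of BoxAgnts; the sample bytes are the Shift-JIS encoding of 日本語.

```rust
/// Illustrative helper (not from BoxAgnts): decode help output the naive way,
/// treating the bytes as UTF-8 and failing on anything else.
pub fn decode_assuming_utf8(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// "日本語" encoded as Shift-JIS. The first byte, 0x93, is a UTF-8
/// continuation byte, so it can never start a valid UTF-8 sequence.
pub const SHIFT_JIS_NIHONGO: [u8; 6] = [0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA];

/// Lossy decoding never fails, but it swaps undecodable bytes for U+FFFD,
/// which silently destroys the original text.
pub fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Strict decoding rejects the Shift-JIS bytes outright, while lossy decoding substitutes replacement characters, so the descriptions a parser depends on would be lost either way.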
BoxAgnts uses chardetng for encoding detection:

```rust
// wasm-tools/src/decode.rs
pub fn decode_bytes(bytes: Bytes) -> (String, &'static str, bool) {
    let mut detector = chardetng::EncodingDetector::new(
        chardetng::Iso2022JpDetection::Allow
    );
    detector.feed(&bytes, true);
    let encoding = detector.guess(None, chardetng::Utf8Detection::Allow);
    let (cow, _, had_errors) = encoding.decode(&bytes);
    (cow.into_owned(), encoding.name(), had_errors)
}
```

chardetng is an encoding detection library developed by Mozilla, used by Firefox for automatic webpage encoding detection. It has very high accuracy for short texts (`--help` output is typically no more than a few KB). `Iso2022JpDetection::Allow` enables ISO-2022-JP detection for WASM tools from Japanese environments; `Utf8Detection::Allow` permits the detector to return UTF-8 as its guess when the input is valid UTF-8.

After decoding, three items are returned: the string, the encoding name, and whether there were decoding errors. The subsequent parser receives clean UTF-8 text.

## The Help Text Parser

Parsing `--help` output is not straightforward. Different CLI libraries produce output in different formats: clap's `--help` and `-h` differ in detail level (the former includes `long_about`, the latter only `about`); some libraries have inconsistent indentation between `Options:` and `Arguments:` blocks; subcommands may appear under either `Commands:` or `Subcommands:` headings.

BoxAgnts' parser, located in `wasm-tools/src/registry/parser.rs`, follows this flow:

### 1. Fetch Two Help Texts

```rust
pub async fn fetch_help_texts(program: &str) -> Result<HelpTextPair> {
    // Each list is tried in order; the first candidate that works wins.
    let short_candidates = vec![vec!["-h"], vec!["--help"]];
    let long_candidates = vec![vec!["--help"], vec!["-h"]];
    let short_help = run_first_help_candidate(program, &short_candidates).await?;
    let long_help = run_first_help_candidate(program, &long_candidates).await?;
    Ok(HelpTextPair { short_help, long_help })
}
```

Why two copies? Because many CLI programs produce different output for `-h` (short help) and `--help` (long help).
`-h` may only list parameter names with one-line descriptions, while `--help` includes more detailed long descriptions (`long_about`). BoxAgnts merges both:

- Tool name and version are extracted from short help (the most compact and reliable format)
- Long description (`long_about`), keywords (`Keywords:`), and permission level (`PermissionLevel:`) are taken preferentially from long help
- Parameter list (`properties`) and required items (`required`) are merged from both, with long help as primary and short help as supplementary

### 2. Validate Output Legitimacy

Not every WASM program qualifies as a tool. `run_first_help_candidate` performs legitimacy checks after receiving output:

```rust
pub fn looks_like_help_output(text: &str) -> bool {
    let has_usage = text.lines().any(|l| l.trim_start().starts_with("Usage:"));
    let has_options = text.lines().any(|l| l.trim() == "Options:");
    let has_arguments = text.lines().any(|l| l.trim() == "Arguments:");
    let has_commands = text.lines().any(|l| {
        let t = l.trim();
        t == "Commands:" || t == "Subcommands:"
    });
    has_usage || has_options || has_arguments || has_commands
}
```

The output must contain at least one of a `Usage:` line or an `Options:`, `Arguments:`, `Commands:`, or `Subcommands:` block header. If a WASM program's `--help` output includes none of these (for example, because it is an HTTP server rather than a CLI tool), the parser rejects registration and logs an error.

### 3. Field-by-Field Extraction

```rust
fn parse_help_text(help: &str) -> Result {
    let lines: Vec<&str> = help.lines().collect();

    // First line format: "base64 1.0.0" → name="base64", version="1.0.0"
    let (name, version) = parse_name_version(lines[0])?;

    // The first non-empty line after the title is the short description.
    let about = lines.iter().skip(1)
        .find(|l| !l.trim().is_empty())
        .ok_or("missing about line")?
        .trim().to_string();

    let keywords = extract_single_line_field(help, "Keywords:");
    let permission_level = extract_single_line_field(help, "PermissionLevel:");

    let properties = parse_options_section(&lines)?;                    // Options: block
    let (arg_props, arg_required) = parse_arguments_section(&lines)?;   // Arguments: block
    let commands = parse_commands_section(&lines)?;                     // Commands: block
    // ...
}
```

The core of parameter parsing lies in two functions:

- `parse_options_section`: Locate the `Options:` line; each subsequent line is an option definition (in `--mode` or `-m, --mode` format). Extract the parameter name, the type (from the angle-bracketed value placeholder, when present), and the description (free text at the end of the line).
- `parse_arguments_section`: Locate the `Arguments:` line; positional parameters appear in angle-bracket format, with square brackets indicating optional.

Both functions use regex matching. The former's pattern is `--([a-zA-Z][a-zA-Z0-9_-]*)` with optional angle brackets; the latter matches the angle-bracketed name and determines optionality from the presence of `[` around it.

### 4. Merge and Deduplicate

`merge_required` combines the required parameter lists extracted from `-h` and `--help`:

```rust
fn merge_required(short: &[String], long: &[String]) -> Vec<String> {
    let mut merged = Vec::new();
    // Keep first-seen order and skip names that were already added.
    for item in short.iter().chain(long.iter()) {
        if !merged.contains(item) {
            merged.push(item.clone());
        }
    }
    merged
}
```

Similarly, properties from both sources are merged: long help's entries override short help's same-named entries, since long help descriptions are more detailed.

The final product is `ToolSpec`:

```rust
pub struct ToolSpec {
    pub name: String,
    pub wasm_file: String,
    pub about: String,
    pub long_about: String,
    pub keywords: String,
    pub permission_level: String,
    pub version: String,
    pub input_schema: InputSchema, // type: "object" + properties + required
    pub commands: Vec,
}
```

## Hot Reloading and Concurrency Safety

Tool registration isn't a one-time thing. Users may add, overwrite, or delete `.wasm` files in the extensions directory at any time.
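Those three user actions reduce to two registry operations, since adding and overwriting are both an insert-or-replace. The following is a minimal sketch under stated assumptions: `Change`, `tool_key`, and `apply_change` are hypothetical names, the registry value is a plain `String` standing in for a parsed spec, and keying entries by file stem is an assumption made purely for illustration, not a confirmed detail of how BoxAgnts names tools.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical model of a change seen in the extensions directory.
pub enum Change<'a> {
    /// A file was added or overwritten; carries the freshly parsed spec.
    Upsert(&'a Path, String),
    /// A file was deleted.
    Remove(&'a Path),
}

/// Assumption for this sketch: the registry key is the file stem, and only
/// files with a `.wasm` extension are considered tools.
pub fn tool_key(path: &Path) -> Option<String> {
    if path.extension()?.to_str()? != "wasm" {
        return None;
    }
    Some(path.file_stem()?.to_str()?.to_string())
}

/// Apply one change and report whether the registry was modified.
pub fn apply_change(registry: &mut HashMap<String, String>, change: Change<'_>) -> bool {
    match change {
        Change::Upsert(path, spec) => match tool_key(path) {
            Some(key) => {
                // Insert covers both "new tool" and "overwritten tool".
                registry.insert(key, spec);
                true
            }
            None => false, // not a .wasm file, ignore
        },
        Change::Remove(path) => match tool_key(path) {
            Some(key) => registry.remove(&key).is_some(),
            None => false,
        },
    }
}
```

Treating add and overwrite identically keeps the update path idempotent: replaying the same change leaves the registry in the same state.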
BoxAgnts uses the notify crate for filesystem monitoring:

```rust
let _ = start_watcher(workspace_extensions_dir.join("tools")).await;
let _ = start_watcher(app_extensions_dir.join("tools")).await;
```

`start_watcher` internally creates a tokio task that loops, receiving filesystem events. The handling logic for arriving events looks like this:

```text
notify::Event::Create(path) | Event::Modify(path)
    │
    path ends with .wasm?
    ├── Yes → execute wasm-sandbox::run::execute(path, ["--help"]) → parse → update HashMap
    └── No  → ignore

notify::Event::Remove(path)
    │
    path ends with .wasm?
    ├── Yes → HashMap.remove(tool_name)
    └── No  → ignore
```

The HashMap itself is protected by `tokio::sync::RwLock`:

```rust
static WASM_TOOLS: Lazy<RwLock<HashMap<String, ToolSpec>>> =
    Lazy::new(|| RwLock::new(HashMap::new()));
```

`RwLock` allows multiple concurrent reads (tool invocations) and one exclusive write (hot-reload updates). Since the tool list is updated very rarely (writes are almost exclusively triggered by manual user operations), read-write lock contention costs are negligible.

An edge case: what happens if, while the file watcher is parsing a new tool, an AI conversation happens to request the tool list? No special handling is needed: `all_tools()` holds an `RwLock` read lock, the parser needs a write lock, and the write lock waits for the read lock to release. From the user's perspective the delay is imperceptible, because `all_tools()` holds its read lock only for the duration of one HashMap traversal (microsecond scale).

## Compilation Caching

There's an implicit performance optimization during registration.
The first time a `.wasm` file is encountered, `parse_wasm_tool()` not only executes it in the sandbox to capture `--help` output, but also triggers Wasmtime precompilation:

```rust
// compiler.rs
pub fn process(wasm_file: &str, cache_dir: &str) -> Result<PathBuf> {
    let cache_file = dir.join(cache_file_name);
    if cache_file.exists() {
        return Ok(cache_file); // cache hit
    }
    // Wasmtime CodeBuilder compilation, outputs .cwasm
    let output_bytes = code.compile_component_serialized()?;
    std::fs::write(&cache_file, output_bytes)?;
    Ok(cache_file)
}
```

`.cwasm` is Wasmtime's precompiled format (compiled WebAssembly). Subsequent actual tool invocations load it directly, skipping the parsing and compilation phases. For larger WASM tools (e.g., `sqlite-component.wasm`, which includes a SQLite engine and can produce `.cwasm` files several MB in size), this cac