A curated, opinionated list of principles, standards, and technologies for building agentic AI systems.
Agentic AI refers to systems built around goal-directed agents rather than single-shot generation. These agents plan over time, use external tools and APIs, maintain internal state and memory, interact with humans or other agents, and carry out multi-step processes with observable outcomes.
This list focuses on the abstractions and infrastructure that make such systems possible, from low-level model formats to high-level control and observability primitives.
- Platforms & Frameworks
- AI Infrastructure & Compute
- Standards & Specifications
- Language Models
- State, Retrieval & Coordination Infrastructure
- Evaluation, Observability & Safety
- Principles
- Theory
- Design
- Hardware Accelerators
End-to-end stacks and core frameworks for agentic systems.
You run the full agent runtime yourself. No managed orchestration backend.
- Akka - Actor-based platform for building distributed, fault-tolerant agent systems. Strong fit for long-running, concurrent, and highly reliable agents. Languages: Scala, Java.
- Any Agent - A single interface to use and evaluate different agent frameworks. Language: Python.
- DSPy - Declarative framework for building modular AI software. Language: Python.
- Mastra - TypeScript-first framework for building agentic applications with explicit workflows, memory, evaluations, and tool integration. Language: TypeScript.
- Pydantic AI - Python agent framework designed to help you quickly, confidently, and painlessly build production-grade applications and workflows with Generative AI. Language: Python.
- Ray - Unified framework for scaling AI and Python applications. Languages: C++, Java, Python.
- Rig - Rust-first framework for building LLM-powered agents with strong typing, modular tools, and composable workflows. Emphasizes performance, safety, and production-grade systems. Language: Rust.
- smolagents - Minimal, lightweight framework for building simple and transparent LLM agents with a strong emphasis on readability, hackability, and low abstraction overhead. Designed for learning, prototyping, and small production systems. Language: Python.
- Volt Agent - AI agent platform built on an open-source TypeScript agent framework. Language: TypeScript.
You run agent code locally or in your cloud, but rely on a managed agent runtime / orchestration backend.
- AWS Bedrock Agents - Managed service for building, orchestrating, and operating AI agents tightly integrated with AWS services and Bedrock models. Language: JSON / SDK-driven (Python, Java, etc). Deployment: AWS-managed service.
- Camel - LLM-powered multi-agent framework enabling agents to play roles, collaborate, and coordinate tasks in complex workflows. Ideal for experimentation with multi-agent interaction patterns. Language: Python. Deployment: Local / Cloud / Containerizable.
- Google Agent Development Kit (ADK) - An open-source, code-first toolkit for defining agents, tools, workflows, and multi-agent systems with built-in debugging, execution tracing, and extensibility. Designed to integrate with Vertex AI Agent Engine while remaining model-agnostic. Languages: Python, TypeScript, Go, Java. Deployment: Local / Your infrastructure + Vertex AI Agent Engine.
- Microsoft Agent Framework - A framework for building, orchestrating and deploying AI agents and multi-agent workflows. Languages: Python, C# (.NET). Deployment: Local / Azure-hosted + Azure AI Agent Service.
- OpenAI Agents SDK - OpenAI's SDK and platform for building, orchestrating, and deploying agentic workflows with structured tool integration, observability, guardrails, and evaluation features on top of the Responses API. Languages: Python, TypeScript, Go. Deployment: Local / Your infrastructure + OpenAI-managed backend.
Platforms providing compute, GPU resources, and isolation for running AI workloads and agents at scale. Not specific agent SDKs, but critical for production deployments.
- Akash - Decentralized cloud platform for deploying and managing containerized applications.
- AWS EC2 / Bedrock + GPU - Cloud compute infrastructure for AI workloads with GPU acceleration, networking isolation, and integration with other AWS services.
- Blaxel - Infrastructure platform that gives agents sandboxed compute environments to run AI code, background tasks and tool calls.
- Daytona - Secure, scalable execution infrastructure and runtime for agentic workflows and AI‑generated code.
- E2B - Open-source runtime infrastructure for AI agents and apps, providing secure, isolated cloud sandboxes where agents can execute real code, use real tools, access files and networks, and perform long-running tasks.
- Google Cloud AI + Vertex AI - Managed AI compute infrastructure and orchestration for ML workloads, including GPUs/TPUs, secure isolation, and scaling.
- Modal - Managed AI compute infrastructure for running AI workloads at scale, with GPU acceleration, networking isolation, and integration with other services.
- Nebius - AI-native cloud platform with high-performance GPU clusters, managed infrastructure, observability, and deployment tools. Ideal for scaling agentic systems or large AI workloads in production.
Protocols and conventions enabling interoperability.
- Agent2Agent Protocol (A2A) - An open protocol enabling communication and interoperability between opaque agentic applications.
- Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities.
- Model Context Protocol (MCP) - An open-source standard for connecting AI applications to external systems.
- AP2 - Agent Payments Protocol (AP2) is an open protocol designed to enable secure, reliable, and interoperable agent commerce for developers, merchants, and the payments industry.
- MPP - Machine Payments Protocol (MPP) is an open protocol for machine-to-machine payments. Charge for API requests, tool calls, or content. Agents and apps pay per request in the same HTTP call.
- UCP - Universal Commerce Protocol (UCP) provides building blocks for agentic commerce across industries.
- x402 - An experimental open payment protocol that repurposes the dormant HTTP 402 status code to enable autonomous, on‑chain micropayments for APIs, services, and digital resources.
- Ollama - Lightweight, open-source LLM server for local or networked model serving.
- SGLang - High-performance serving framework for large language models and multimodal models.
- llama.cpp - Portable LLM inference in C/C++ used for efficient local inference.
- vLLM - High-throughput and memory-efficient inference and serving engine for LLMs.
- GGUF - Efficient, extensible binary format used with llama.cpp runtimes.
- ONNX - Open standard for cross-runtime ML model representation.
- SafeTensors - Safe, fast tensor serialization standard.
- Axolotl - A free and open-source LLM fine-tuning framework.
- LlamaFactory - Meta-framework for efficient adaptation (LoRA, QLoRA).
- PEFT - Parameter-efficient fine-tuning methods.
- TRL - Tools for RLHF and other preference-based optimization.
- Unsloth - Fast LoRA-style fine-tuning stack.
- OpenEnv - Open, flexible multi-agent RL environment.
- LanceDB - Arrow-native versioned vector store suited for replay and dataset management.
- LlamaIndex - Framework for indexing and querying structured and unstructured data for LLMs. Often used as the semantic memory layer in agentic systems.
- Milvus - High-performance, cloud-native vector database built for scalable vector ANN search.
- pgvector - Embeddings within PostgreSQL for transactional, schema-backed vectors.
- Qdrant - Production-ready vector DB with payload support.
- Weaviate - Open-source vector database that stores both objects and vectors, combining vector search with structured filtering while offering the fault tolerance and scalability of a cloud-native database.
- Conductor - Agentic workflow engine with native AI plan support and durable execution.
- Temporal - Durable workflow engine for orchestrating long-running agent processes.
Robust agentic systems require continuous evaluation, traceable behavior, and enforced safety constraints. These aspects are inseparable: observability enables evaluation, which enables safe autonomous operation.
Tracing & Observability
Log all decisions, tool calls, and state changes. Enable replay and auditing for debugging and regression testing.
Behavioral Evaluation
Test agent workflows, plans, and tool usage against task-specific metrics or rules. Include safety and constraint checks.
Replay & Regression
Deterministically replay historical agent runs to detect regressions or unintended behavior.
Automated Judging & Constraint Enforcement
Scale evaluations with rule-based or model-as-judge scoring. Enforce safety, cost, and correctness boundaries programmatically.
Sandboxing & Access Control
Agents should execute external actions (code, APIs, system operations) in controlled, permissioned environments. Apply least-privilege access, secrets management, and failure containment.
Jailbreak & Prompt Injection Mitigation
Protect against malicious inputs via model alignment, prompt filtering, or human-in-the-loop supervision.
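As a rough illustration of how tracing, replay, and rule-based constraint checks fit together, here is a minimal Python sketch. Everything in it (`TraceEvent`, `Trace`, `replay`, `check_constraints`, and the toy `add` tool) is a hypothetical name invented for this example and not the API of any tool listed here. It assumes tool calls are deterministic functions of their recorded arguments.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class TraceEvent:
    step: int
    kind: str  # "decision", "tool_call", or "state_change"
    name: str
    payload: dict[str, Any]
    result: Any = None


@dataclass
class Trace:
    events: list[TraceEvent] = field(default_factory=list)

    def record(self, kind: str, name: str, payload: dict[str, Any],
               result: Any = None) -> TraceEvent:
        # Steps are numbered in the order they are recorded.
        event = TraceEvent(len(self.events), kind, name, payload, result)
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for an append-only audit log.
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)


def replay(trace: Trace, tools: dict[str, Callable[..., Any]]) -> list[tuple[int, str, Any, Any]]:
    """Re-run each recorded tool call; return (step, tool, expected, actual) for mismatches."""
    regressions = []
    for event in trace.events:
        if event.kind != "tool_call":
            continue
        actual = tools[event.name](**event.payload)
        if actual != event.result:
            regressions.append((event.step, event.name, event.result, actual))
    return regressions


def check_constraints(trace: Trace, allowed_tools: set[str], max_tool_calls: int) -> list[str]:
    """Rule-based checks: a tool allowlist and a budget on the number of tool calls."""
    calls = [e for e in trace.events if e.kind == "tool_call"]
    violations = [
        f"step {e.step}: tool {e.name!r} is not allowed"
        for e in calls
        if e.name not in allowed_tools
    ]
    if len(calls) > max_tool_calls:
        violations.append(f"tool call budget exceeded: {len(calls)} > {max_tool_calls}")
    return violations


def add(a: int, b: int) -> int:  # toy tool used only for this example
    return a + b


trace = Trace()
trace.record("decision", "plan", {"goal": "sum two numbers"})
trace.record("tool_call", "add", {"a": 2, "b": 3}, result=add(2, 3))

unchanged = replay(trace, {"add": add})                   # same tool: no regressions
regressed = replay(trace, {"add": lambda a, b: a - b})    # changed tool: one mismatch
violations = check_constraints(trace, allowed_tools={"search"}, max_tool_calls=5)
```

In a real system the recorded results would typically include model outputs as well, and replay would substitute those recorded outputs rather than re-sampling the model, since sampling is usually not deterministic.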
- Maxim - End-to-end evaluation and observability platform.
- OpenAI Evals - Behavioral testing framework for multi-step workflows, including safety checks.
- Promptfoo - Compare prompts, models, and configurations with reproducible tests.
- Ragas - Evaluation toolkit for retrieval-augmented and multi-step agent behavior.
Foundational principles for building robust, auditable, and autonomous agentic systems:
Goal-Directed Control Loops
Agents operate in continuous perceive-reason-act cycles with built-in monitoring, failure detection, and corrective feedback, rather than one-shot generation.
Tool-First Reasoning
External tools, APIs, and executables are first-class components of reasoning, not post-processing steps.
Explicit, Versioned State
Plans, memory, and internal representations are structured, observable, and versioned to support durability, replay, and auditing.
Composable & Modular Architecture
Complex behavior emerges from coordinating specialized agents, skills, and workflows, not monolithic prompt chains.
Traceable & Evaluatable Behavior
All actions and decisions are logged, reproducible, and measurable to enable regression testing, auditing, and optimization.
Safety & Constraint Awareness
Agents operate within explicit safety, correctness, and resource constraints that bound autonomy and prevent catastrophic behavior.
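Several of these principles can be shown together in a small Python sketch: a control loop with versioned state, tools as part of the reasoning step, a monitor that detects and corrects a bad action, and a step budget as a resource constraint. All names here (`AgentState`, `TOOLS`, `propose_tool`, `run`) are hypothetical, and the hard-coded policy is a stand-in for a model call, so treat this as a toy under those assumptions rather than a reference design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    """Immutable, versioned snapshot of the agent's state."""
    version: int
    goal: int
    value: int
    history: tuple[str, ...] = ()


# Tools are plain callables the loop can invoke by name.
TOOLS = {
    "increment": lambda v: v + 1,
    "double": lambda v: v * 2,
}


def propose_tool(state: AgentState) -> str:
    # Stand-in for a model: a naive policy that prefers doubling.
    return "double" if state.value > 0 else "increment"


def run(goal: int, start: int = 0, max_steps: int = 20) -> list[AgentState]:
    """Perceive-reason-act loop that returns every state version for auditing."""
    states = [AgentState(version=0, goal=goal, value=start)]
    while states[-1].value != goal:
        current = states[-1]                      # perceive
        if current.version >= max_steps:          # resource constraint
            raise RuntimeError("step budget exhausted before reaching the goal")
        tool = propose_tool(current)              # reason
        new_value = TOOLS[tool](current.value)    # act
        if new_value > goal:                      # monitor: overshoot detected
            tool = "increment"                    # corrective feedback
            new_value = TOOLS[tool](current.value)
        states.append(
            AgentState(current.version + 1, goal, new_value, current.history + (tool,))
        )
    return states


states = run(goal=10)
final = states[-1]
```

Because each step appends a new frozen `AgentState` instead of mutating the previous one, the returned list doubles as a replayable record of how the agent reached its goal.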
- STRIDE: A Systematic Framework for AI Modality Selection - A research framework that helps decide when to use agentic systems versus simple LLM calls or guided assistants, emphasizing dynamism, planning, and task suitability 🔸PDF.
- Toward Safe and Responsible AI Agents (Three-Pillar Model) - A recent academic framework emphasizing transparency, accountability, and trustworthiness for responsible autonomous agents 🔸PDF.
- A Survey of Small Language Models - A comprehensive survey on Small Language Models, focusing on their architectures, training techniques, and model compression techniques 🔸PDF.
- The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook - Aims to provide a unified and up-to-date (2 Apr 2026) landscape of latent space in language-based models.
- L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression - L3TC is a low-complexity, learned lossless text compressor using an RWKV backbone that achieves 48% better compression than gzip with 50x fewer parameters. It provides megabytes-per-second decoding speeds.
Architectural principles and patterns for structuring agentic systems and coordinating planning, tools, memory, and multiple agents.
- Agentic AI Patterns - Catalog of architectural and workflow patterns for planning loops, tool orchestration, memory, and multi-agent coordination.
This section lists emerging and non-traditional AI hardware accelerators. The focus is on specialized architectures that depart from conventional GPUs/TPUs to target efficiency, stochastic computing, or brain-inspired models.
- Simplex Micro - RISC-V processor platform optimized for vector processing and edge AI.
- Akida - Neuromorphic Neural Processing Units designed for ultra-low-power, event-driven AI inference at the edge. Akida implements spiking neural networks (SNNs) with on-chip learning and asynchronous processing, making them well-suited for always-on sensing, vision, and audio in embedded systems.
- OpenNeuromorphic - A global community fostering education, research, and open-source collaboration in brain-inspired AI and hardware.
- Extropic - Thermodynamic Stochastic Processing Units focused on accelerating probabilistic and sampling-based workloads. Extropic's architecture leverages physical noise and thermodynamic principles to efficiently support Monte Carlo simulation, Bayesian inference, and optimization tasks.
- LiteX - An open-source SoC builder framework widely used to construct custom FPGA-based systems and attach open accelerator IP.
- NVDLA - A free and open architecture that promotes a standard way to design deep learning inference accelerators.
- OpenFPGA - An open-source FPGA framework for building custom reconfigurable fabrics and experimenting with new FPGA architectures and toolchains.