AI engineering glossary
Plain-English definitions of the terms behind production AI — no fluff, written for engineers and the leaders who fund them.
- Agent observability
- Agent observability is the tooling that makes an agent’s behavior in production visible — tracing tool calls, prompts, costs, and failures.
- Agentic development
- Agentic development is the practice of delegating software tasks to AI agents that plan, edit, and verify code across a codebase, with human oversight.
- AI agent
- An AI agent is a system in which a language model chooses actions and carries them out through tools, in a loop, to accomplish a goal, rather than producing a single response.
- AI code security
- AI code security is the practice of controlling the risks of AI-generated code and prompts — vulnerabilities, license issues, data leakage, and prompt injection.
- Codebase readiness
- Codebase readiness is how well a codebase supports AI/agentic development — measured by module boundaries, tests, types, docs, and context files.
- Context window
- A context window is the maximum amount of text (measured in tokens) a model can consider in a single request, including both input and output.
- Embeddings
- Embeddings are numeric vector representations of text (or other data) that place similar meanings close together, enabling semantic search and RAG.
- Fine-tuning
- Fine-tuning further trains a base model on your data to adapt its behavior, format, or style for a specific task.
- Function calling (tool use)
- Function calling is a model capability for requesting a structured tool call; your code executes the call and returns the result to the model.
- Hallucination
- A hallucination is when a model produces confident, plausible-sounding output that is factually wrong or unsupported.
- LLM cost optimization
- LLM cost optimization is the practice of reducing what production AI features cost — through routing, caching, batching, and right-sizing models — without losing quality.
- LLM evals
- LLM evals are systematic tests that measure the quality of a model’s outputs against defined criteria, so changes can be validated instead of guessed.
- LLM gateway
- An LLM gateway is a layer that sits between your application and model providers to handle routing, caching, fallbacks, observability, and cost control.
- LLM-as-judge
- LLM-as-judge is an evaluation technique where a language model scores another model’s output against criteria you define.
- Model Context Protocol (MCP)
- MCP is an open protocol that standardizes how applications expose tools, data, and prompts to AI models and agents.
- Model routing
- Model routing sends each request to the most appropriate model — by cost, quality, or latency — instead of using one model for everything.
- Prompt caching
- Prompt caching reuses the model’s processing of a repeated prompt prefix, cutting cost and latency on requests that share a large, stable preamble.
- Prompt engineering
- Prompt engineering is the practice of designing the instructions and context given to a language model to get reliable, high-quality outputs.
- Regression testing (for LLMs)
- LLM regression testing re-runs a suite of evals on every change so a prompt or model update can’t silently degrade quality.
- Retrieval-augmented generation (RAG)
- RAG is a technique that retrieves relevant documents at query time and adds them to the prompt so the model answers from up-to-date, specific knowledge.
- Self-hosted LLM
- A self-hosted LLM is an open-weight model you run on your own infrastructure, so prompts and outputs can stay inside your environment.
- Semantic search
- Semantic search retrieves results by meaning rather than exact keywords, using embeddings to match intent.
- Structured outputs
- Structured outputs constrain a model’s response to a defined schema (such as a JSON Schema), so the output parses and validates against that schema; the values themselves can still be wrong.
- Tokens
- Tokens are the chunks of text a language model reads and writes; pricing and context limits are measured in them, not words or characters.
- Vector database
- A vector database stores embeddings and finds the nearest vectors to a query efficiently, powering semantic search and RAG.
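
To make the embeddings, semantic search, and vector database entries concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not how any particular product works: `ToyVectorStore` is a hypothetical in-memory class, the three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, and the search is a brute-force scan where a real vector database would typically use an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store; scans every item on each search."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=2):
        # Score every stored vector against the query, highest first.
        scored = [
            (cosine_similarity(query_vector, vector), text)
            for text, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Hand-made vectors (assumption): a real system would get these
# from an embedding model, usually with hundreds of dimensions.
store = ToyVectorStore()
store.add("How to reset your password", [0.9, 0.1, 0.0])
store.add("Quarterly revenue report", [0.0, 0.2, 0.9])
store.add("Recovering a locked account", [0.8, 0.3, 0.1])

# Stand-in embedding for a query such as "I can't log in".
query_vector = [1.0, 0.1, 0.0]
top_matches = store.search(query_vector, k=2)
print(top_matches)
```

The query shares no keywords with the two account-related documents, yet they rank first because their vectors point in nearly the same direction as the query vector. In a RAG setup, the texts returned by `search` would be added to the prompt before calling the model.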