Question 1

What is Model Context Protocol (MCP)?

Accepted Answer

MCP is an open protocol that standardizes how applications expose tools, data, and prompts to AI models and agents.

Question 2

What is LLM evals?

Accepted Answer

LLM evals are systematic tests that measure the quality of a model’s outputs against defined criteria, so changes can be validated instead of guessed.

Question 3

What is LLM-as-judge?

Accepted Answer

LLM-as-judge is an evaluation technique where a language model scores another model’s output against criteria you define.

Question 4

What is Retrieval-augmented generation (RAG)?

Accepted Answer

RAG is a technique that retrieves relevant documents at query time and adds them to the prompt so the model answers from up-to-date, specific knowledge.

Question 5

What is Embeddings?

Accepted Answer

Embeddings are numeric vector representations of text (or other data) that place similar meanings close together, enabling semantic search and RAG.

Question 6

What is Vector database?

Accepted Answer

A vector database stores embeddings and finds the nearest vectors to a query efficiently, powering semantic search and RAG.

Question 7

What is Semantic search?

Accepted Answer

Semantic search retrieves results by meaning rather than exact keywords, using embeddings to match intent.

Question 8

What is AI agent?

Accepted Answer

An AI agent is a system where a language model decides and takes actions through tools in a loop to accomplish a goal, rather than producing a single response.

Question 9

What is Agentic development?

Accepted Answer

Agentic development is the practice of delegating software tasks to AI agents that plan, edit, and verify code across a codebase, with human oversight.

Question 10

What is Codebase readiness?

Accepted Answer

Codebase readiness is how well a codebase supports AI/agentic development — measured by module boundaries, tests, types, docs, and context files.

Question 11

What is Prompt engineering?

Accepted Answer

Prompt engineering is the practice of designing the instructions and context given to a language model to get reliable, high-quality outputs.

Question 12

What is Prompt caching?

Accepted Answer

Prompt caching reuses the model’s processing of a repeated prompt prefix, cutting cost and latency on requests that share a large, stable preamble.

Question 13

What is Context window?

Accepted Answer

A context window is the maximum amount of text (measured in tokens) a model can consider in a single request, including both input and output.

Question 14

What is Tokens?

Accepted Answer

Tokens are the chunks of text a language model reads and writes; pricing and context limits are measured in them, not words or characters.

Question 15

What is Hallucination?

Accepted Answer

A hallucination is when a model produces confident, plausible-sounding output that is factually wrong or unsupported.

Question 16

What is Fine-tuning?

Accepted Answer

Fine-tuning further trains a base model on your data to adapt its behavior, format, or style for a specific task.

Question 17

What is Model routing?

Accepted Answer

Model routing sends each request to the most appropriate model — by cost, quality, or latency — instead of using one model for everything.

Question 18

What is LLM gateway?

Accepted Answer

An LLM gateway is a layer that sits between your application and model providers to handle routing, caching, fallbacks, observability, and cost control.

Question 19

What is Agent observability?

Accepted Answer

Agent observability is the tooling that makes an agent’s behavior in production visible — tracing tool calls, prompts, costs, and failures.

Question 20

What is Regression testing (for LLMs)?

Accepted Answer

LLM regression testing re-runs a suite of evals on every change so a prompt or model update can’t silently degrade quality.

Question 21

What is Function calling (tool use)?

Accepted Answer

Function calling is a model capability that lets it request a structured tool call, which your code executes and returns results for.

Question 22

What is Structured outputs?

Accepted Answer

Structured outputs constrain a model’s response to a defined schema (such as JSON), guaranteeing parseable, valid output.

Question 23

What is LLM cost optimization?

Accepted Answer

LLM cost optimization is the practice of reducing what production AI features cost — through routing, caching, batching, and right-sizing models — without losing quality.

Question 24

What is AI code security?

Accepted Answer

AI code security is the practice of controlling the risks of AI-generated code and prompts — vulnerabilities, license issues, data leakage, and prompt injection.

Question 25

What is Self-hosted LLM?

Accepted Answer

A self-hosted LLM is an open-weight model you run on your own infrastructure, so data never leaves your environment.

AI engineering glossary