1.5  Rational Agents and Their Environments

This section follows the definitions and framing in Russell and Norvig, 2020, chapter 2, and connects them to engineering design decisions. The core idea is that we design agents by specifying their environments and then choosing the simplest program that behaves rationally within those constraints.

Definition 1.1  Agent

An agent is anything that perceives its environment through sensors and acts upon that environment through actuators.

This definition is deliberately broad. A thermostat and a warehouse robot both qualify as agents (see definition 1.1); the difference is the richness of their percepts and actions.

Definition 1.2  Percept

A percept is the agent's immediate sensory input at a given time.

Definition 1.3  Percept Sequence

A percept sequence is the complete history of everything an agent has perceived so far.

The percept sequence matters because rational action depends on history, not just the most recent snapshot (see definition 1.3).

Definition 1.4  Agent Function

The agent function is a mapping from percept sequences to actions. It defines what the agent does, independent of how it is implemented.

Definition 1.5  Agent Program

An agent program is a concrete implementation of the agent function on a particular agent architecture (hardware and software). Unlike the agent function, the agent program takes as input only the current percept, not the full percept sequence.

Because the program sees only the current percept, it must maintain internal state whenever the right action depends on earlier percepts. This is a practical constraint for implementation.

In engineering practice, we often start with the agent program because it is implementable, but we should still be able to describe the underlying agent function it approximates (see definition 1.4).
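
To make the distinction concrete, here is a minimal sketch (the names and the vacuum-style percepts are illustrative assumptions, not from the text): the agent function maps a whole percept sequence to an action, while the agent program receives only the current percept and keeps whatever history it needs as internal state.

  # Illustrative sketch: agent function vs. agent program (names are hypothetical).

  def agent_function(percept_sequence):
      """Abstract mapping from the full percept history to an action."""
      return "clean" if percept_sequence[-1] == "dirty" else "move"

  class AgentProgram:
      """Concrete implementation: receives only the current percept and
      maintains internal state in place of the full percept sequence."""
      def __init__(self):
          self.history = []              # internal state summarizing past percepts

      def __call__(self, percept):
          self.history.append(percept)
          return "clean" if percept == "dirty" else "move"

  program = AgentProgram()
  print(program("dirty"))   # clean
  print(program("clean"))   # move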

Definition 1.6  Performance Measure

A performance measure defines what counts as success for the agent in its task environment, over time.

Definition 1.7  Rational Agent

A rational agent chooses the action that is expected to maximize its performance measure, given the percept sequence and any built-in knowledge.

Rationality is always evaluated relative to the performance measure (see definition 1.6) and the information available to the agent.

Definition 1.8  Autonomy

Autonomy refers to the degree to which an agent's behavior depends on its own experience rather than on built-in knowledge of the environment.

1.5.1 Task Environments and PEAS

Russell and Norvig emphasize describing the task environment before designing an agent program.

Definition 1.9  Task Environment

A task environment is the problem setting in which an agent operates, specified by the performance measure, environment, actuators, and sensors.

Definition 1.10  PEAS

PEAS stands for Performance measure, Environment, Actuators, and Sensors. It is a structured way to specify a task environment.

PEAS is the bridge between abstract definitions and implementable designs.

Design implication: if any PEAS component is vague, the agent's behavior will be underspecified and difficult to evaluate.
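
As an illustration, a PEAS description can be written down as plain data before any agent code exists. The automated-taxi entries below are an assumed example in the spirit of Russell and Norvig's PEAS tables, not a quotation from them.

  from dataclasses import dataclass

  @dataclass
  class PEAS:
      performance_measure: list
      environment: list
      actuators: list
      sensors: list

  # Hypothetical PEAS specification for an automated taxi
  taxi = PEAS(
      performance_measure=["safety", "speed", "legality", "passenger comfort"],
      environment=["roads", "other traffic", "pedestrians", "weather"],
      actuators=["steering", "accelerator", "brake", "signals", "display"],
      sensors=["cameras", "GPS", "speedometer", "engine sensors"],
  )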

1.5.2 Properties of Task Environments

Task environment properties tell you what kind of agent program is feasible. The agent's environment is only one element of the task environment, alongside the performance measure, actuators, and sensors (see definition 1.9).

Definition 1.11  Observability

A task environment is fully observable if the agent's sensors give it complete access to the relevant state at each point in time, partially observable if the sensors provide only incomplete or noisy access, and unobservable if the agent has no sensors at all.

Example: In chess, the full board is visible, so the task environment is fully observable. In a warehouse, a robot only sees nearby aisles, so the task environment is partially observable; if sensors are down, it can be effectively unobservable.

Definition 1.12  Deterministic and Stochastic

A task environment is deterministic if the next state is fully determined by the current state and action; otherwise it is stochastic.

Example: In a deterministic grid world, an agent moves to its intended cell every time. In a stochastic grid world, an agent sometimes slips into an unintended neighboring cell.

Definition 1.13  Episodic and Sequential

In an episodic task environment, each decision is independent; in a sequential task environment, current actions affect future decisions.

Example: Image classification is usually episodic because each image is independent. Route planning is sequential because a wrong turn changes what actions are possible later.

Definition 1.14  Static and Dynamic

A task environment is static if it does not change while the agent deliberates; it is dynamic if it does.

Example: A crossword puzzle is static because the clues do not change while you think. Driving in traffic is dynamic because other vehicles move while you decide.

Definition 1.15  Discrete and Continuous

A task environment is continuous if its states, time, percepts, or actions vary over a continuum, so that between any two values there are infinitely many intermediate ones; it is discrete if they take only a finite (or countable) set of distinct values.

Example: A robot arm moving through space operates in a continuous task environment because its positions and joint angles vary smoothly. Checkers is discrete because there are only finitely many board positions and legal moves.

Together, these categories determine what the agent can reasonably represent and compute. They are design constraints that shape algorithms, memory, and testing strategies.

1.5.3 Designing the Agent Program

These definitions tell us what an agent is; design tells us how to build one. The workflow below aligns directly with the definitions:

  1. Specify the task environment with PEAS (see definition 1.10).
  2. Classify the task environment properties to determine constraints (see Section 1.5.2: Properties of Task Environments).
  3. Choose the simplest agent program that can behave rationally under those constraints (see definition 1.7).

For example, a fully observable, deterministic, episodic task often admits a simple reflex agent. A partially observable, sequential, dynamic task usually requires internal state, planning, or learning. Rationality does not mean perfection; it means the best expected outcome given the percept sequence, built-in knowledge, and computational limits.
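
One way to connect steps 2 and 3 is to record the property classification as data and apply a rough heuristic for choosing an agent structure. The sketch below is an illustrative rule of thumb, not a prescriptive procedure; real designs weigh more factors.

  from dataclasses import dataclass

  @dataclass
  class TaskEnvironmentProperties:
      fully_observable: bool
      deterministic: bool
      episodic: bool
      static: bool
      discrete: bool

  def suggest_agent_program(props: TaskEnvironmentProperties) -> str:
      """Rough heuristic mapping environment properties to an agent structure."""
      if props.fully_observable and props.deterministic and props.episodic:
          return "simple reflex agent"
      if not props.fully_observable:
          return "model-based agent (needs internal state)"
      return "goal- or utility-based agent (plans over action sequences)"

  # Two contrasting (assumed) task environments:
  print(suggest_agent_program(TaskEnvironmentProperties(True, True, True, True, True)))
  print(suggest_agent_program(TaskEnvironmentProperties(False, False, False, False, False)))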

1.5.4 Agent Structures

Russell and Norvig describe agent structure as the combination of an agent program and an agent architecture, with the program implementing the agent function in a concrete way (Russell and Norvig, 2020, § 2.4). The subsections below summarize the main agent types and design tradeoffs.

1.5.4.1 Agent Programs

An agent program takes current percepts as input and produces an action as output. One may be tempted to structure the program as a lookup table that maps percept sequences to actions. However, this is impractical for all but the simplest discrete environments due to combinatorial explosion: the number of possible percept sequences grows at least exponentially with time.
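
The growth is easy to quantify. With P distinct percepts possible at each time step and a lifetime of T steps, the table needs one entry for every percept sequence, i.e. P + P^2 + ... + P^T entries. The short calculation below uses small, assumed numbers to show how quickly this becomes infeasible.

  # Size of a lookup-table agent: one entry per possible percept sequence.
  def table_entries(num_percepts: int, lifetime: int) -> int:
      return sum(num_percepts ** t for t in range(1, lifetime + 1))

  # Even tiny, assumed numbers explode quickly:
  print(table_entries(10, 5))    # 111,110 entries
  print(table_entries(10, 20))   # about 1.1e20 entries -- far too many to store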

Practical agent programs therefore need a more compact structure for handling the enormous space of possible percept sequences. The standard approaches are described below.

1.5.4.2 Simple Reflex Agents

Simple reflex agents select actions based only on the current percept, using condition-action rules. They ignore history and do not maintain internal state.

For instance, a thermostat that turns the heater on if the temperature is below a threshold and off otherwise is a simple reflex agent.

These agents can be rational in fully observable, episodic environments where the right action depends only on the present state. They fail in partially observable or dynamic settings because they cannot infer hidden state or remember previous events.
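
A minimal sketch of the thermostat as a condition-action rule (the threshold value is an assumption for illustration):

  # Simple reflex agent: the action depends only on the current percept.
  def thermostat_agent(current_temperature: float, threshold: float = 20.0) -> str:
      """Condition-action rule with no memory and no model of the environment."""
      return "heater_on" if current_temperature < threshold else "heater_off"

  print(thermostat_agent(18.5))   # heater_on
  print(thermostat_agent(22.0))   # heater_off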

1.5.4.3 Model-Based Reflex Agents

Model-based reflex agents augment reflex rules with an internal state that summarizes the percept history. They use a model of how the world evolves to update that state over time.

For example, a robot vacuum that keeps track of which areas have been cleaned and which remain dirty uses internal state to inform its actions beyond the current sensor readings.

This architecture handles partial observability by maintaining beliefs about the current state. The quality of the agent depends on the accuracy of its model and state update rules.
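
A sketch of the robot-vacuum example, assuming a small set of named cells: the internal state records which cells the agent still believes are dirty, and a trivial world model ("a cell stays clean after it is observed clean") keeps that state up to date from the current percept alone.

  # Model-based reflex agent: internal state summarizes the percept history.
  class VacuumAgent:
      def __init__(self, cells):
          self.believed_dirty = set(cells)   # belief state (assume all dirty at start)

      def act(self, percept):
          """percept = (current_cell, is_dirty); only the current percept is received."""
          cell, is_dirty = percept
          if is_dirty:
              return "suck"
          # World model: a cell observed clean stays clean, so update the belief state.
          self.believed_dirty.discard(cell)
          if self.believed_dirty:
              return f"move_toward {next(iter(self.believed_dirty))}"
          return "stop"

  agent = VacuumAgent(cells=["A", "B"])
  print(agent.act(("A", True)))    # suck
  print(agent.act(("A", False)))   # move_toward B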

1.5.4.4 Goal-Based Agents

Goal-based agents choose actions that achieve a specified goal. This requires search or planning: the agent evaluates possible action sequences and selects one that leads to a goal state.

For instance, a soccer-playing robot that plans a sequence of moves to score a goal is a goal-based agent.

Goals make behavior more flexible than reflex rules, but they increase computational cost. The same environment may admit many goal-achieving actions, and the agent needs a strategy for selecting among them.
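
A minimal planning sketch: breadth-first search over action sequences in an assumed small grid world. The point is that a goal-based agent evaluates where sequences of actions lead, rather than reacting to the current percept alone.

  from collections import deque

  # Goal-based agent: search for an action sequence that reaches the goal state.
  ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

  def plan(start, goal, grid_size=4):
      """Breadth-first search over an assumed small grid; returns a list of actions."""
      frontier = deque([(start, [])])
      visited = {start}
      while frontier:
          state, path = frontier.popleft()
          if state == goal:
              return path
          for action, (dx, dy) in ACTIONS.items():
              nxt = (state[0] + dx, state[1] + dy)
              if 0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size and nxt not in visited:
                  visited.add(nxt)
                  frontier.append((nxt, path + [action]))
      return None  # no action sequence reaches the goal

  print(plan(start=(0, 0), goal=(2, 1)))   # ['up', 'right', 'right'] -- a shortest 3-action plan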

1.5.4.5 Utility-Based Agents

Utility-based agents go beyond goals by ranking outcomes with a utility function. This supports tradeoffs among competing objectives, such as speed versus energy or accuracy versus cost.

For example, an autonomous car that balances safety, speed, and passenger comfort uses a utility function to evaluate different driving strategies.

In uncertain environments, a rational utility-based agent chooses the action that maximizes expected utility, which requires estimating outcome probabilities and utilities.
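
A sketch of the expected-utility computation, with assumed outcome probabilities and utilities for two hypothetical driving strategies:

  # Utility-based agent under uncertainty: pick the action with maximal expected utility.
  # The outcome models and utility values below are assumed for illustration.
  OUTCOME_MODEL = {
      "fast_route":   [(0.7, "arrive_early"), (0.3, "stuck_in_traffic")],
      "scenic_route": [(0.9, "arrive_on_time"), (0.1, "stuck_in_traffic")],
  }
  UTILITY = {"arrive_early": 10.0, "arrive_on_time": 7.0, "stuck_in_traffic": -5.0}

  def expected_utility(action: str) -> float:
      return sum(p * UTILITY[outcome] for p, outcome in OUTCOME_MODEL[action])

  best_action = max(OUTCOME_MODEL, key=expected_utility)
  print(best_action, expected_utility(best_action))   # scenic_route (expected utility ~5.8)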

The concept of alignment in AI safety relates closely to utility-based agents: ensuring that the agent's utility function reflects human values and desired outcomes.

1.5.4.6 Learning Agents

Learning agents improve over time by adapting their behavior based on experience. Russell and Norvig describe four components:

  • Performance element: selects actions
  • Learning element: modifies the performance element to improve behavior
  • Critic: evaluates performance against a measure
  • Problem generator: proposes exploratory actions to gain information

Learning is not a separate agent type; it is a capability that can be added to reflex, model-based, goal-based, or utility-based agents.
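
The four components can be sketched as a simple skeleton. The classes, reward signal, and update rule below are illustrative placeholders, not a specific learning algorithm from Russell and Norvig.

  import random

  # Illustrative skeleton of a learning agent's four components.
  class LearningAgent:
      def __init__(self):
          self.action_values = {"left": 0.0, "right": 0.0}   # the performance element's knowledge

      def performance_element(self, percept):
          """Selects actions using the current knowledge."""
          return max(self.action_values, key=self.action_values.get)

      def critic(self, outcome_reward):
          """Evaluates the outcome against the performance measure (here, a scalar reward)."""
          return outcome_reward

      def learning_element(self, action, feedback, learning_rate=0.1):
          """Modifies the performance element to improve future behavior."""
          self.action_values[action] += learning_rate * (feedback - self.action_values[action])

      def problem_generator(self):
          """Occasionally proposes an exploratory action to gain new information."""
          return random.choice(list(self.action_values))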

1.5.4.7 State Representations in Agent Programs

There are three common ways to represent state in agent programs:

  1. Atomic representations: treat each state as an indivisible unit. Simple but limited; does not capture relationships between states.
  2. Factored representations: describe states using sets of attributes or variables. More expressive; allows reasoning about parts of the state independently.
  3. Structured representations: use objects and relations to model complex states. Most expressive; supports rich reasoning but is computationally intensive.

The choice of representation affects the agent's ability to reason, plan, and learn. More complex representations enable sophisticated behavior but require more computation and memory.
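
To make the contrast concrete, the same (assumed) taxi-world situation is shown below under each of the three representations:

  # One world state under the three representations (illustrative example).

  # 1. Atomic: the state is an opaque label with no internal structure.
  atomic_state = "state_42"

  # 2. Factored: the state is a set of variables with values.
  factored_state = {"position": (3, 7), "fuel": 0.6, "passenger_on_board": True}

  # 3. Structured: objects and explicit relations between them.
  structured_state = {
      "objects": {"taxi1": "taxi", "alice": "passenger", "elm_street": "road"},
      "relations": [("on", "taxi1", "elm_street"), ("inside", "alice", "taxi1")],
  }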

Bibliography

  1. [AI] Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th ed. Pearson, 2020. http://aima.cs.berkeley.edu/