3.6  Building a Knowledge-Based Agent

We now have all of the theoretical tools: propositional logic for stating facts, inference rules for deriving conclusions, and satisfiability-based methods for efficient entailment checking. In this section we put them to work. We will build, line by line, a Python agent that uses a propositional knowledge base to navigate the Hazardous Warehouse safely, retrieve the package, and exit.

The agent uses two modules: hazardous_warehouse_env, the environment from Section 3.2: The Hazardous Warehouse Environment, and z3, the solver that serves as our reasoning engine.

3.6.1 What Is Z3?

Z3 is an open-source SMT (Satisfiability Modulo Theories) solver developed by Microsoft Research. It is an industrial-strength tool used in software verification, security analysis, and AI research. It supports:

  • Propositional logic with native biconditionals (no manual CNF conversion)
  • First-order logic with quantifiers (\(\forall\), \(\exists\)) over user-defined sorts
  • Arithmetic, bit-vectors, arrays, and other theories

We will use Z3 as our reasoning engine throughout this chapter. In this section we use its propositional logic capabilities; in Section 3.7: Building a FOL Agent with Z3 we extend to first-order logic.

The Z3 Python package is included in recent versions of the course repository (Section 1.3.9: Phase 2: Bootstrapping Your Repository). To install it manually, run:

pip install z3-solver

A quick example to verify the installation:

from z3 import Bools, Solver

P, Q = Bools('P Q')
s = Solver()
s.add(P == Q)    # Biconditional --- native, no CNF needed
s.add(P)
print(s.check())  # sat
print(s.model())   # [Q = True, P = True]

The == operator between Z3 booleans is a biconditional (\(\Leftrightarrow\)). Z3 handles CNF conversion, unit propagation, and satisfiability checking internally—we just state the constraints and ask questions.

3.6.2 Z3 as a Knowledge Base

3.6.2.1 TELL and ASK

Z3's Solver maintains a set of assertions. The add() method implements TELL from Section 3.1: Knowledge-Based Agents and the Limits of Search:

solver = Solver()
solver.add(P == Q)   # TELL: P <=> Q
solver.add(P)         # TELL: P is true

For ASK, we need entailment checking: does the KB entail \(\alpha\)? We use the refutation method—if \(\mathit{KB} \land \neg\alpha\) is unsatisfiable, then \(\alpha\) must follow from the KB. Z3's push() and pop() methods create and restore checkpoints on the assertion stack, making this clean:

from z3 import Not, unsat

def z3_entails(solver, query):
    """Check whether the solver's current assertions entail query."""
    solver.push()
    solver.add(Not(query))
    result = solver.check() == unsat
    solver.pop()
    return result

If check() returns unsat, no interpretation can make the KB true while making \(\alpha\) false—so \(\alpha\) must follow from the KB. The pop() restores the solver to its state before the query, leaving the KB unchanged.

3.6.2.2 Why Z3?

Z3 handles biconditionals natively (==), converts to CNF internally, and uses efficient SAT-solving techniques (including DPLL-style backtracking with unit propagation). We get the theoretical foundations from Section 3.3: Propositional Logic and Section 3.3.18: Resolution and Completeness without having to implement them ourselves.

3.6.3 Encoding the Warehouse

To use propositional logic, we need propositional symbols for every relevant fact about the warehouse. We adopt the following naming convention:

Propositional symbols for the Hazardous Warehouse

  Symbol   Meaning
  D_x_y    Damaged floor at \((x, y)\)
  F_x_y    Forklift at \((x, y)\)
  C_x_y    Creaking perceived at \((x, y)\)
  R_x_y    Rumbling perceived at \((x, y)\)
  OK_x_y   Square \((x, y)\) is safe to enter

We define helper functions that return Z3 Bool variables:

from z3 import Bool

def damaged(x, y):
    return Bool(f'D_{x}_{y}')

def forklift_at(x, y):
    return Bool(f'F_{x}_{y}')

def creaking_at(x, y):
    return Bool(f'C_{x}_{y}')

def rumbling_at(x, y):
    return Bool(f'R_{x}_{y}')

def safe(x, y):
    return Bool(f'OK_{x}_{y}')

For example, damaged(3, 1) returns the Z3 Bool variable D_3_1, and Not(damaged(3, 1)) returns its negation. Z3 interns variables by name: every call to Bool('D_3_1') refers to the same underlying variable, so these helpers can be called freely without creating duplicates.

We also need an adjacency helper. In the \(4 \times 4\) grid, a square's neighbors are the squares one step away in each cardinal direction (north, south, east, west), excluding squares outside the grid:

def get_adjacent(x, y, width=4, height=4):
    result = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nx, ny = x + dx, y + dy
        if 1 <= nx <= width and 1 <= ny <= height:
            result.append((nx, ny))
    return result
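A quick check of the boundary handling (the function is restated in compact form so the snippet runs on its own): a corner square has two neighbors, an edge square three, and an interior square four:

```python
# Compact restatement of get_adjacent for a self-contained example.
def get_adjacent(x, y, width=4, height=4):
    return [(x + dx, y + dy)
            for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if 1 <= x + dx <= width and 1 <= y + dy <= height]

print(get_adjacent(1, 1))        # corner: [(2, 1), (1, 2)]
print(get_adjacent(2, 2))        # interior: [(1, 2), (3, 2), (2, 1), (2, 3)]
print(len(get_adjacent(2, 1)))   # edge square: 3 neighbors
```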

3.6.4 Encoding Warehouse Physics

Now we populate the knowledge base with the rules that govern the warehouse. These are the same rules we wrote in propositional logic in Section 3.3.10: A Propositional KB for the Hazardous Warehouse, but now expressed in Python using Z3.

3.6.4.1 Creaking Rules

For each square \((x, y)\), creaking is perceived there if and only if at least one adjacent square has damaged floor. As a biconditional:

\[C_{x,y} \Leftrightarrow (D_{a_1, b_1} \lor D_{a_2, b_2} \lor \cdots)\]

where \((a_i, b_i)\) are the squares adjacent to \((x, y)\).

In Z3, this is a single line per square:

adj = get_adjacent(2, 1)  # [(1,1), (3,1), (2,2)]
solver.add(creaking_at(2, 1) == Or([damaged(a, b) for a, b in adj]))
# C_2_1 == Or(D_1_1, D_3_1, D_2_2)

The == operator creates a biconditional directly. No associate helper, no string-based <=> operator, no manual CNF conversion.

3.6.4.2 Rumbling Rules

The rumbling rules are identical in structure, with the forklift playing the role of damaged floor:

\[R_{x,y} \Leftrightarrow (F_{a_1, b_1} \lor F_{a_2, b_2} \lor \cdots)\]

solver.add(rumbling_at(2, 1) == Or([forklift_at(a, b) for a, b in adj]))

3.6.4.3 Safety Rules

A square is safe if and only if it has no damaged floor and no forklift:

\[\mathit{OK}_{x,y} \Leftrightarrow (\neg D_{x,y} \land \neg F_{x,y})\]

solver.add(safe(2, 1) == And(Not(damaged(2, 1)), Not(forklift_at(2, 1))))

3.6.4.4 Putting It All Together

The function build_warehouse_kb creates a Solver and encodes all the rules for every square, plus the initial knowledge that \((1, 1)\) is safe:

from z3 import Solver, Or, And, Not

def build_warehouse_kb(width=4, height=4):
    solver = Solver()

    # The starting square is safe.
    solver.add(Not(damaged(1, 1)))
    solver.add(Not(forklift_at(1, 1)))

    for x in range(1, width + 1):
        for y in range(1, height + 1):
            adj = get_adjacent(x, y, width, height)

            # Creaking iff damaged adjacent
            solver.add(creaking_at(x, y) == Or([damaged(a, b) for a, b in adj]))

            # Rumbling iff forklift adjacent
            solver.add(rumbling_at(x, y) == Or([forklift_at(a, b) for a, b in adj]))

            # Safety rule
            solver.add(
                safe(x, y) == And(Not(damaged(x, y)), Not(forklift_at(x, y)))
            )

    return solver

Each biconditional reads almost exactly like the mathematical formula: \(C_{x,y} \Leftrightarrow (D_{a_1,b_1} \lor \cdots)\) becomes creaking_at(x, y) == Or([damaged(a, b) for ...]). Z3 handles the internal conversion to a form it can reason about efficiently.

After calling build_warehouse_kb(), the solver contains the complete physics of the warehouse. It knows nothing yet about what the robot has perceived—that comes next.

3.6.5 Telling Percepts

Each time the robot visits a square, it receives a Percept from the environment with boolean fields creaking and rumbling (among others). We translate these directly into Z3 assertions:

def tell_percepts(solver, percept, x, y):
    """TELL the solver the percepts observed at (x, y)."""
    if percept.creaking:
        solver.add(creaking_at(x, y))
    else:
        solver.add(Not(creaking_at(x, y)))
    if percept.rumbling:
        solver.add(rumbling_at(x, y))
    else:
        solver.add(Not(rumbling_at(x, y)))

Both the positive and negative cases matter. Telling the solver Not(creaking_at(2, 1)) (no creaking at \((2,1)\)) is just as important as telling it creaking_at(2, 1)—the absence of a percept is information.

3.6.6 Asking About Safety

With the physics encoded and percepts told, we can ASK the solver whether a square is safe:

z3_entails(solver, safe(2, 1))        # True if solver entails OK_2_1
z3_entails(solver, Not(safe(3, 1)))   # True if solver entails ~OK_3_1

There are three possible outcomes for any square:

  1. Provably safe: z3_entails(solver, safe(x, y)) returns True. The agent can enter safely.
  2. Provably dangerous: z3_entails(solver, Not(safe(x, y))) returns True. The agent must avoid it.
  3. Unknown: Neither query returns True. The solver does not have enough information. The agent should be cautious and avoid the square until more evidence is available.

3.6.6.1 Manual Walkthrough

Let us trace through the first few steps of the example layout from Section 3.2: The Hazardous Warehouse Environment (damaged floor at \((3,1)\) and \((3,3)\), forklift at \((1,3)\), package at \((2,3)\)).

Step 1: At \((1,1)\), perceiving no creaking, no rumbling.

solver = build_warehouse_kb()
tell_percepts(solver, Percept(creaking=False, rumbling=False,
                              beacon=False, bump=False, beep=False), 1, 1)

# ASK about adjacent squares
print(z3_entails(solver, safe(2, 1)))  # True
print(z3_entails(solver, safe(1, 2)))  # True

No creaking at \((1,1)\) means no adjacent square has damaged floor. No rumbling means no adjacent square has the forklift. The solver derives that both \((2,1)\) and \((1,2)\) are safe—exactly the reasoning from Section 3.2: The Hazardous Warehouse Environment.

Step 2: Move to \((2,1)\), perceiving creaking but no rumbling.

tell_percepts(solver, Percept(creaking=True, rumbling=False,
                              beacon=False, bump=False, beep=False), 2, 1)

print(z3_entails(solver, safe(3, 1)))        # False (unknown)
print(z3_entails(solver, Not(safe(3, 1))))   # False (unknown)
print(z3_entails(solver, safe(2, 2)))        # False (unknown)

Creaking at \((2,1)\) means damaged floor at \((1,1)\), \((3,1)\), or \((2,2)\). Since \((1,1)\) is known safe, the damage is at \((3,1)\) or \((2,2)\)—but the solver cannot yet determine which. Both remain unknown.

Step 3: Visit \((1,2)\), perceiving rumbling but no creaking.

tell_percepts(solver, Percept(creaking=False, rumbling=True,
                              beacon=False, bump=False, beep=False), 1, 2)

print(z3_entails(solver, safe(2, 2)))        # True!
print(z3_entails(solver, Not(safe(3, 1))))   # True!
print(z3_entails(solver, Not(safe(1, 3))))   # True!

No creaking at \((1,2)\) rules out damaged floor at \((2,2)\). Combined with the earlier creaking at \((2,1)\), the solver now deduces that \((3,1)\) must have damaged floor. Rumbling at \((1,2)\) combined with no rumbling at \((2,1)\) identifies the forklift at \((1,3)\). The chain of inference unfolds automatically—the same reasoning we did by hand in Section 3.2: The Hazardous Warehouse Environment, but performed mechanically by the solver.

3.6.7 The Agent Loop

With the solver machinery in place, the agent operates in a loop:

  1. TELL the solver about percepts at the current location.
  2. ASK the solver to classify every unknown square as safe, dangerous, or still unknown. Update the known_safe and known_dangerous sets.
  3. Choose an action:
    • If the beacon is detected, GRAB the package.
    • If carrying the package, plan a path through safe squares to \((1,1)\) and EXIT.
    • Otherwise, plan a path to the nearest safe unvisited square and move there.
    • If no safe unvisited square is reachable, return to \((1,1)\) and exit.
  4. Execute the action, updating position and direction.
  5. Repeat until the episode ends.

3.6.7.1 Path Planning

The agent plans paths using breadth-first search (BFS) through the known_safe set. This guarantees the shortest path through squares the agent has proven safe:

from collections import deque

def plan_path(start, goal_set, known_safe, width, height):
    """BFS from start to any cell in goal_set, moving only through known_safe."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (cx, cy), path = queue.popleft()
        if (cx, cy) in goal_set:
            return path
        for nx, ny in get_adjacent(cx, cy, width, height):
            if (nx, ny) not in seen and (nx, ny) in known_safe:
                seen.add((nx, ny))
                queue.append(((nx, ny), path + [(nx, ny)]))
    return None  # No path found
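A quick usage sketch (both functions are repeated so the snippet is self-contained): with four known-safe squares, BFS returns a shortest safe path, and None when no safe route exists:

```python
from collections import deque

def get_adjacent(x, y, width=4, height=4):
    result = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nx, ny = x + dx, y + dy
        if 1 <= nx <= width and 1 <= ny <= height:
            result.append((nx, ny))
    return result

def plan_path(start, goal_set, known_safe, width, height):
    """BFS from start to any cell in goal_set, moving only through known_safe."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (cx, cy), path = queue.popleft()
        if (cx, cy) in goal_set:
            return path
        for nx, ny in get_adjacent(cx, cy, width, height):
            if (nx, ny) not in seen and (nx, ny) in known_safe:
                seen.add((nx, ny))
                queue.append(((nx, ny), path + [(nx, ny)]))
    return None

known_safe = {(1, 1), (2, 1), (1, 2), (2, 2)}
print(plan_path((1, 1), {(2, 2)}, known_safe, 4, 4))  # [(1, 1), (2, 1), (2, 2)]
print(plan_path((1, 1), {(4, 4)}, known_safe, 4, 4))  # None: no safe route
```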

3.6.7.2 Converting Paths to Actions

The environment accepts actions like FORWARD, TURN_LEFT, and TURN_RIGHT. To follow a path, the agent must convert each step into a sequence of turns (to face the right direction) followed by a forward move.

For efficient turning, we compute whether turning left or right is shorter:

def turns_between(current, target):
    """Return the shortest sequence of turn actions from current to target direction."""
    if current == target:
        return []
    # Count steps in each direction and choose the shorter one.
    ...

The full implementation handles this correctly by indexing into the ordered list of directions (NORTH, EAST, SOUTH, WEST) and comparing clockwise vs. counter-clockwise distances.
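The modular arithmetic can be sketched with plain strings standing in for the Direction and Action enums; the complete implementation does the same with the real types:

```python
DIRECTIONS = ['NORTH', 'EAST', 'SOUTH', 'WEST']  # clockwise order

def turns_between(current, target):
    """Shortest turn sequence, using clockwise index distance mod 4."""
    if current == target:
        return []
    ci, ti = DIRECTIONS.index(current), DIRECTIONS.index(target)
    right_steps = (ti - ci) % 4   # clockwise
    left_steps = (ci - ti) % 4    # counter-clockwise
    if right_steps <= left_steps:
        return ['TURN_RIGHT'] * right_steps
    return ['TURN_LEFT'] * left_steps

print(turns_between('NORTH', 'EAST'))   # ['TURN_RIGHT']
print(turns_between('NORTH', 'WEST'))   # ['TURN_LEFT']
print(turns_between('NORTH', 'SOUTH'))  # ['TURN_RIGHT', 'TURN_RIGHT']
```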

3.6.8 Complete Implementation

The following module contains the complete knowledge-based agent. It brings together all the pieces: Z3 entailment checking, variable helpers, physics encoding, percept telling, safety querying, path planning, and the decision loop.

Source code for file warehouse_kb_agent.py

"""
Knowledge-Based Agent for the Hazardous Warehouse (Propositional Z3)

Uses Z3's SMT solver with grounded propositional variables to reason
about safety and navigate the warehouse to retrieve the package.

This agent implements the TELL/ASK loop:
  1. TELL the solver about percepts (solver.add)
  2. ASK via entailment check (push/Not(query)/check/pop)
  3. Plan a path through safe squares toward the goal
  4. Execute actions and repeat

The knowledge base encodes the physics of the warehouse using one Bool
variable per square per predicate:
  - Creaking at (x,y) iff damaged floor in an adjacent square
  - Rumbling at (x,y) iff forklift in an adjacent square
  - A square is safe iff it has no damaged floor and no forklift
"""

from collections import deque

from z3 import Bool, Or, And, Not, Solver, unsat
from hazardous_warehouse_env import (
    HazardousWarehouseEnv,
    Action,
    Direction,
)


# ---------------------------------------------------------------------------
# Z3 Entailment Check
# ---------------------------------------------------------------------------

def z3_entails(solver, query):
    """Check whether the solver's current assertions entail *query*.

    Uses the refutation method: push a checkpoint, assert Not(query),
    and check satisfiability.  If unsat, the negated query is
    inconsistent with the KB --- meaning the KB entails the query.
    Pop restores the solver to its previous state.
    """
    solver.push()
    solver.add(Not(query))
    result = solver.check() == unsat
    solver.pop()
    return result


# ---------------------------------------------------------------------------
# Propositional Variable Helpers
# ---------------------------------------------------------------------------

def damaged(x, y):
    """Z3 Bool variable: damaged floor at (x, y)."""
    return Bool(f'D_{x}_{y}')


def forklift_at(x, y):
    """Z3 Bool variable: forklift at (x, y)."""
    return Bool(f'F_{x}_{y}')


def creaking_at(x, y):
    """Z3 Bool variable: creaking perceived at (x, y)."""
    return Bool(f'C_{x}_{y}')


def rumbling_at(x, y):
    """Z3 Bool variable: rumbling perceived at (x, y)."""
    return Bool(f'R_{x}_{y}')


def safe(x, y):
    """Z3 Bool variable: square (x, y) is safe to enter."""
    return Bool(f'OK_{x}_{y}')


# ---------------------------------------------------------------------------
# Adjacency
# ---------------------------------------------------------------------------

def get_adjacent(x, y, width=4, height=4):
    """Return the list of (x, y) positions adjacent to (x, y)."""
    result = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nx, ny = x + dx, y + dy
        if 1 <= nx <= width and 1 <= ny <= height:
            result.append((nx, ny))
    return result


# ---------------------------------------------------------------------------
# Knowledge-Base Construction
# ---------------------------------------------------------------------------

def build_warehouse_kb(width=4, height=4):
    """Build a Z3 Solver populated with the physics of the warehouse.

    The solver contains three kinds of constraints for every square (x, y):

    1. Creaking biconditional
       C_x_y == Or(D_a1_b1, D_a2_b2, ...)
       where (a_i, b_i) are the squares adjacent to (x, y).

    2. Rumbling biconditional
       R_x_y == Or(F_a1_b1, F_a2_b2, ...)

    3. Safety biconditional
       OK_x_y == And(Not(D_x_y), Not(F_x_y))

    Z3's native == operator handles biconditionals directly ---
    no manual CNF conversion is needed.

    It also encodes the initial knowledge that the starting square (1, 1)
    has no damaged floor and no forklift.
    """
    solver = Solver()

    # The starting square is safe.
    solver.add(Not(damaged(1, 1)))
    solver.add(Not(forklift_at(1, 1)))

    for x in range(1, width + 1):
        for y in range(1, height + 1):
            adj = get_adjacent(x, y, width, height)

            # --- Creaking rule ---
            solver.add(creaking_at(x, y) == Or([damaged(a, b) for a, b in adj]))

            # --- Rumbling rule ---
            solver.add(rumbling_at(x, y) == Or([forklift_at(a, b) for a, b in adj]))

            # --- Safety rule ---
            solver.add(
                safe(x, y) == And(Not(damaged(x, y)), Not(forklift_at(x, y)))
            )

    return solver


# ---------------------------------------------------------------------------
# Turning Helpers
# ---------------------------------------------------------------------------

_DIRECTION_ORDER = [Direction.NORTH, Direction.EAST, Direction.SOUTH, Direction.WEST]


def _direction_index(d):
    return _DIRECTION_ORDER.index(d)


def turns_between(current, target):
    """Return a list of TURN_LEFT / TURN_RIGHT actions to face *target*.

    Chooses the shortest rotation direction.
    """
    if current == target:
        return []
    ci = _direction_index(current)
    ti = _direction_index(target)
    right_steps = (ti - ci) % 4   # clockwise
    left_steps = (ci - ti) % 4    # counter-clockwise
    if right_steps <= left_steps:
        return [Action.TURN_RIGHT] * right_steps
    else:
        return [Action.TURN_LEFT] * left_steps


def delta_to_direction(dx, dy):
    """Map a movement delta to the Direction enum."""
    return {
        (0, 1): Direction.NORTH,
        (0, -1): Direction.SOUTH,
        (1, 0): Direction.EAST,
        (-1, 0): Direction.WEST,
    }[(dx, dy)]


# ---------------------------------------------------------------------------
# Knowledge-Based Agent
# ---------------------------------------------------------------------------

class WarehouseKBAgent:
    """A knowledge-based agent for the Hazardous Warehouse.

    The agent maintains:
      - A Z3 Solver with physics rules and accumulated percepts
      - Sets of known-safe and known-dangerous squares
      - A queue of planned actions
      - Its own position, direction, and inventory state

    Decision strategy (in priority order):
      1. If the beacon is detected, GRAB the package.
      2. If carrying the package, navigate to (1,1) and EXIT.
      3. Otherwise, explore the nearest safe unvisited square.
      4. If no safe unvisited square is reachable, return to (1,1) and EXIT.
    """

    def __init__(self, env):
        self.env = env
        self.solver = build_warehouse_kb(env.width, env.height)
        self.x = 1
        self.y = 1
        self.direction = Direction.EAST
        self.has_package = False
        self.visited = {(1, 1)}
        self.known_safe = {(1, 1)}
        self.known_dangerous = set()
        self.action_queue = []
        self.step_count = 0

    # ----- Percepts ----------------------------------------------------------

    def tell_percepts(self, percept):
        """Translate a Percept into Z3 assertions and TELL the solver."""
        x, y = self.x, self.y
        if percept.creaking:
            self.solver.add(creaking_at(x, y))
        else:
            self.solver.add(Not(creaking_at(x, y)))
        if percept.rumbling:
            self.solver.add(rumbling_at(x, y))
        else:
            self.solver.add(Not(rumbling_at(x, y)))

    # ----- Safety queries ----------------------------------------------------

    def update_safety(self):
        """ASK the solver about every square whose status is still unknown."""
        for x in range(1, self.env.width + 1):
            for y in range(1, self.env.height + 1):
                pos = (x, y)
                if pos in self.known_safe or pos in self.known_dangerous:
                    continue
                if z3_entails(self.solver, safe(x, y)):
                    self.known_safe.add(pos)
                elif z3_entails(self.solver, Not(safe(x, y))):
                    self.known_dangerous.add(pos)

    # ----- Path planning -----------------------------------------------------

    def plan_path(self, start, goal_set):
        """BFS through known-safe squares from *start* to any cell in *goal_set*.

        Returns a list of (x, y) positions forming the path (including
        *start* and the reached goal), or None if no path exists.
        """
        queue = deque([(start, [start])])
        seen = {start}
        while queue:
            (cx, cy), path = queue.popleft()
            if (cx, cy) in goal_set:
                return path
            for nx, ny in get_adjacent(cx, cy, self.env.width, self.env.height):
                if (nx, ny) not in seen and (nx, ny) in self.known_safe:
                    seen.add((nx, ny))
                    queue.append(((nx, ny), path + [(nx, ny)]))
        return None

    def path_to_actions(self, path):
        """Convert a position path into a sequence of Actions.

        Returns (actions, final_direction) where *actions* is the list of
        TURN_LEFT / TURN_RIGHT / FORWARD actions and *final_direction* is
        the direction the robot faces after executing them all.
        """
        actions = []
        direction = self.direction
        for i in range(1, len(path)):
            dx = path[i][0] - path[i - 1][0]
            dy = path[i][1] - path[i - 1][1]
            target_dir = delta_to_direction(dx, dy)
            actions.extend(turns_between(direction, target_dir))
            actions.append(Action.FORWARD)
            direction = target_dir
        return actions, direction

    # ----- Decision logic ----------------------------------------------------

    def choose_action(self, percept):
        """Select the next action based on the current state of knowledge."""
        # Execute queued actions first (from a multi-step plan).
        if self.action_queue:
            return self.action_queue.pop(0)

        # 1. If the beacon is on, grab the package.
        if percept.beacon and not self.has_package:
            return Action.GRAB

        # 2. If carrying the package, navigate home and exit.
        if self.has_package:
            if (self.x, self.y) == (1, 1):
                return Action.EXIT
            path = self.plan_path((self.x, self.y), {(1, 1)})
            if path and len(path) > 1:
                actions, _ = self.path_to_actions(path)
                self.action_queue = actions[1:]
                return actions[0]
            # Already at (1,1) or can't find path — just exit.
            return Action.EXIT

        # 3. Explore the nearest safe unvisited square.
        safe_unvisited = self.known_safe - self.visited
        if safe_unvisited:
            path = self.plan_path((self.x, self.y), safe_unvisited)
            if path and len(path) > 1:
                actions, _ = self.path_to_actions(path)
                self.action_queue = actions[1:]
                return actions[0]

        # 4. Nothing left to explore — go home and exit.
        if (self.x, self.y) == (1, 1):
            return Action.EXIT
        path = self.plan_path((self.x, self.y), {(1, 1)})
        if path and len(path) > 1:
            actions, _ = self.path_to_actions(path)
            self.action_queue = actions[1:]
            self.action_queue.append(Action.EXIT)
            return actions[0]
        return Action.EXIT

    # ----- Execution ---------------------------------------------------------

    def execute_action(self, action):
        """Send *action* to the environment and update internal bookkeeping."""
        percept, reward, done, info = self.env.step(action)

        if action == Action.FORWARD and not percept.bump:
            dx, dy = self.direction.delta()
            self.x += dx
            self.y += dy
            self.visited.add((self.x, self.y))
        elif action == Action.TURN_LEFT:
            self.direction = self.direction.turn_left()
        elif action == Action.TURN_RIGHT:
            self.direction = self.direction.turn_right()
        elif action == Action.GRAB and info.get("grabbed"):
            self.has_package = True

        self.step_count += 1
        return percept, reward, done, info

    # ----- Main loop ---------------------------------------------------------

    def run(self, verbose=True):
        """Run the full perceive-tell-ask-act loop until the episode ends."""
        # Process the initial percept at (1, 1).
        percept = self.env._last_percept
        self.tell_percepts(percept)
        self.update_safety()

        if verbose:
            print(f"Start at ({self.x},{self.y}) facing {self.direction.name}")
            print(f"  Percept: {percept}")
            print(f"  Known safe: {sorted(self.known_safe)}")

        while True:
            action = self.choose_action(percept)
            percept, reward, done, info = self.execute_action(action)

            if verbose:
                print(f"\nStep {self.step_count}: {action.name}")
                print(f"  Position: ({self.x},{self.y}), Facing: {self.direction.name}")
                print(f"  Percept: {percept}")
                print(f"  Info: {info}")

            if done:
                if verbose:
                    print(f"\n{'=' * 40}")
                    print(f"Episode ended.  Reward: {self.env.total_reward:.0f}")
                    print(f"Steps taken: {self.step_count}")
                    success = info.get("exit") == "success"
                    print(f"Success: {success}")
                return

            # After moving to a new square, tell percepts and re-query safety.
            if action == Action.FORWARD and not percept.bump:
                self.tell_percepts(percept)
                self.update_safety()
                if verbose:
                    print(f"  Known safe: {sorted(self.known_safe)}")
                    print(f"  Known dangerous: {sorted(self.known_dangerous)}")


# ---------------------------------------------------------------------------
# Main — run on the example layout from the textbook
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    from hazardous_warehouse_viz import configure_rn_example_layout

    env = HazardousWarehouseEnv(seed=0)
    configure_rn_example_layout(env)

    print("True state (hidden from the agent):")
    print(env.render(reveal=True))
    print()

    agent = WarehouseKBAgent(env)
    agent.run(verbose=True)

3.6.9 Running the Agent

To run the agent on the example layout from Section 3.2: The Hazardous Warehouse Environment:

from hazardous_warehouse_env import HazardousWarehouseEnv
from hazardous_warehouse_viz import configure_rn_example_layout
from warehouse_kb_agent import WarehouseKBAgent

env = HazardousWarehouseEnv(seed=0)
configure_rn_example_layout(env)

print("True state (hidden from agent):")
print(env.render(reveal=True))

agent = WarehouseKBAgent(env)
agent.run(verbose=True)

The agent prints a step-by-step trace. At each step it reports its position, the percepts received, the action taken, and the updated sets of known-safe and known-dangerous squares.

On the example layout (damaged floor at \((3,1)\) and \((3,3)\), forklift at \((1,3)\), package at \((2,3)\)), the agent:

  1. Starts at \((1,1)\) and deduces that \((2,1)\) and \((1,2)\) are safe.
  2. Explores \((2,1)\) (creaking) and \((1,2)\) (rumbling), deducing that \((3,1)\) is damaged, \((1,3)\) has the forklift, and \((2,2)\) is safe.
  3. Explores \((2,2)\) and then \((2,3)\), where it detects the beacon and grabs the package.
  4. Plans a return path through safe squares to \((1,1)\) and exits.

The entire process is driven by Z3's satisfiability checking on the propositional KB—the same logical foundations developed in Section 3.3: Propositional Logic and Section 3.3.18: Resolution and Completeness, applied mechanically.

3.6.10 Discussion

3.6.10.1 What the Agent Handles Well

The agent's reasoning is sound: it will never enter a square it has not proven safe. Z3 is both sound and complete for propositional satisfiability, so the refutation-based entailment check is guaranteed to find any valid conclusion. If the solver entails that a square is safe, the square truly is safe (assuming the percept rules correctly model the world).

The agent is also systematic: it explores all reachable safe squares in BFS order, ensuring it covers as much of the warehouse as the available evidence allows.

3.6.10.2 Limitations

Conservative behavior. The agent only enters squares it can prove safe. If the solver lacks enough evidence to determine a square's status, the agent avoids it. In some configurations, the package may be reachable only through squares whose safety cannot be proven from the available percepts. In these cases the agent gives up and exits without the package.

A human might take a calculated risk ("this square is probably safe"), but our agent's logic is strictly two-valued: safe or not provably safe. Acting on such graded beliefs requires probabilistic reasoning, which is beyond the scope of this chapter.

No shutdown device reasoning. The current agent does not use the emergency shutdown device. Adding this would require encoding additional rules about the device's line-of-sight effect and reasoning about when using it is advantageous—for example, when the forklift location is known.

Propositional grounding. We used propositional logic, which requires separate symbols for every square. For a \(100 \times 100\) warehouse this would mean tens of thousands of symbols and biconditionals. A first-order logic encoding (Section 3.4: First-Order Logic) would express the same rules with just three quantified sentences, regardless of grid size. In Section 3.7: Building a FOL Agent with Z3 we build exactly that: a FOL agent that expresses the physics rules as quantified sentences.
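A quick back-of-the-envelope count for the \(100 \times 100\) case, assuming the same five symbols and three biconditionals per square as in our encoding:

```python
width = height = 100
squares = width * height          # 10,000 squares
symbols = 5 * squares             # D, F, C, R, OK for every square
biconditionals = 3 * squares      # creaking, rumbling, safety rule per square
print(symbols)                    # 50000
print(biconditionals)             # 30000
```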

3.6.10.3 Connection to Theory

Every component of the agent maps directly to a concept from the preceding sections:

Mapping from agent code to textbook concepts

  Agent component        Section                                                          Concept
  Solver                 Section 3.3: Propositional Logic                                 Propositional knowledge base
  solver.add()           Section 3.1: Knowledge-Based Agents and the Limits of Search     TELL operation
  z3_entails()           Section 3.1: Knowledge-Based Agents and the Limits of Search     ASK operation
  Physics encoding       Section 3.3.10: A Propositional KB for the Hazardous Warehouse   Biconditional rules
  Satisfiability check   Section 3.3.18: Resolution and Completeness                      Refutation-based entailment
  == (biconditional)     Section 3.3.19: Conjunctive Normal Form                          Handled internally by Z3

The agent is a concrete realization of the knowledge-based agent architecture from Section 3.1.3: Knowledge-Based Agents: it maintains an explicit knowledge base, uses logical inference to derive new facts, and selects actions based on what it can prove.

3.6.11 Summary

Building the knowledge-based agent required:

  1. Z3 as reasoning engine: a Solver that handles biconditionals natively and checks entailment via push/pop.
  2. Physics encoding: biconditional rules for creaking, rumbling, and safety, generated programmatically for every square using ==.
  3. Percept translation: converting environment observations into solver assertions.
  4. Safety queries: asking the solver whether each square is provably safe or provably dangerous via z3_entails.
  5. Path planning: BFS through the safe frontier to reach exploration targets or the exit.
  6. An action loop: the perceive-tell-ask-act cycle that drives the agent forward.

The result is an agent that navigates the Hazardous Warehouse safely and retrieves the package whenever the available evidence permits—all through the mechanical application of propositional logic and Z3's satisfiability checking.

Bibliography

  1. [Z3] De Moura, Leonardo, and Nikolaj Bjørner. "Z3: An Efficient SMT Solver." Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2008), Lecture Notes in Computer Science 4963 (2008): 337-340. http://link.springer.com/10.1007/978-3-540-78800-3_24