3.2 The Hazardous Warehouse Environment

To ground our study of knowledge-based reasoning, we introduce a challenging variant of the warehouse environment: the Hazardous Warehouse. This environment requires the robot to reason carefully about partial information—moving blindly leads to disaster.

The Hazardous Warehouse parallels a classic AI benchmark called the Wumpus World (Russell and Norvig, 2020, ch. 7), adapted to our engineering context.

3.2.1 Environment Description

The Hazardous Warehouse is a \(4 \times 4\) grid representing a damaged section of a larger facility. The robot must retrieve a high-value package and return to the exit.

3.2.1.1 The Grid

Squares are labeled by coordinates \((x, y)\) where \(x\) is the column (1–4, left to right) and \(y\) is the row (1–4, bottom to top). The robot starts at \((1, 1)\), which is also the only exit.

Figure 3.1: Hazardous Warehouse grid layout. The robot (R) starts at (1,1). Damaged floor sections (D), the malfunctioning forklift (F), and the package (P) are hidden until discovered.

3.2.1.2 Hazards

Definition 3.5 Damaged Floor

Some squares contain damaged floor sections—structural failures that will cause the robot to fall into the basement level below. Entering a damaged floor square destroys the robot. Damaged floor locations are fixed but unknown to the robot.

Definition 3.6 Malfunctioning Forklift

One square contains a malfunctioning forklift whose jammed accelerator keeps it spinning in place. The forklift never leaves its square, but it collides with and destroys any robot that enters that square. The forklift's location is fixed but unknown.

3.2.1.3 The Goal

Definition 3.7 High-Value Package

One square contains the high-value package the robot must retrieve. The package location is fixed but unknown. Successfully grabbing the package and returning to \((1,1)\) completes the mission.

3.2.1.4 The Shutdown Device

Definition 3.8 Emergency Shutdown Device

The robot carries an emergency shutdown device—a one-time-use remote trigger that sends a kill signal to the malfunctioning forklift. When activated, the shutdown signal propagates in the direction the robot is facing. If the forklift is anywhere along that line, it is disabled and the square becomes safe. The robot hears a confirmation beep if the shutdown succeeds.

The shutdown device can only be used once. Using it in the wrong direction wastes it.
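Which squares a shutdown signal covers follows directly from the robot's position and heading. The minimal sketch below assumes the grid's coordinate convention (x grows to the east, y grows to the north). The function names `shutdown_line` and `shutdown_hits` and the string direction labels are illustrative, not part of the implementation in Section 3.2.10.

```python
DELTAS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def shutdown_line(
    position: tuple[int, int], facing: str, width: int = 4, height: int = 4
) -> list[tuple[int, int]]:
    """Squares the signal passes through, from the adjacent square to the wall."""
    dx, dy = DELTAS[facing]
    x, y = position
    covered = []
    while True:
        x, y = x + dx, y + dy
        if not (1 <= x <= width and 1 <= y <= height):
            break  # the signal stops at the outer wall
        covered.append((x, y))
    return covered


def shutdown_hits(position: tuple[int, int], facing: str, forklift: tuple[int, int]) -> bool:
    """True if the forklift lies anywhere on the signal's line (the robot hears a beep)."""
    return forklift in shutdown_line(position, facing)


print(shutdown_line((1, 2), "north"))  # [(1, 3), (1, 4)]
```

From \((1,2)\) facing north, the signal covers \((1,3)\) and \((1,4)\); fired east from the same square, it would miss a forklift at \((1,3)\) and the device would be wasted.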

3.2.2 Percepts

The robot cannot see into adjacent squares. Instead, it perceives local evidence about nearby hazards:

Percepts available to the robot
Percept	Meaning
Creaking	The robot is adjacent to (not in) a damaged floor section
Rumbling	The robot is adjacent to (not in) the malfunctioning forklift
Beacon	The robot is in the same square as the package
Bump	The robot walked into a wall (stayed in place)
Beep	The shutdown device successfully disabled the forklift

"Adjacent" means the four cardinal directions (north, south, east, west)—not diagonals.

Comment 3.2 Percept Limitations

Percepts indicate something is nearby but not where. If the robot perceives creaking, at least one adjacent square has damaged floor—but which one? This ambiguity is precisely what makes logical reasoning necessary.
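The adjacency rule behind every percept is short enough to state in code. This is a minimal sketch assuming the default \(4 \times 4\) grid; the helper name `neighbors` is our own.

```python
def neighbors(square: tuple[int, int], width: int = 4, height: int = 4) -> list[tuple[int, int]]:
    """Return the in-bounds squares north, south, east, and west of `square`."""
    x, y = square
    candidates = [(x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    # Diagonals are never adjacent; squares beyond the walls are dropped.
    return [(cx, cy) for cx, cy in candidates if 1 <= cx <= width and 1 <= cy <= height]


print(neighbors((1, 1)))  # [(1, 2), (2, 1)]
print(neighbors((2, 2)))  # [(2, 3), (2, 1), (3, 2), (1, 2)]
```

Corner squares have two neighbors, edge squares three, and interior squares four, so the same creaking percept leaves fewer candidate squares in a corner than in the middle of the grid.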

3.2.3 Actions

The robot can perform six actions:

Actions available to the robot
Action	Effect
Forward	Move one square in the facing direction (unless blocked by a wall)
TurnLeft	Rotate 90° counterclockwise
TurnRight	Rotate 90° clockwise
Grab	Pick up the package if present in the current square
Shutdown	Activate the emergency shutdown device in the facing direction
Exit	Leave the warehouse (only valid at \((1,1)\))

The robot always knows its own location and orientation (it has reliable odometry). What it doesn't know is the contents of squares it hasn't visited.
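Because actions are deterministic, the robot's pose follows from its action history alone. The sketch below replays the route taken in the example of Section 3.2.6, assuming that "turns around" means two left turns; `track_pose` and the string action names are illustrative.

```python
HEADINGS = ["north", "east", "south", "west"]  # clockwise order
DELTAS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def track_pose(actions, start=(1, 1, "east"), width=4, height=4):
    """Dead-reckon (x, y, heading) from a list of action names."""
    x, y, heading = start
    for action in actions:
        if action == "TurnRight":
            heading = HEADINGS[(HEADINGS.index(heading) + 1) % 4]
        elif action == "TurnLeft":
            heading = HEADINGS[(HEADINGS.index(heading) - 1) % 4]
        elif action == "Forward":
            dx, dy = DELTAS[heading]
            nx, ny = x + dx, y + dy
            if 1 <= nx <= width and 1 <= ny <= height:
                x, y = nx, ny  # otherwise the robot bumps and stays put
        # Grab, Shutdown, and Exit do not change the pose.
    return x, y, heading


# (1,1) -> (2,1) -> back to (1,1) -> (1,2) -> (2,2) -> (2,3)
route = ["Forward", "TurnLeft", "TurnLeft", "Forward",
         "TurnRight", "Forward", "TurnRight", "Forward", "TurnLeft", "Forward"]
print(track_pose(route))  # (2, 3, 'north')
```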

3.2.4 Environment Properties (PEAS)

Applying the PEAS framework from Chapter 1:

Performance Measure:

+1000 for retrieving the package and exiting
−1000 for being destroyed (damaged floor or forklift collision)
−1 for each action taken
−10 for using the shutdown device (in total, including that action's −1 cost)

Environment:

Discrete \(4 \times 4\) grid
Static (hazards don't move)
Single agent
Partially observable (only local percepts)
Deterministic (actions have predictable effects)
Sequential (current decisions affect future options)

Actuators: Movement, grabbing, shutdown device, exit

Sensors: Creaking, rumbling, beacon, bump, beep detectors
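The performance measure can be turned into a score for a whole episode. This sketch follows the reward rules of the reference implementation in Section 3.2.10, where a terminal success or destruction replaces the final action's −1 rather than adding to it, and the shutdown costs −10 in total. The function `episode_score` and its outcome labels are hypothetical.

```python
def episode_score(num_actions: int, outcome: str, used_shutdown: bool = False) -> int:
    """Total score of an episode of `num_actions` actions (the last one included).

    outcome: "success" (exited with the package), "destroyed", or "other".
    """
    score = -num_actions      # -1 for every action taken
    if used_shutdown:
        score -= 9            # -10 in total; its -1 is already counted above
    if outcome == "success":
        score += 1 + 1000     # the final Exit earns +1000 instead of -1
    elif outcome == "destroyed":
        score += 1 - 1000     # the fatal Forward scores -1000 instead of -1
    return score


print(episode_score(20, "success"))                      # 981
print(episode_score(20, "success", used_shutdown=True))  # 972
print(episode_score(4, "destroyed"))                     # -1003
```

The step cost is small next to the terminal rewards, so a safe but longer route is always worth far more than a short gamble.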

The critical property is partial observability. The robot must build a model of the world from limited evidence.

3.2.5 Why Search Alone Fails

Consider applying A* search to this problem. What is the state space?

If we knew the hazard locations, the state would be \((x, y, \mathit{orientation}, \mathit{has\_package}, \mathit{forklift\_alive}, \mathit{shutdown\_available})\). We could search for an optimal path.
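With hazards known, planning really is ordinary search. This minimal sketch drops the package, forklift, and shutdown components of the state and searches only over \((x, y, \mathit{orientation})\). Because every action costs 1, breadth-first search returns an optimal plan, so we use it here as a simpler stand-in for A*. The hazard set is hypothetical: it matches the example of Section 3.2.6, with a second damaged square placed at \((3,3)\).

```python
from collections import deque

HEADINGS = ["north", "east", "south", "west"]  # clockwise order
DELTAS = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def successors(state, deadly, width=4, height=4):
    """Yield (action, next_state) pairs, never stepping onto a known hazard."""
    x, y, heading = state
    i = HEADINGS.index(heading)
    yield "TurnLeft", (x, y, HEADINGS[(i - 1) % 4])
    yield "TurnRight", (x, y, HEADINGS[(i + 1) % 4])
    dx, dy = DELTAS[heading]
    nx, ny = x + dx, y + dy
    if 1 <= nx <= width and 1 <= ny <= height and (nx, ny) not in deadly:
        yield "Forward", (nx, ny, heading)


def shortest_plan(start, goal, deadly):
    """Breadth-first search from `start` to any state at square `goal`."""
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, action taken)
    while frontier:
        state = frontier.popleft()
        if state[:2] == goal:
            plan = []
            while parent[state] is not None:
                state, action = parent[state]
                plan.append(action)
            return plan[::-1]
        for action, nxt in successors(state, deadly):
            if nxt not in parent:
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None  # goal unreachable without entering a hazard


hazards = {(3, 1), (3, 3), (1, 3)}  # damaged floor and forklift (hypothetical world)
print(shortest_plan((1, 1, "east"), (2, 3), hazards))
```

The catch is the argument `hazards`: the robot does not have it, which is exactly the problem described next.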

But we don't know the hazard locations. The robot starts with uncertainty about \(4 \times 4 - 1 = 15\) squares. Each could contain damaged floor (or not), one contains the forklift, and one contains the package. Damaged floor alone allows \(2^{15} = 32{,}768\) patterns before the forklift and package are even placed, so the number of possible world configurations is enormous.
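The count depends on how hazards may be placed, so the short computation below uses two explicitly assumed models. The loose model reads the prose literally: every square independently may hold damaged floor, and the forklift and package each sit on one of the 15 squares. The strict model matches the placement in the reference implementation of Section 3.2.10 with its default of two damaged squares, all items on distinct squares.

```python
from math import comb

squares = 4 * 4 - 1  # every square except the safe start (1, 1)

# Loose model: any subset of squares damaged; forklift and package anywhere.
loose = 2**squares * squares * squares

# Strict model: choose 2 damaged squares, then the forklift and the package
# on two further squares, all distinct.
strict = comb(squares, 2) * (squares - 2) * (squares - 3)

print(f"{loose:,}")   # 7,372,800
print(f"{strict:,}")  # 16,380
```

Even the smaller figure is far too many worlds to plan for one at a time, and the robot would still have to commit to a path without knowing which world it is in.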

Worse, the robot must act before it has complete information. It must choose: explore square \((2,1)\) or \((1,2)\)? Either might be deadly. The robot needs to reason about what it knows to determine which moves are provably safe.

3.2.6 Example: Initial Reasoning

The robot starts at \((1,1)\), facing east. It perceives no creaking, no rumbling, and no beacon.

What can it conclude?

No creaking means no damaged floor in \((2,1)\) or \((1,2)\) (the only adjacent squares from \((1,1)\))
No rumbling means the forklift is not in \((2,1)\) or \((1,2)\)
Therefore, both \((2,1)\) and \((1,2)\) are safe

The robot has used logical inference to expand its knowledge. Without moving, it has determined two squares are safe to enter.

Figure 3.2: Initial reasoning step 1: At (1,1), the robot perceives no creaking and no rumbling, allowing it to deduce that adjacent squares (2,1) and (1,2) are safe.
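The deduction at \((1,1)\) is an instance of a single rule: a square with neither creaking nor rumbling has only safe neighbors. A minimal sketch of that rule over a set of known-safe squares follows; `neighbors` and `update_safe` are hypothetical helpers, not part of the implementation in Section 3.2.10.

```python
def neighbors(square: tuple[int, int], width: int = 4, height: int = 4) -> list[tuple[int, int]]:
    x, y = square
    candidates = [(x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates if 1 <= cx <= width and 1 <= cy <= height]


def update_safe(
    known_safe: set[tuple[int, int]], square: tuple[int, int], creaking: bool, rumbling: bool
) -> set[tuple[int, int]]:
    """Add `square` (the robot is standing on it) and, if both percepts are
    absent, every neighbor of `square` to the set of known-safe squares."""
    safe = set(known_safe) | {square}
    if not creaking and not rumbling:
        safe |= set(neighbors(square))
    return safe


safe = update_safe(set(), (1, 1), creaking=False, rumbling=False)
print(sorted(safe))  # [(1, 1), (1, 2), (2, 1)]
```

The rule says nothing when either percept is present, which is why the next step needs a different kind of reasoning.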

It moves to \((2,1)\) and perceives creaking but no rumbling. What can it infer?

Creaking means damaged floor in at least one adjacent square: \((1,1)\), \((3,1)\), or \((2,2)\)
But \((1,1)\) is safe (the robot came from there, and the starting square is always safe)
Therefore, damaged floor is in \((3,1)\) or \((2,2)\) (or both)
No rumbling means the forklift is not in \((3,1)\) or \((2,2)\)

The robot cannot yet determine which square is dangerous. It must gather more evidence.

Figure 3.3: Initial reasoning step 2: At (2,1), the robot perceives creaking. Since (1,1) is known safe, the damaged floor must be at (3,1) or (2,2).
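The step at \((2,1)\) uses the complementary rule: creaking at a square means at least one of its not-yet-safe neighbors has damaged floor. The sketch below computes that candidate set, again with hypothetical helper names. The result is a disjunction ("one of these"), not a list of damaged squares; it becomes a definite conclusion only when a single candidate remains.

```python
def neighbors(square: tuple[int, int], width: int = 4, height: int = 4) -> list[tuple[int, int]]:
    x, y = square
    candidates = [(x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates if 1 <= cx <= width and 1 <= cy <= height]


def damaged_candidates(square: tuple[int, int], known_safe: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Creaking at `square`: at least one of the returned squares is damaged."""
    return {n for n in neighbors(square) if n not in known_safe}


known_safe = {(1, 1), (1, 2), (2, 1)}
print(sorted(damaged_candidates((2, 1), known_safe)))  # [(2, 2), (3, 1)]

# Once (2, 2) is later proved safe, one candidate remains, so it must be damaged.
print(sorted(damaged_candidates((2, 1), known_safe | {(2, 2)})))  # [(3, 1)]
```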

The robot turns around and returns to \((1,1)\), then turns north and moves to \((1,2)\). There, it hears a rumble but no creaking. What does that imply?

Rumbling means the forklift is adjacent to \((1,2)\): either \((1,3)\) or \((2,2)\), since \((1,1)\) is known safe
If the forklift were at \((2,2)\), then \((2,1)\) would have rumbled earlier, but it did not
Therefore, the forklift must be at \((1,3)\)
No creaking at \((1,2)\) rules out damaged floor in \((1,3)\) and \((2,2)\); since the forklift is not at \((2,2)\) either, \((2,2)\) is safe
With \((2,2)\) safe, the earlier creaking at \((2,1)\) must be due to damaged floor at \((3,1)\)

Figure 3.5: Initial reasoning step 4: At (1,2), the robot hears rumbling but no creaking, allowing it to deduce the forklift is at (1,3), (2,2) is safe, and (3,1) is damaged.
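The same conclusions can be reached mechanically by model checking: enumerate every world consistent with the evidence and keep only the facts that hold in all of them. This minimal sketch assumes exactly one forklift, any number of damaged squares, and (as in the reference code) hazards on distinct squares. Only damaged floor on the frontier (unvisited neighbors of visited squares) can affect the percepts collected so far, so only those squares are enumerated for damaged floor.

```python
from itertools import product

Square = tuple[int, int]
WIDTH = HEIGHT = 4


def neighbors(sq: Square) -> list[Square]:
    x, y = sq
    candidates = [(x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)]
    return [(a, b) for a, b in candidates if 1 <= a <= WIDTH and 1 <= b <= HEIGHT]


# Percepts from the walkthrough so far: square -> (creaking, rumbling)
observed = {(1, 1): (False, False), (2, 1): (True, False), (1, 2): (False, True)}
visited = set(observed)  # the robot survived these squares, so they hold no hazard
frontier = sorted({n for sq in visited for n in neighbors(sq)} - visited)
unvisited = [(x, y) for x in range(1, WIDTH + 1) for y in range(1, HEIGHT + 1)
             if (x, y) not in visited]


def consistent(damaged: set[Square], forklift: Square) -> bool:
    """Would this world produce exactly the observed percepts?"""
    for sq, (creak, rumble) in observed.items():
        adj = neighbors(sq)
        if any(n in damaged for n in adj) != creak or (forklift in adj) != rumble:
            return False
    return True


models = []
for bits in product([False, True], repeat=len(frontier)):
    damaged = {sq for sq, b in zip(frontier, bits) if b}
    for forklift in unvisited:
        if forklift not in damaged and consistent(damaged, forklift):
            models.append((damaged, forklift))

# A frontier square is provably safe if it is hazard-free in every model.
safe = [sq for sq in frontier if all(sq not in d and sq != f for d, f in models)]
print(models)  # [({(3, 1)}, (1, 3))]
print(safe)    # [(2, 2)]
```

Exactly one model survives, so the forklift at \((1,3)\), the damaged floor at \((3,1)\), and the safety of \((2,2)\) are all entailed by the evidence.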

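Armed with this knowledge, the robot could disable the forklift right away: it is facing north from \((1,2)\), directly toward \((1,3)\), so the emergency shutdown device would hit it. Instead, the robot turns east and moves to \((2,2)\), which it has proved safe. There it perceives neither creaking nor rumbling, so every square adjacent to \((2,2)\), including \((2,3)\) and \((3,2)\), is safe as well.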

Figure 3.6: Initial reasoning step 5: At (2,2), the robot confirms it can safely stand there while keeping the forklift and damaged floor locations in its knowledge base.

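The robot turns north and moves to \((2,3)\). It hears both creaking and rumbling and detects the beacon: it has found the package. The rumbling fits the forklift at \((1,3)\). The creaking cannot come from \((3,1)\), which is not adjacent to \((2,3)\); \((2,2)\) is safe, and the silence at \((1,2)\) ruled out damaged floor at \((1,3)\). So a second damaged floor section must lie at \((3,3)\) or \((2,4)\).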

Figure 3.7: Initial reasoning step 6: At (2,3), the robot detects the beacon and confirms the nearby hazards while locating the package.

3.2.7 The Knowledge Representation Challenge

To reason like this systematically, we need:

A language to express facts: "Square \((2,1)\) is safe," "If creaking at \(L\), then damaged floor adjacent to \(L\)"
A way to represent what the robot knows vs. what is actually true
Inference procedures to derive new facts from known facts

The next sections develop these tools:

Section 3.3: Propositional Logic: A simple language for facts and rules, sufficient for small instances
Section 3.4: First-Order Logic: A richer language with variables, enabling general rules like "for all locations \(L\)..."

3.2.8 Assumptions and Simplifications

To keep the focus on logical reasoning, we make several simplifying assumptions:

Static environment: Hazards don't move or change
Perfect actuation: Actions always succeed (no motor failures)
Perfect sensing: Percepts are never wrong (no sensor noise)
Known self-location: The robot always knows where it is

Relaxing these assumptions leads to probabilistic reasoning (to which we will return) and decision-making under uncertainty. For now, the challenge is purely logical: given perfect but incomplete information, what can we deduce?

3.2.9 Summary

The Hazardous Warehouse presents a partially observable environment where:

The robot has local percepts (creaking, rumbling, beacon, bump, beep)
Hazards (damaged floor, malfunctioning forklift) are deadly but detectable indirectly
The goal is to retrieve a package and exit safely
Success requires reasoning about what percepts imply about the world

This environment motivates the development of formal logic in the following sections. We will build a knowledge base that represents what the robot knows and use inference to determine safe actions.

3.2.10 Implementation

The following Python module provides a complete implementation of the Hazardous Warehouse environment, suitable for testing knowledge-based agents.

Source code for file hazardous_warehouse_env.py

"""
Hazardous Warehouse Environment

A partially observable environment for knowledge-based reasoning agents.
The robot must navigate a grid with hidden hazards (damaged floor, malfunctioning
forklift), retrieve a package, and exit safely.
"""

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import NamedTuple
import random


# -----------------------------------------------------------------------------
# Types and Constants
# -----------------------------------------------------------------------------

class Direction(Enum):
    """Cardinal directions the robot can face."""
    NORTH = auto()
    EAST = auto()
    SOUTH = auto()
    WEST = auto()

    def turn_left(self) -> "Direction":
        """Rotate 90° counterclockwise."""
        order = [Direction.NORTH, Direction.WEST, Direction.SOUTH, Direction.EAST]
        return order[(order.index(self) + 1) % 4]

    def turn_right(self) -> "Direction":
        """Rotate 90° clockwise."""
        order = [Direction.NORTH, Direction.EAST, Direction.SOUTH, Direction.WEST]
        return order[(order.index(self) + 1) % 4]

    def delta(self) -> tuple[int, int]:
        """Return (dx, dy) for moving in this direction."""
        deltas = {
            Direction.NORTH: (0, 1),
            Direction.EAST: (1, 0),
            Direction.SOUTH: (0, -1),
            Direction.WEST: (-1, 0),
        }
        return deltas[self]


class Action(Enum):
    """Actions available to the robot."""
    FORWARD = auto()
    TURN_LEFT = auto()
    TURN_RIGHT = auto()
    GRAB = auto()
    SHUTDOWN = auto()
    EXIT = auto()


class Percept(NamedTuple):
    """Sensory information available to the robot at each step."""
    creaking: bool      # Adjacent to damaged floor
    rumbling: bool      # Adjacent to malfunctioning forklift
    beacon: bool        # At package location
    bump: bool          # Hit a wall on last move
    beep: bool          # Shutdown device successfully disabled forklift


# -----------------------------------------------------------------------------
# Robot and Environment State
# -----------------------------------------------------------------------------

@dataclass
class RobotState:
    """Internal state of the robot."""
    x: int
    y: int
    direction: Direction
    has_package: bool = False
    has_shutdown_device: bool = True
    alive: bool = True


@dataclass
class HazardousWarehouseEnv:
    """
    Hazardous Warehouse environment for knowledge-based agents.

    A 4x4 grid (default) where:
    - The robot starts at (1, 1) facing east
    - Some squares have damaged floor (deadly, cause creaking in adjacent squares)
    - One square has a malfunctioning forklift (deadly, causes rumbling in adjacent)
    - One square has the package (emits beacon signal when robot is there)
    - The robot has a one-use shutdown device to disable the forklift

    Coordinates: (x, y) where x is column (1-4), y is row (1-4), origin bottom-left.
    """

    width: int = 4
    height: int = 4
    num_damaged: int = 2
    seed: int | None = None

    # Hidden world state (not visible to agent)
    _damaged: set[tuple[int, int]] = field(default_factory=set)
    _forklift: tuple[int, int] | None = None
    _forklift_alive: bool = True
    _package: tuple[int, int] | None = None

    # Robot state
    _robot: RobotState = field(default_factory=lambda: RobotState(1, 1, Direction.EAST))

    # Episode tracking
    _steps: int = 0
    _total_reward: float = 0.0
    _last_percept: Percept = field(default_factory=lambda: Percept(False, False, False, False, False))
    _terminated: bool = False
    _success: bool = False

    # History for replay
    _history: list[dict] = field(default_factory=list)

    def __post_init__(self):
        self.reset()

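    def reset(self, seed: int | None = None) -> Percept:
        """Reset the environment and re-place the hazards and package.

        With a seed, placement is reproducible (the same seed always yields the
        same world); without one, placement is drawn from fresh entropy.
        """
        if seed is not None:
            self.seed = seed
        # A private RNG avoids reseeding the global `random` module, which
        # would silently affect any other code that uses it.
        rng = random.Random(self.seed)

        # Generate valid positions (exclude starting square)
        all_positions = [
            (x, y)
            for x in range(1, self.width + 1)
            for y in range(1, self.height + 1)
            if (x, y) != (1, 1)
        ]
        if len(all_positions) < self.num_damaged + 2:
            raise ValueError("Grid too small for the damaged squares, forklift, and package")
        rng.shuffle(all_positions)

        # Place hazards and package on distinct squares
        self._damaged = set(all_positions[:self.num_damaged])
        remaining = all_positions[self.num_damaged:]
        self._forklift = remaining[0]
        self._package = remaining[1]
        self._forklift_alive = True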
        self._forklift_alive = True

        # Reset robot
        self._robot = RobotState(1, 1, Direction.EAST)

        # Reset tracking
        self._steps = 0
        self._total_reward = 0.0
        self._terminated = False
        self._success = False
        self._history = []

        # Get initial percept
        self._last_percept = self._get_percept(bump=False, beep=False)
        self._record_state()
        return self._last_percept

    def step(self, action: Action) -> tuple[Percept, float, bool, dict]:
        """
        Execute an action and return (percept, reward, terminated, info).

        Rewards:
        - +1000 for successful exit with package
        - -1000 for death (damaged floor or forklift collision)
        - -1 for each action
        - -10 for using shutdown device
        """
        if self._terminated:
            return self._last_percept, 0.0, True, {"error": "Episode already terminated"}

        reward = -1.0  # Base action cost
        bump = False
        beep = False
        info: dict = {"action": action.name}

        if action == Action.FORWARD:
            bump = self._move_forward()
            if not bump and self._robot.alive:
                # Check for death
                pos = (self._robot.x, self._robot.y)
                if pos in self._damaged:
                    self._robot.alive = False
                    reward = -1000.0
                    self._terminated = True
                    info["death"] = "damaged_floor"
                elif pos == self._forklift and self._forklift_alive:
                    self._robot.alive = False
                    reward = -1000.0
                    self._terminated = True
                    info["death"] = "forklift"

        elif action == Action.TURN_LEFT:
            self._robot.direction = self._robot.direction.turn_left()

        elif action == Action.TURN_RIGHT:
            self._robot.direction = self._robot.direction.turn_right()

        elif action == Action.GRAB:
            pos = (self._robot.x, self._robot.y)
            if pos == self._package and not self._robot.has_package:
                self._robot.has_package = True
                info["grabbed"] = True
            else:
                info["grabbed"] = False

        elif action == Action.SHUTDOWN:
            if self._robot.has_shutdown_device:
                self._robot.has_shutdown_device = False
                reward -= 9.0  # Shutdown totals -10: the -1 base cost plus -9 extra
                beep = self._fire_shutdown()
                info["shutdown_success"] = beep
            else:
                info["shutdown_success"] = False
                info["error"] = "No shutdown device"

        elif action == Action.EXIT:
            pos = (self._robot.x, self._robot.y)
            if pos == (1, 1):
                self._terminated = True
                if self._robot.has_package:
                    reward = 1000.0
                    self._success = True
                    info["exit"] = "success"
                else:
                    info["exit"] = "no_package"
            else:
                info["exit"] = "wrong_location"

        self._steps += 1
        self._total_reward += reward

        # Get new percept
        if self._robot.alive:
            self._last_percept = self._get_percept(bump=bump, beep=beep)
        else:
            self._last_percept = Percept(False, False, False, bump, beep)

        self._record_state(action)
        return self._last_percept, reward, self._terminated, info

    def _move_forward(self) -> bool:
        """Attempt to move forward. Returns True if bumped into wall."""
        dx, dy = self._robot.direction.delta()
        new_x = self._robot.x + dx
        new_y = self._robot.y + dy

        # Check bounds
        if new_x < 1 or new_x > self.width or new_y < 1 or new_y > self.height:
            return True  # Bump

        self._robot.x = new_x
        self._robot.y = new_y
        return False

    def _fire_shutdown(self) -> bool:
        """Fire shutdown device in facing direction. Returns True if forklift hit."""
        if not self._forklift_alive or self._forklift is None:
            return False

        dx, dy = self._robot.direction.delta()
        x, y = self._robot.x, self._robot.y

        # Trace line in facing direction
        while True:
            x += dx
            y += dy
            if x < 1 or x > self.width or y < 1 or y > self.height:
                break
            if (x, y) == self._forklift:
                self._forklift_alive = False
                return True

        return False

    def _get_percept(self, bump: bool, beep: bool) -> Percept:
        """Generate percept for current position."""
        pos = (self._robot.x, self._robot.y)
        adjacent = self._get_adjacent(pos)

        creaking = any(adj in self._damaged for adj in adjacent)
        rumbling = self._forklift_alive and self._forklift in adjacent
        beacon = pos == self._package and not self._robot.has_package

        return Percept(
            creaking=creaking,
            rumbling=rumbling,
            beacon=beacon,
            bump=bump,
            beep=beep,
        )

    def _get_adjacent(self, pos: tuple[int, int]) -> list[tuple[int, int]]:
        """Return list of adjacent positions (cardinal directions only)."""
        x, y = pos
        candidates = [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
        return [
            (ax, ay) for ax, ay in candidates
            if 1 <= ax <= self.width and 1 <= ay <= self.height
        ]

    def _record_state(self, action: Action | None = None) -> None:
        """Record current state for replay/visualization."""
        self._history.append({
            "step": self._steps,
            "action": action.name if action else None,
            "robot_x": self._robot.x,
            "robot_y": self._robot.y,
            "direction": self._robot.direction.name,
            "has_package": self._robot.has_package,
            "has_shutdown": self._robot.has_shutdown_device,
            "alive": self._robot.alive,
            "forklift_alive": self._forklift_alive,
            "percept": self._last_percept._asdict(),
            "total_reward": self._total_reward,
        })

    # -------------------------------------------------------------------------
    # Public query methods for agents
    # -------------------------------------------------------------------------

    @property
    def robot_position(self) -> tuple[int, int]:
        """Current robot position (x, y)."""
        return (self._robot.x, self._robot.y)

    @property
    def robot_direction(self) -> Direction:
        """Current robot facing direction."""
        return self._robot.direction

    @property
    def has_package(self) -> bool:
        """Whether robot is carrying the package."""
        return self._robot.has_package

    @property
    def has_shutdown_device(self) -> bool:
        """Whether robot still has the shutdown device."""
        return self._robot.has_shutdown_device

    @property
    def is_alive(self) -> bool:
        """Whether robot is still operational."""
        return self._robot.alive

    @property
    def steps(self) -> int:
        """Number of steps taken."""
        return self._steps

    @property
    def total_reward(self) -> float:
        """Cumulative reward."""
        return self._total_reward

    @property
    def history(self) -> list[dict]:
        """Episode history for replay."""
        return self._history.copy()

    # -------------------------------------------------------------------------
    # Methods for visualization (reveal hidden state)
    # -------------------------------------------------------------------------

    def get_true_state(self) -> dict:
        """Return complete world state (for visualization/debugging only)."""
        return {
            "width": self.width,
            "height": self.height,
            "damaged": list(self._damaged),
            "forklift": self._forklift,
            "forklift_alive": self._forklift_alive,
            "package": self._package,
            "robot": {
                "x": self._robot.x,
                "y": self._robot.y,
                "direction": self._robot.direction.name,
                "has_package": self._robot.has_package,
                "has_shutdown": self._robot.has_shutdown_device,
                "alive": self._robot.alive,
            },
            "terminated": self._terminated,
            "success": self._success,
        }

    def render(self, reveal: bool = False) -> str:
        """
        Render the grid as ASCII.

        If reveal=False (default), hides every square except the robot's own
        (this environment does not track which squares have been visited).
        If reveal=True, shows complete world state.
        """
        lines = []
        # Header with column numbers
        lines.append("  " + " ".join(str(x) for x in range(1, self.width + 1)))

        for y in range(self.height, 0, -1):
            row = [str(y)]
            for x in range(1, self.width + 1):
                pos = (x, y)
                if pos == (self._robot.x, self._robot.y):
                    if not self._robot.alive:
                        row.append("X")
                    elif self._robot.has_package:
                        row.append("@")  # Robot with package
                    else:
                        # Show direction
                        arrows = {
                            Direction.NORTH: "^",
                            Direction.EAST: ">",
                            Direction.SOUTH: "v",
                            Direction.WEST: "<",
                        }
                        row.append(arrows[self._robot.direction])
                elif reveal:
                    if pos in self._damaged:
                        row.append("D")
                    elif pos == self._forklift:
                        row.append("F" if self._forklift_alive else "f")
                    elif pos == self._package and not self._robot.has_package:
                        row.append("P")
                    else:
                        row.append(".")
                else:
                    row.append("?")
            lines.append(" ".join(row))

        return "\n".join(lines)


# -----------------------------------------------------------------------------
# Example Usage
# -----------------------------------------------------------------------------

if __name__ == "__main__":
    # Create environment with fixed seed for reproducibility
    env = HazardousWarehouseEnv(seed=42)

    print("=== Hazardous Warehouse Environment ===")
    print("\nTrue state (hidden from agent):")
    print(env.render(reveal=True))

    print("\nAgent's view:")
    print(env.render(reveal=False))

    print(f"\nInitial percept: {env._last_percept}")
    print(f"Robot at: {env.robot_position}, facing {env.robot_direction.name}")

    # Take a few actions
    actions = [Action.FORWARD, Action.TURN_LEFT, Action.FORWARD]
    for action in actions:
        percept, reward, done, info = env.step(action)
        print(f"\nAction: {action.name}")
        print(f"Percept: {percept}")
        print(f"Reward: {reward}, Done: {done}")
        print(f"Position: {env.robot_position}, Facing: {env.robot_direction.name}")

    print("\n=== Final State ===")
    print(env.render(reveal=True))

A visualization module provides grid rendering and episode replay animations. It can be downloaded with the following link:

Bibliography

Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th ed. Prentice Hall, 2020. http://aima.cs.berkeley.edu/
