
1.7  Problems

Problem 1.1  Simple Reflex Agent

Objective: Implement a stateless reflex agent and compare its performance to the greedy goal-based agent from the concluding exercise.

Background: The greedy agent from Section 1.6.1: Concluding Design Exercise: Robotic Warehouse Agent maintains internal state (recent positions for loop detection). A simple reflex agent has no memory—it chooses actions based only on the current percept, using condition-action rules.

Task:

  1. Implement a simple reflex agent (warehouse_agent_reflex.py) with the following rules:
    • If at pickup location and no item: pick
    • If at dropoff location and carrying item: drop
    • If carrying item and dropoff is North: move North
    • If carrying item and dropoff is South: move South
    • (Continue for all 8 combinations of carry-state × direction)
    • If no rule applies: choose a random valid action
  2. Run 50 episodes for both the reflex agent and the greedy agent (Section 1.6.1: Concluding Design Exercise: Robotic Warehouse Agent), logging:
    • Success/failure (did the agent deliver the item?)
    • Episode length (steps taken)
    • Final battery level
    • Total reward
  3. Analyze and visualize:
    • Success rate (percentage of episodes completed)
    • Mean/median episode length for successful episodes
    • Box plots comparing episode lengths
    • Histograms of final battery levels
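The rule table above can be collapsed into a short condition-action function. The sketch below assumes a hypothetical percept dictionary (`pos`, `pickup`, `dropoff`, `carrying`, `valid_actions`) and a coordinate convention with row 0 at the top; adapt the names to your environment's actual interface. Choosing the goal first (pickup vs. dropoff) lets one if/elif chain encode all 8 carry-state × direction rules.

```python
import random

# Hypothetical percept format -- adjust to your environment's actual API:
# {"pos": (r, c), "pickup": (r, c), "dropoff": (r, c),
#  "carrying": bool, "valid_actions": [...]}

def reflex_action(percept):
    """Choose an action from condition-action rules only (no memory)."""
    pos, carrying = percept["pos"], percept["carrying"]
    goal = percept["dropoff"] if carrying else percept["pickup"]

    if not carrying and pos == percept["pickup"]:
        return "PICK"
    if carrying and pos == percept["dropoff"]:
        return "DROP"

    # Directional rules: head toward the current goal (row 0 at the top).
    dr, dc = goal[0] - pos[0], goal[1] - pos[1]
    if dr < 0:
        return "N"
    if dr > 0:
        return "S"
    if dc > 0:
        return "E"
    if dc < 0:
        return "W"

    # No rule applies: fall back to a random valid action.
    return random.choice(percept["valid_actions"])
```

Note that this agent consults only the current percept; it has no way to detect the loops the greedy agent's position history guards against.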

Copilot Coaching:

  • Start by asking Copilot to review the greedy agent (Section 1.6.1: Concluding Design Exercise: Robotic Warehouse Agent) and summarize its structure.
  • Prompt: "Create a simple reflex agent for the warehouse environment that uses condition-action rules based on current position, goal positions, and whether the robot carries an item. Include all 8 directional rules."
  • For the analysis script: "Write a function that runs N episodes of an agent, collects statistics (success rate, episode length, battery, reward), and returns a dictionary of results."
  • For visualization: "Given two dictionaries of agent statistics, create a figure with 3 subplots: (1) bar chart of success rates, (2) box plots of episode lengths, (3) histograms of final battery levels."
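The episode-runner prompt above might yield something like the following sketch. It assumes a hypothetical Gym-style environment (`reset()` returning a percept, `step(action)` returning `(percept, reward, done, info)`, with `delivered` and `battery` reported in `info`); your environment's API will likely differ, so treat the structure, not the names, as the point.

```python
import statistics

def run_episodes(agent_fn, env_factory, n_episodes=50, max_steps=500):
    """Run n_episodes of agent_fn and return summary statistics.

    Assumes a hypothetical Gym-style environment: reset() -> percept,
    step(action) -> (percept, reward, done, info). Adapt to your own API.
    """
    results = {"success": [], "steps": [], "battery": [], "reward": []}
    for _ in range(n_episodes):
        env = env_factory()
        percept, total_reward = env.reset(), 0.0
        info, step = {}, 0
        for step in range(1, max_steps + 1):
            percept, reward, done, info = env.step(agent_fn(percept))
            total_reward += reward
            if done:
                break
        results["success"].append(info.get("delivered", False))
        results["steps"].append(step)
        results["battery"].append(info.get("battery", 0))
        results["reward"].append(total_reward)

    # Mean/median episode length is computed over successful episodes only.
    ok_steps = [s for s, ok in zip(results["steps"], results["success"]) if ok]
    return {
        "success_rate": sum(results["success"]) / n_episodes,
        "mean_steps": statistics.mean(ok_steps) if ok_steps else None,
        "median_steps": statistics.median(ok_steps) if ok_steps else None,
        "raw": results,
    }
```

The `raw` lists feed directly into the box plots and battery histograms requested in the task.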

Deliverables:

  • warehouse_agent_reflex.py: reflex agent implementation
  • compare_agents.py: multi-episode runner and statistical analysis

Problem 1.2  Random Agent Baseline and Performance Spectrum

Objective: Establish a random agent baseline and map out the performance spectrum from random to greedy behavior.

Background: A random agent selects valid actions uniformly at random. This provides a lower bound on performance—any intelligent agent should outperform random behavior. By comparing multiple agents, we can quantify the value of different design choices.

Task:

  1. Implement a random agent (warehouse_agent_random.py):
    • At each step, choose a random valid action from [N, S, E, W, PICK, DROP]
    • Filter out invalid actions (e.g., can't pick without being at pickup location)
    • No internal state, no goal-directed behavior
  2. Create a weighted random agent (warehouse_agent_weighted.py):
    • Choose actions with probabilities that favor moving toward the goal
    • Example: if goal is North, P(North) = 0.5, P(South) = 0.1, P(East) = P(West) = 0.2 (the probabilities must sum to 1)
    • Update probabilities when goal changes (pickup → dropoff)
    • This represents a "biased random walk"
  3. Run 100 episodes for three agents: random, weighted-random, greedy
    • Use the same random seed sequence for fair comparison
    • Log the same metrics as Problem 1.1
  4. Create a performance spectrum plot:
    • X-axis: agent type (random, weighted, greedy)
    • Y-axis: mean episode length (with error bars showing ±1 standard deviation)
    • Include success rate annotations on each bar
    • Title: "Performance Spectrum: Random to Intelligent Agents"
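One way to realize the biased random walk above is to weight each valid move by whether it reduces the Manhattan distance to the goal. The helper below is an illustrative sketch, not a fixed API: `valid_moves` is assumed to map action names to `(dr, dc)` offsets for the moves that don't hit a wall, and `random.choices` normalizes the weights, so the goal-switch (pickup → dropoff) is handled simply by passing a different `goal`.

```python
import random

def weighted_action(pos, goal, valid_moves):
    """Biased random walk: favor moves that shrink Manhattan distance.

    valid_moves: hypothetical dict mapping action name -> (dr, dc) offset
    for the moves that are currently legal (no wall in the way).
    """
    def dist(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    base = dist(pos)
    weights = []
    for action, (dr, dc) in valid_moves.items():
        new_d = dist((pos[0] + dr, pos[1] + dc))
        if new_d < base:
            weights.append(0.5)   # moves closer to the goal
        elif new_d > base:
            weights.append(0.1)   # moves farther from the goal
        else:
            weights.append(0.2)   # neutral (perpendicular) moves
    # random.choices normalizes the weights internally, so they need not
    # sum to 1 even when several actions move closer.
    return random.choices(list(valid_moves), weights=weights, k=1)[0]
```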

Copilot Coaching:

  • For random agent: "Create a random agent that selects uniformly from valid actions at each step. Valid actions are moves that don't hit walls, PICK only at pickup when not carrying, and DROP only at dropoff when carrying."
  • For weighted agent: "Modify the random agent to use weighted probabilities favoring movement toward the goal. Compute the direction to the goal, then assign weight 0.5 to actions moving closer, 0.1 to actions moving farther, and 0.2 to neutral actions, normalizing so the probabilities sum to 1."
  • For plotting: "Create a bar chart with error bars showing mean episode length ± 1 std for three agents. Add success rate percentages as text annotations above each bar."
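Before any plotting, the per-episode logs have to be collapsed into the three numbers the spectrum plot needs per agent: success rate, mean episode length, and its standard deviation. The input format below (`agent name -> list of (success, steps)` tuples) is an illustrative assumption; the output maps directly onto bar heights, `yerr` values, and annotation text for a matplotlib bar chart.

```python
import statistics

def spectrum_summary(results):
    """Collapse per-episode logs into the numbers the bar chart needs.

    results: hypothetical dict mapping agent name -> list of
    (success, steps) tuples, one per episode.
    """
    summary = {}
    for agent, episodes in results.items():
        ok_steps = [steps for ok, steps in episodes if ok]
        summary[agent] = {
            "success_rate": sum(ok for ok, _ in episodes) / len(episodes),
            # Episode-length stats cover successful episodes only.
            "mean_steps": statistics.mean(ok_steps) if ok_steps else float("nan"),
            "std_steps": statistics.stdev(ok_steps) if len(ok_steps) > 1 else 0.0,
        }
    return summary
```

From here, `mean_steps` gives the bar heights, `std_steps` the error bars, and `success_rate` the text annotation above each bar.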

Deliverables:

  • warehouse_agent_random.py: pure random agent
  • warehouse_agent_weighted.py: biased random agent
  • performance_spectrum.py: multi-agent comparison script
  • spectrum_report.pdf: 2-page report with performance spectrum plot and discussion of how "intelligence" emerges from random to greedy behavior

Problem 1.3  Agent Design Challenge

Objective: Design, implement, and evaluate a novel agent that outperforms the greedy baseline of Section 1.6.1: Concluding Design Exercise: Robotic Warehouse Agent.

Background: The greedy Manhattan agent often gets stuck or makes suboptimal choices. Your task is to design a better agent using principles from Chapter 1, leveraging Copilot to accelerate implementation.

Task:

  1. Design a novel agent that improves on the greedy baseline. Possible approaches:
    • Lookahead agent: evaluate 2-step sequences and choose the best
    • Wall-avoiding agent: penalize moves near walls to reduce stuck situations
    • Energy-aware agent: prefer paths that conserve battery when battery is low
    • History-based agent: track success/failure patterns and adapt thresholds
    • Or invent your own approach!
  2. Document your design before coding (1 page):
    • What is the agent's decision-making strategy?
    • What internal state does it maintain (if any)?
    • What heuristics or rules guide action selection?
    • Why should this outperform the greedy baseline?
  3. Implement your agent using Copilot:
    • Create warehouse_agent_custom.py
    • Use Copilot to draft the core logic, but review and refine all code
    • Add docstrings explaining key functions
    • Include at least one improvement over raw Copilot output (document what you changed and why)
  4. Evaluate rigorously:
    • Run 200 episodes each for: random, greedy, your agent
    • Use 5 different warehouse layouts (vary obstacle density, grid size)
    • Compute aggregated statistics across all layouts
    • Create a performance dashboard: 2×2 grid of plots showing success rate, mean episode length, mean battery remaining, mean reward
  5. Ablation study (optional, +10% extra credit):
    • Identify one key component of your agent (e.g., lookahead depth, wall penalty weight)
    • Run experiments varying that component
    • Plot performance vs. component value
    • Discuss the tradeoff
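As one concrete starting point for the lookahead approach listed above, the sketch below picks the first action of the best depth-k sequence by recursive simulation. The helpers `simulate(state, action) -> next_state`, `valid_actions(state)`, and `score(state)` (lower is better, e.g. Manhattan distance to goal plus a battery-cost penalty) are hypothetical; you would implement them against your warehouse environment.

```python
def lookahead_action(state, valid_actions, simulate, score, depth=2):
    """Choose the first action of the best depth-step action sequence.

    Assumed helper signatures (hypothetical, supply your own):
      valid_actions(state) -> iterable of actions
      simulate(state, action) -> next state
      score(state) -> number, lower is better
    """
    def best_value(s, d):
        # Value of a state: its score if out of depth, else the best
        # reachable score after one more simulated action.
        if d == 0:
            return score(s)
        return min(best_value(simulate(s, a), d - 1) for a in valid_actions(s))

    return min(valid_actions(state),
               key=lambda a: best_value(simulate(state, a), depth - 1))
```

Note that exhaustive lookahead costs O(b^depth) simulations per step for branching factor b, which is exactly the kind of tradeoff the optional ablation study (varying `depth`) can quantify.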

Copilot Coaching:

  • Design phase: Ask Copilot to brainstorm ideas: "What are 5 ways to improve a greedy Manhattan agent in a grid world with obstacles and battery constraints?"
  • Implementation: Be specific with prompts: "Implement a 2-step lookahead agent: for each valid action, simulate the next step, evaluate the resulting position's Manhattan distance to goal plus battery cost, and choose the action with the best expected outcome."
  • Refinement: After Copilot generates code, ask: "Review this agent code and suggest 3 improvements for robustness or efficiency."
  • Evaluation harness: "Create a function that runs N episodes of an agent on M different warehouse layouts, aggregates statistics, and returns a DataFrame with columns [agent, layout, episode, success, steps, battery, reward]."
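The evaluation harness prompted for above might come back shaped like this sketch. It accumulates one row dict per (agent, layout, episode) so the result can be handed straight to `pandas.DataFrame(rows)`; the callable `run_episode(agent, layout, episode)` is a hypothetical stand-in for your actual episode loop, assumed to return a dict with keys `success`, `steps`, `battery`, `reward`.

```python
def evaluate(agents, layouts, run_episode, n_episodes=200):
    """Run every agent on every layout; return rows for a DataFrame.

    agents, layouts: dicts mapping names to agent/layout objects.
    run_episode: hypothetical callable (agent, layout, episode) -> dict
    with keys success, steps, battery, reward.
    """
    rows = []
    for agent_name, agent in agents.items():
        for layout_name, layout in layouts.items():
            for episode in range(n_episodes):
                record = run_episode(agent, layout, episode)
                rows.append({"agent": agent_name,
                             "layout": layout_name,
                             "episode": episode,
                             **record})
    return rows  # e.g. pandas.DataFrame(rows) for aggregation/plotting
```

Keeping the harness agnostic about the agents (plain dict of callables or objects) makes it trivial to drop in random, greedy, and your custom agent side by side.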

Deliverables:

  • agent_design.pdf: 1-page design document (written before coding)
  • warehouse_agent_custom.py: your novel agent implementation
  • copilot_log.md: brief log showing (1) key prompts used, (2) one example of improving Copilot's output
  • evaluation_dashboard.py: multi-layout evaluation script
  • final_report.pdf: 4-5 page report with performance dashboard, comparison to baselines, discussion of design choices, and reflection on using Copilot as a coding assistant

Grading Rubric:

  • Design document clarity (15%)
  • Agent implementation correctness (25%)
  • Evaluation rigor (20%)
  • Performance improvement over baseline (20%)
  • Copilot usage and refinement (10%)
  • Report quality (10%)