The search algorithms we have studied so far (BFS, A*, and their variants) maintain a search tree or frontier and explore paths from an initial state to a goal. They are systematic, complete, and, under the right conditions (unit step costs for BFS, admissible heuristics for A*), optimal. But they are also memory-intensive: storing thousands or millions of nodes can be prohibitive for large state spaces.
For many optimization problems, we don't need the path to a solution; we only need the solution itself. Consider the warehouse layout problem below: what matters is the final arrangement of racks, not the sequence of moves that produced it.
Local search algorithms operate on a current state and iteratively move to neighbor states that improve an objective function. They use constant memory (just the current state), can handle enormous state spaces, and are well-suited for optimization problems where any solution can be improved incrementally.
This section covers three foundational local search techniques from Russell & Norvig Chapter 4: hill-climbing (greedy local search), simulated annealing (stochastic escape from local optima), and genetic algorithms (population-based evolutionary search).
Local search maintains a current state \(s\) and repeatedly applies local moves to improve it. The framework has three components: a state representation, an objective function \(f(s)\) to maximize or minimize, and a neighborhood (successor) function that defines which states are reachable from \(s\) in one move.
Unlike path-finding search, local search does not track how we reached \(s\). It only cares about \(s\) itself and its neighbors.
Local search is a search strategy that maintains a single current state and iteratively moves to a neighboring state based on an objective function. It does not maintain a search tree or explored set.
Suppose we want to arrange \(n = 20\) storage racks in a warehouse to minimize travel distance while avoiding overcrowding near the depot. Each rack has a position \((x_i, y_i)\).
State representation: \(s = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{20}, y_{20})\}\) where each rack occupies an integer grid position in a 20×20 warehouse.
Objective function (clearly defined): \[f(s) = \underbrace{\frac{1}{20} \sum_{i=1}^{20} d(\text{depot}, (x_i, y_i))}_{\text{Average travel distance}} + \underbrace{\lambda \cdot |\{i : d(\text{depot}, (x_i, y_i)) < 5\}|}_{\text{Congestion penalty}}\]
where:

- \(d(\text{depot}, (x_i, y_i))\) is the Manhattan distance from the depot at \((10, 10)\) to rack \(i\)
- \(\lambda = 2.0\) is the congestion weight (tune it to prefer spreading over tight clustering)
- the second term penalizes clustering too many racks near the depot (within distance 5)
This formulation creates a trade-off: moving all racks directly adjacent to the depot minimizes travel distance but incurs a severe congestion penalty. The optimal layout forms a halo or ring of racks around the depot, a shape that is far from trivial to discover through greedy search.
Neighborhood structure:

- Swap two rack positions
- Move one rack by \(\pm 1\) in the \(x\) or \(y\) direction (staying within bounds)
A systematic search algorithm (BFS, A*) would have to enumerate a space of at least \(20! \approx 2.4 \times 10^{18}\) possible arrangements of rack positions, far too many to store. Local search instead starts with a random or heuristic initial placement and iteratively swaps or shifts racks to reduce \(f(s)\).
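To make the formulation concrete, here is a minimal sketch of the objective and one possible neighbor move, assuming a state is a list of 20 integer \((x, y)\) tuples on a 0..19 grid; the names `objective`, `manhattan`, and `random_neighbor` are illustrative, not part of any library.

```python
import random

# Illustrative sketch of the warehouse formulation above. DEPOT and LAMBDA
# follow the values given in the text; the grid is assumed to span 0..19.
DEPOT = (10, 10)
LAMBDA = 2.0
CONGESTION_RADIUS = 5
GRID_SIZE = 20

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def objective(state):
    """Average depot distance plus the congestion penalty (lower is better)."""
    distances = [manhattan(DEPOT, rack) for rack in state]
    avg_distance = sum(distances) / len(state)
    congestion = sum(1 for d in distances if d < CONGESTION_RADIUS)
    return avg_distance + LAMBDA * congestion

def random_neighbor(state):
    """One local move: swap two racks, or shift one rack by +/-1 in x or y."""
    new_state = list(state)
    if random.random() < 0.5:
        i, j = random.sample(range(len(new_state)), 2)
        new_state[i], new_state[j] = new_state[j], new_state[i]
    else:
        i = random.randrange(len(new_state))
        x, y = new_state[i]
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        new_state[i] = (min(max(x + dx, 0), GRID_SIZE - 1),
                        min(max(y + dy, 0), GRID_SIZE - 1))
    return new_state
```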
Hill-climbing (also called greedy local search) is the simplest local search algorithm: always move to the best neighbor.
Hill-climbing repeatedly selects the neighbor with the best objective function value (steepest ascent or descent) and moves there. It terminates when no neighbor improves the current state.
```python
def hill_climbing(problem, max_iterations=1000):
    current = problem.initial_state()
    current_value = problem.objective(current)
    for iteration in range(max_iterations):
        neighbors = problem.get_neighbors(current)
        if not neighbors:
            break
        # Find the best neighbor (steepest descent, since we are minimizing)
        best_neighbor = None
        best_value = current_value
        for neighbor in neighbors:
            neighbor_value = problem.objective(neighbor)
            if neighbor_value < best_value:  # assuming minimization
                best_neighbor = neighbor
                best_value = neighbor_value
        # If no improvement, stop (local optimum reached)
        if best_neighbor is None:
            break
        current = best_neighbor
        current_value = best_value
    return current
```

Key features of hill-climbing include the following:

- It stores only the current state and its neighbors, so memory use is constant.
- It is greedy: each step commits to the best neighbor and never backtracks.
- It terminates as soon as no neighbor improves the current state, which may be only a local optimum.
Hill-climbing fails in three scenarios:

- Local optima: states better than all of their neighbors but worse than the global optimum.
- Ridges: sequences of local optima that are hard to traverse because single-step moves cannot follow the ridge.
- Plateaus: flat regions where all neighbors have equal value, including shoulders from which progress is still possible.
Stochastic hill-climbing: Pick a random neighbor from the set of uphill moves (improving neighbors). This adds randomness to escape local optima.
First-choice hill-climbing: Generate neighbors randomly until one improves, then move immediately without evaluating all neighbors. Useful when the neighborhood is huge.
Sideways moves: Allow moves to neighbors with equal value to escape shoulders. Limit the number of consecutive sideways moves to avoid infinite loops on plateaus.
Random-restart hill-climbing: Run hill-climbing multiple times from random initial states, keep the best result. If each run has probability \(p\) of success, expected number of restarts is \(1/p\). Advantageous when local optima are common.
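A sketch of random-restart hill-climbing, layered on the `hill_climbing` function above; it assumes `problem.initial_state()` returns a fresh random state on each call and that we are minimizing, as in the rest of this section.

```python
def random_restart_hill_climbing(problem, num_restarts=20):
    """Run hill-climbing from several random starts and keep the best result."""
    best_state, best_value = None, float("inf")
    for _ in range(num_restarts):
        # Each call draws a new random initial state from the problem.
        result = hill_climbing(problem)
        value = problem.objective(result)
        if value < best_value:  # minimization
            best_state, best_value = result, value
    return best_state
```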
Simulated annealing escapes local optima by occasionally accepting worse moves with a probability that decreases over time. The name comes from metallurgy: annealing involves heating metal to allow atoms to escape local energy minima, then gradually cooling to settle into a stable (low-energy) configuration.
Simulated annealing is a stochastic local search algorithm that accepts moves to worse neighbors with probability \(e^{-\Delta E / T}\), where \(\Delta E\) is the increase in objective value and \(T\) is a "temperature" parameter that decreases over time.
```python
import random
import math

def simulated_annealing(problem, schedule):
    """
    schedule: function that maps time (iteration) to temperature T
    """
    current = problem.initial_state()
    current_value = problem.objective(current)
    for t in range(1, problem.max_iterations):
        T = schedule(t)
        if T == 0:
            return current
        # Pick a random neighbor
        neighbor = random.choice(problem.get_neighbors(current))
        neighbor_value = problem.objective(neighbor)
        delta_E = neighbor_value - current_value
        # Accept if better, or with probability exp(-ΔE/T) if worse
        # Note: for minimization, ΔE > 0 means worse, so accept with prob exp(-ΔE/T)
        if delta_E < 0 or random.random() < math.exp(-delta_E / T):
            current = neighbor
            current_value = neighbor_value
    return current
```

Key parameters:

- Initial temperature \(T_0\): high enough that many worse moves are accepted early on (see the rule of thumb below).
- Cooling schedule: how \(T\) decreases with time, e.g., exponential decay.
- Stopping criterion: a fixed iteration budget, or termination once \(T\) reaches zero.
Why it works: By accepting worse moves early, simulated annealing escapes local optima. As temperature decreases, it behaves more like hill-climbing, converging to a (hopefully better) local optimum.
Theoretical guarantee: With a sufficiently slow cooling schedule, simulated annealing converges to the global optimum with probability approaching 1 as time \(\to \infty\). Such schedules are impractically slow, however, so in practice simulated annealing uses faster schedules (e.g., exponential decay) and serves as a heuristic: it often improves over hill-climbing but carries no finite-time optimality guarantee.
Warehouse example: When optimizing rack placement, early iterations might move racks to seemingly worse positions (higher average distance) to escape clusters. Later, the algorithm fine-tunes positions within the better region discovered.
Rule of thumb: Start with \(T_0\) such that \(e^{-\Delta E_{\text{avg}} / T_0} \approx 0.8\) (accept ~80% of worse moves initially). Tune empirically.
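As an illustration, an exponential-decay schedule (one common choice among many) compatible with the `simulated_annealing` function above might look like the sketch below; `T0` and `alpha` are tuning parameters, and the 80% rule above gives roughly \(T_0 \approx \Delta E_{\text{avg}} / (-\ln 0.8) \approx 4.5\,\Delta E_{\text{avg}}\).

```python
def exponential_schedule(T0=100.0, alpha=0.95):
    """Return a schedule function T(t) = T0 * alpha**t, cut to 0 once negligible."""
    def schedule(t):
        T = T0 * (alpha ** t)
        return T if T > 1e-6 else 0  # returning 0 makes simulated_annealing stop
    return schedule

# Example usage with the function defined earlier:
# best = simulated_annealing(problem, exponential_schedule(T0=50.0, alpha=0.98))
```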
Hill-climbing and simulated annealing operate on a single state. Genetic algorithms (GAs) maintain a population of candidate solutions and evolve them over generations using biologically-inspired operators: selection, crossover, and mutation.
A genetic algorithm is a population-based search algorithm that evolves a set of candidate solutions using selection (favoring fitter individuals), crossover (combining solutions), and mutation (random changes).
```python
import random

def genetic_algorithm(problem, population_size=100, generations=500,
                      mutation_rate=0.01, crossover_rate=0.7):
    # select, crossover, and mutate are helper operators (discussed below);
    # population_size is assumed to be even so parents can be paired.
    population = [problem.random_individual() for _ in range(population_size)]
    for generation in range(generations):
        # Evaluate fitness for all individuals
        fitness = [problem.fitness(ind) for ind in population]
        # Selection: pick parents proportional to fitness
        parents = select(population, fitness, num_parents=population_size)
        # Create next generation via crossover and mutation
        next_population = []
        for i in range(0, population_size, 2):
            parent1, parent2 = parents[i], parents[i + 1]
            # Crossover
            if random.random() < crossover_rate:
                child1, child2 = crossover(parent1, parent2)
            else:
                child1, child2 = parent1, parent2
            # Mutation
            child1 = mutate(child1, mutation_rate)
            child2 = mutate(child2, mutation_rate)
            next_population.extend([child1, child2])
        population = next_population
    # Return best individual
    return max(population, key=problem.fitness)
```

Selection: Common strategies:

- Fitness-proportionate (roulette-wheel) selection: each individual is chosen with probability proportional to its fitness, as assumed in the pseudocode above.
- Tournament selection: pick \(k\) individuals at random and keep the fittest (see the sketch below).
- Rank-based selection: selection probability depends on an individual's fitness rank rather than its raw fitness value.
- Elitism: copy the best few individuals into the next generation unchanged.
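For example, a tournament-selection helper compatible with the `select(...)` call in the pseudocode could be sketched as follows; the tournament size `k` is a tuning parameter.

```python
import random

def tournament_select(population, fitness, num_parents, k=3):
    """Choose each parent as the fittest of k randomly sampled individuals."""
    parents = []
    for _ in range(num_parents):
        contenders = random.sample(range(len(population)), k)
        winner = max(contenders, key=lambda idx: fitness[idx])
        parents.append(population[winner])
    return parents
```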
Crossover: Combine two parents to create offspring
Single-point crossover: Pick a random split point, swap segments
```
Parent1: [1 0 1 | 1 0 0 1]
Parent2: [0 1 0 | 0 1 1 0]
Child1:  [1 0 1 | 0 1 1 0]
Child2:  [0 1 0 | 1 0 0 1]
```

Two-point crossover: Swap the middle segment between two points
Uniform crossover: Each gene is randomly inherited from either parent
Mutation: Randomly alter genes to maintain diversity
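For bit-string individuals, the crossover and mutation operators above can be sketched as follows; these are illustrative stand-ins for the `crossover` and `mutate` helpers used in the pseudocode, not a library API.

```python
import random

def single_point_crossover(parent1, parent2):
    """Swap the segments after a random split point (equal-length lists)."""
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def bit_flip_mutation(individual, mutation_rate):
    """Flip each bit independently with probability mutation_rate."""
    return [1 - gene if random.random() < mutation_rate else gene
            for gene in individual]
```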
Suppose we have 10 robots and 50 pickup tasks, and want to assign tasks to robots to minimize total completion time.
GAs excel here because crossover can combine good sub-assignments from two parents (e.g., one parent efficiently assigns tasks in zone A, another in zone B).
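One natural encoding, sketched here under the assumption that tasks are numbered 0..49, robots 0..9, and a `task_time` table of per-robot completion times is available, is a length-50 list whose \(i\)-th gene names the robot assigned to task \(i\); interpreting "total completion time" as the makespan is one reasonable choice.

```python
import random

NUM_ROBOTS = 10
NUM_TASKS = 50

def random_individual():
    """Gene i is the robot assigned to task i."""
    return [random.randrange(NUM_ROBOTS) for _ in range(NUM_TASKS)]

def fitness(assignment, task_time):
    """Higher is better: negative makespan, where task_time[robot][task]
    is the (assumed) time for that robot to complete that task."""
    loads = [0.0] * NUM_ROBOTS
    for task, robot in enumerate(assignment):
        loads[robot] += task_time[robot][task]
    return -max(loads)  # shorter busiest-robot schedule => higher fitness
```

With this encoding, single-point crossover exchanges a contiguous block of task assignments between two parents, which is how a good zone-A sub-assignment from one parent can be combined with a good zone-B sub-assignment from another.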
Strengths:

- The population explores many regions of the state space at once and is easy to parallelize (see the comparison table below).
- Crossover can combine good partial solutions from different parents, as in the task-assignment example above.
- Only a fitness function is required; no gradient or domain heuristic is needed.
Limitations:

- Several hyperparameters must be tuned (population size, crossover and mutation rates, number of generations).
- The population can lose diversity and converge prematurely to a local optimum.
- Each generation requires many fitness evaluations, which can be expensive.
- There is no guarantee of finding the global optimum.
When to use GAs: Large combinatorial optimization problems where domain knowledge for heuristics is limited, or where solutions have natural modular structure (e.g., scheduling, routing, design).
| Algorithm | Memory | Escape Local Optima? | Parallelizable? | Best For |
|---|---|---|---|---|
| Hill-climbing | \(O(1)\) | No | No | Convex objectives, quick solutions |
| Simulated annealing | \(O(1)\) | Yes (probabilistic) | No | Non-convex, moderate size |
| Genetic algorithm | \(O(n \cdot \text{pop})\) | Yes (population) | Yes | Large combinatorial, structure |
Use local search when:

- the state space is too large for systematic search to store a frontier or explored set
- only the final configuration matters, not the path that produced it
- memory is limited and an approximately optimal solution is acceptable
Use systematic search (A*, UCS) when:

- the solution is a path from an initial state to a goal
- completeness or optimality guarantees are required
- the state space, with a good heuristic, is small enough to search
Example contrast: planning a route through the warehouse requires the path itself, so A* is appropriate; deciding where to place the racks only requires the final layout, so local search is appropriate.
Local search algorithms—hill-climbing, simulated annealing, and genetic algorithms—trade completeness and optimality guarantees for memory efficiency and scalability. They operate on a current state (or population) and iteratively improve via local moves. Hill-climbing is fast but gets stuck; simulated annealing escapes via probabilistic bad moves; genetic algorithms leverage population diversity and crossover.
These techniques are essential for large-scale optimization in engineering: warehouse layouts, manufacturing schedules, robot task assignment, and design optimization. They complement systematic search (A*, UCS) by handling problems where the state space is too large and only the final solution matters.
The next section, Constraints and Scheduling, introduces Constraint Satisfaction Problems (CSPs), which combine search with constraint propagation to efficiently solve assignment and scheduling problems. Local search can also be applied to CSPs (e.g., min-conflicts algorithm), creating a bridge between the two frameworks.
Optimization vs. Satisfaction
Pure optimization problems have an objective function to maximize or minimize, with no hard constraints (any state is a valid solution, just with varying quality). Constraint satisfaction problems (CSPs) have hard constraints that must be satisfied (invalid states exist). Local search can handle both: for optimization, evaluate the objective directly; for CSPs, use the number of satisfied (or violated) constraints as the objective, or find any feasible solution first and then apply local search to optimize quality.
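As a small illustration of the CSP case, the objective can simply count violated constraints, which local search then drives toward zero (the idea behind the min-conflicts algorithm mentioned above); here `constraints` is an assumed list of predicate functions over an assignment.

```python
def csp_objective(assignment, constraints):
    """Number of violated constraints; 0 means the assignment satisfies the CSP."""
    return sum(1 for constraint in constraints if not constraint(assignment))
```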