2.4  Local Search and Optimization

The search algorithms we have studied so far—BFS, A*, and their variants—maintain a search tree or frontier and explore paths from an initial state to a goal. They are systematic, complete, and optimal (with appropriate heuristics). But they are also memory-intensive: storing thousands or millions of nodes can be prohibitive for large state spaces.

For many optimization problems, we don't need the path to a solution—we only need the solution itself. Consider:

  • Warehouse layout optimization: arrange racks to minimize average robot travel distance. We care about the final layout, not how we arrived at it.
  • Robot path smoothing: given a collision-free path from A*, post-process it to minimize turns and energy. The intermediate configurations during smoothing are irrelevant.
  • Manufacturing schedule tuning: adjust job assignments to minimize makespan (total completion time). The sequence of trial schedules doesn't matter.

Local search algorithms operate on a current state and iteratively move to neighbor states that improve an objective function. They use constant memory (just the current state), can handle enormous state spaces, and are well-suited for optimization problems where any solution can be improved incrementally.

This section covers three foundational local search techniques from Russell & Norvig Chapter 4: hill-climbing (greedy local search), simulated annealing (stochastic escape from local optima), and genetic algorithms (population-based evolutionary search).

Optimization vs. Satisfaction

Pure optimization problems have an objective function to maximize or minimize, with no hard constraints (any state is a valid solution, just with varying quality). Constraint satisfaction problems (CSPs) have hard constraints that must be satisfied (invalid states exist). Local search can handle both: for optimization, evaluate objective directly; for CSPs, use objective = number of satisfied constraints, or apply local search after finding any feasible solution to optimize quality.
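For instance, a graph-coloring CSP can be recast for local search by scoring a state by how many constraints it satisfies. A minimal sketch (the graph and coloring here are hypothetical, chosen only to illustrate the objective):

def satisfied_constraints(coloring, edges):
    """Number of edges whose endpoints have different colors.

    The global maximum (all edges satisfied) corresponds to a valid
    coloring, so maximizing this objective solves the CSP.
    """
    return sum(1 for u, v in edges if coloring[u] != coloring[v])

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
coloring = {0: "red", 1: "green", 2: "red", 3: "blue"}
print(satisfied_constraints(coloring, edges))  # 3 of 4 edges satisfied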

2.4.1 The Local Search Framework

Local search maintains a current state \(s\) and repeatedly applies local moves to improve it. The framework has three components:

  1. State representation: How to encode candidate solutions (e.g., rack positions as \((x, y)\) coordinates)
  2. Objective function \(f(s)\): Measures solution quality (minimize travel distance, maximize throughput)
  3. Neighborhood structure \(N(s)\): Defines which states are "neighbors" (e.g., swap two racks, shift one rack by 1 meter)

Unlike path-finding search, local search does not track how we reached \(s\). It only cares about \(s\) itself and its neighbors.
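The code in the rest of this section assumes a problem object exposing exactly these three components. A minimal sketch of that interface (the method names are a convention chosen to match the code below, not a standard library API):

class LocalSearchProblem:
    """Interface assumed by the hill-climbing and simulated annealing code."""

    def initial_state(self):
        """Return a starting state (random or heuristic)."""
        raise NotImplementedError

    def objective(self, state):
        """Return f(state); lower is better for minimization problems."""
        raise NotImplementedError

    def get_neighbors(self, state):
        """Return the list of states reachable by one local move."""
        raise NotImplementedError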

2.4.1.1 Example: Warehouse Rack Placement

Suppose we want to arrange \(n = 20\) storage racks in a warehouse to minimize travel distance while avoiding overcrowding near the depot. Each rack has a position \((x_i, y_i)\).

State representation: \(s = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{20}, y_{20})\}\) where each rack occupies an integer grid position in a 20×20 warehouse.

Objective function: \[f(s) = \underbrace{\frac{1}{20} \sum_{i=1}^{20} d(\text{depot}, (x_i, y_i))}_{\text{Average travel distance}} + \underbrace{\lambda \cdot |\{i : d(\text{depot}, (x_i, y_i)) < 5\}|}_{\text{Congestion penalty}}\]

where:

  • \(d(\text{depot}, (x_i, y_i))\) is the Manhattan distance from the depot at \((10, 10)\) to rack \(i\)
  • \(\lambda = 2.0\) is the congestion weight (tune to prefer spreading over tight clustering)
  • The second term penalizes clustering too many racks near the depot (within distance 5)

This formulation creates a trade-off: moving all racks directly adjacent to the depot minimizes travel distance but incurs severe congestion penalty. The optimal layout forms a halo or ring of racks around the depot—far from trivial to discover through greedy search.

Neighborhood structure:

  • Swap two rack positions
  • Move one rack by \(\pm 1\) in \(x\) or \(y\) direction (staying within bounds)

A systematic search algorithm (BFS, A*) would face a space of at least \(20! \approx 2.4 \times 10^{18}\) possible permutations of rack positions, far too many to enumerate. Local search starts with a random or heuristic initial placement and iteratively swaps or shifts racks to reduce \(f(s)\).
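A minimal sketch of this rack-placement problem as code, following the interface from Section 2.4.1 and the formulation above (the max_iterations attribute is included for the simulated annealing code in Section 2.4.3):

import random

class WarehouseProblem:
    """Rack placement on a 20x20 grid, minimizing f(s) from Section 2.4.1.1."""

    def __init__(self, n_racks=20, grid=20, depot=(10, 10), lam=2.0):
        self.n = n_racks
        self.grid = grid
        self.depot = depot
        self.lam = lam
        self.max_iterations = 5000   # used by simulated_annealing in 2.4.3

    def initial_state(self):
        # Random placement: n distinct integer grid cells, one per rack
        cells = [(x, y) for x in range(self.grid) for y in range(self.grid)]
        return tuple(random.sample(cells, self.n))

    def objective(self, state):
        dists = [abs(x - self.depot[0]) + abs(y - self.depot[1])
                 for x, y in state]                   # Manhattan distances
        avg_dist = sum(dists) / self.n                # average travel term
        congestion = sum(1 for d in dists if d < 5)   # racks within 5 of depot
        return avg_dist + self.lam * congestion

    def get_neighbors(self, state):
        # Shift moves only: move one rack by +/-1 in x or y, staying in
        # bounds and off occupied cells. Swap moves are omitted here because
        # this objective is symmetric in racks, so swapping two rack
        # positions leaves f(s) unchanged.
        occupied = set(state)
        neighbors = []
        for i, (x, y) in enumerate(state):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < self.grid and 0 <= ny < self.grid \
                        and (nx, ny) not in occupied:
                    neighbors.append(state[:i] + ((nx, ny),) + state[i + 1:])
        return neighbors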

2.4.2 Hill-Climbing

Hill-climbing (also called greedy local search) is the simplest local search algorithm: always move to the best neighbor.

Definition 2.23  Hill-Climbing

Hill-climbing repeatedly selects the neighbor with the best objective function value (steepest ascent or descent) and moves there. It terminates when no neighbor improves the current state.

2.4.2.1 Algorithm

def hill_climbing(problem, max_iterations=1000):
    current = problem.initial_state()
    current_value = problem.objective(current)
    
    for iteration in range(max_iterations):
        neighbors = problem.get_neighbors(current)
        if not neighbors:
            break
        
        # Find best neighbor
        best_neighbor = None
        best_value = current_value
        for neighbor in neighbors:
            neighbor_value = problem.objective(neighbor)
            if neighbor_value < best_value:  # assuming minimization
                best_neighbor = neighbor
                best_value = neighbor_value
        
        # If no improvement, stop (local optimum reached)
        if best_neighbor is None:
            break
        
        current = best_neighbor
        current_value = best_value
    
    return current

Key features of hill-climbing include the following:

  • Greedy: Always picks the best immediate improvement
  • Memory-efficient: Only stores the current state and its neighbors
  • Fast: Each iteration only evaluates neighbors, not the entire search space

2.4.2.2 Limitations

Hill-climbing fails in three scenarios:

  1. Local maxima (or minima): A state better than all neighbors but worse than the global optimum. Hill-climbing gets stuck.
  2. Plateaus (or flat local maxima): A flat region where many neighbors have the same objective value. Hill-climbing may make no progress. A shoulder is a plateau with an uphill edge—sideways moves can eventually lead to improvement.
  3. Ridges: Narrow optimal regions where improvement requires moving in multiple dimensions simultaneously. Hill-climbing overshoots or oscillates.

2.4.2.3 Variants

Stochastic hill-climbing: Pick a random neighbor from the set of uphill moves (improving neighbors). This adds randomness to escape local optima.

First-choice hill-climbing: Generate neighbors randomly until one improves, then move immediately without evaluating all neighbors. Useful when the neighborhood is huge.

Sideways moves: Allow moves to neighbors with equal value to escape shoulders. Limit the number of consecutive sideways moves to avoid infinite loops on plateaus.

Random-restart hill-climbing: Run hill-climbing multiple times from random initial states, keep the best result. If each run has probability \(p\) of success, expected number of restarts is \(1/p\). Advantageous when local optima are common.
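A minimal sketch of random-restart wrapped around the hill_climbing function from Section 2.4.2.1 (assuming minimization, as in that code):

def random_restart_hill_climbing(problem, num_restarts=20):
    """Run hill_climbing from several random starts; keep the best result."""
    best, best_value = None, float("inf")
    for _ in range(num_restarts):
        result = hill_climbing(problem)   # each run starts from a fresh
        value = problem.objective(result) # problem.initial_state()
        if value < best_value:            # minimization: smaller is better
            best, best_value = result, value
    return best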

2.4.3 Simulated Annealing

Simulated annealing escapes local optima by occasionally accepting worse moves with a probability that decreases over time. The name comes from metallurgy: annealing involves heating metal to allow atoms to escape local energy minima, then gradually cooling to settle into a stable (low-energy) configuration.

Definition 2.24  Simulated Annealing

Simulated annealing is a stochastic local search algorithm that accepts moves to worse neighbors with probability \(e^{-\Delta E / T}\), where \(\Delta E\) is the increase in objective value and \(T\) is a "temperature" parameter that decreases over time.

2.4.3.1 Algorithm

import random
import math

def simulated_annealing(problem, schedule):
    """
    schedule: function that maps time (iteration) to temperature T
    """
    current = problem.initial_state()
    current_value = problem.objective(current)
    
    for t in range(1, problem.max_iterations):
        T = schedule(t)
        if T == 0:
            return current
        
        # Pick a random neighbor
        neighbor = random.choice(problem.get_neighbors(current))
        neighbor_value = problem.objective(neighbor)
        delta_E = neighbor_value - current_value
        
        # Accept if better, or with probability exp(-ΔE/T) if worse
        # Note: for minimization, ΔE > 0 means worse, so accept with prob exp(-ΔE/T)
        if delta_E < 0 or random.random() < math.exp(-delta_E / T):
            current = neighbor
            current_value = neighbor_value
    
    return current

Key parameters:

  • Temperature schedule \(T(t)\): Controls exploration vs. exploitation
    • High \(T\) (early): Accept many bad moves, explore broadly
    • Low \(T\) (late): Reject most bad moves, converge to local optimum
    • Common schedule: \(T(t) = T_0 / \log(t + 1)\) or exponential decay \(T(t) = T_0 \cdot \alpha^t\)
  • Acceptance probability: \(P(\text{accept worse move}) = e^{-\Delta E / T}\)
    • Small \(\Delta E\) (slightly worse): higher chance of acceptance
    • Large \(\Delta E\) (much worse): lower chance
    • High \(T\): accept almost anything
    • Low \(T\): accept only small downgrades
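As a concrete illustration: with \(\Delta E = 5\), a temperature of \(T = 50\) gives acceptance probability \(e^{-5/50} \approx 0.90\), while \(T = 1\) gives \(e^{-5} \approx 0.007\). The same worse move is accepted almost freely early in the run and almost never late.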

2.4.3.2 Intuition and Properties

Why it works: By accepting worse moves early, simulated annealing escapes local optima. As temperature decreases, it behaves more like hill-climbing, converging to a (hopefully better) local optimum.

Theoretical guarantee: With a sufficiently slow cooling schedule, simulated annealing converges to the global optimum with probability approaching 1 as time \(\to \infty\). This makes it theoretically complete. In practice, such slow schedules are impractical, so simulated annealing is used as a heuristic that often improves over hill-climbing but has no finite-time optimality guarantee. Practical schedules cool faster (exponential decay) and find good but not guaranteed optimal solutions.

Warehouse example: When optimizing rack placement, early iterations might move racks to seemingly worse positions (higher average distance) to escape clusters. Later, the algorithm fine-tunes positions within the better region discovered.

2.4.3.3 Choosing a Temperature Schedule

  • Linear: \(T(t) = T_0 - k \cdot t\) (stops when \(T = 0\))
  • Exponential: \(T(t) = T_0 \cdot \alpha^t\), where \(0 < \alpha < 1\) (e.g., \(\alpha = 0.95\))
  • Logarithmic: \(T(t) = T_0 / \log(t + 1)\) (slowest, theoretical guarantee)

Rule of thumb: Start with \(T_0\) such that \(e^{-\Delta E_{\text{avg}} / T_0} \approx 0.8\) (accept ~80% of worse moves initially). Tune empirically.
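A minimal sketch of an exponential schedule compatible with the simulated_annealing function above; \(T_0\), \(\alpha\), and the cutoff are illustrative values to tune empirically:

def exponential_schedule(T0=100.0, alpha=0.95, min_T=1e-3):
    """Schedule T(t) = T0 * alpha**t, cut to 0 below min_T."""
    def schedule(t):
        T = T0 * (alpha ** t)
        return T if T > min_T else 0   # T == 0 makes the search terminate
    return schedule

# Illustrative usage with the warehouse sketch from Section 2.4.1.1:
# problem = WarehouseProblem()
# layout = simulated_annealing(problem, exponential_schedule(T0=50.0))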

2.4.4 Genetic Algorithms

Hill-climbing and simulated annealing operate on a single state. Genetic algorithms (GAs) maintain a population of candidate solutions and evolve them over generations using biologically inspired operators: selection, crossover, and mutation.

Definition 2.25  Genetic Algorithm

A genetic algorithm is a population-based search algorithm that evolves a set of candidate solutions using selection (favoring fitter individuals), crossover (combining solutions), and mutation (random changes).

2.4.4.1 Key Concepts

  • Individual: A candidate solution encoded as a string (e.g., binary, real-valued, permutation)
  • Population: A set of individuals
  • Fitness: Objective function value (higher fitness = better solution)
  • Selection: Choose individuals for reproduction based on fitness
  • Crossover: Combine two parents to create offspring
  • Mutation: Randomly alter offspring to introduce diversity

2.4.4.2 Algorithm

def genetic_algorithm(problem, population_size=100, generations=500,
                      mutation_rate=0.01, crossover_rate=0.7):
    # Assumes helper operators select, crossover, mutate (Section 2.4.4.3)
    # and an even population_size so parents pair up cleanly.
    population = [problem.random_individual() for _ in range(population_size)]
    
    for generation in range(generations):
        # Evaluate fitness for all individuals
        fitness = [problem.fitness(ind) for ind in population]
        
        # Selection: pick parents proportional to fitness
        parents = select(population, fitness, num_parents=population_size)
        
        # Create next generation via crossover and mutation
        next_population = []
        for i in range(0, population_size, 2):
            parent1, parent2 = parents[i], parents[i+1]
            
            # Crossover
            if random.random() < crossover_rate:
                child1, child2 = crossover(parent1, parent2)
            else:
                child1, child2 = parent1, parent2
            
            # Mutation
            child1 = mutate(child1, mutation_rate)
            child2 = mutate(child2, mutation_rate)
            
            next_population.extend([child1, child2])
        
        population = next_population
    
    # Return best individual
    return max(population, key=problem.fitness)

2.4.4.3 Genetic Operators

Selection: Common strategies:

  • Fitness-proportionate selection ("roulette wheel"): Probability of selection proportional to fitness. Can lead to premature convergence if a few individuals dominate.
  • Tournament selection: Pick \(k\) individuals randomly, choose the fittest among them. More diversity than roulette wheel.
  • Rank selection: Sort by fitness, select based on rank (not absolute fitness). Avoids premature convergence when fitness varies widely.
  • Elitism: Always retain some top-performing individuals from one generation to the next. Prevents loss of best solutions due to crossover/mutation randomness.

Crossover: Combine two parents to create offspring

  • Single-point crossover: Pick a random split point, swap segments

    Parent1: [1 0 1 | 1 0 0 1]
    Parent2: [0 1 0 | 0 1 1 0]
    Child1:  [1 0 1 | 0 1 1 0]
    Child2:  [0 1 0 | 1 0 0 1]
  • Two-point crossover: Swap the middle segment between two points

  • Uniform crossover: Each gene is randomly inherited from either parent

Mutation: Randomly alter genes to maintain diversity

  • Bit flip: For binary strings, flip each bit with probability \(p_m\)
  • Gaussian mutation: For real-valued genes, add Gaussian noise
  • Permutation swap: For permutation problems (e.g., traveling salesman), swap two elements
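The genetic_algorithm code in Section 2.4.4.2 calls select, crossover, and mutate without defining them. A minimal sketch of these helpers for binary-string individuals (lists of 0/1), using tournament selection, single-point crossover, and bit-flip mutation, one plausible choice among the strategies listed above:

import random

def select(population, fitness, num_parents, k=3):
    """Tournament selection: winner of k random individuals, repeated."""
    parents = []
    for _ in range(num_parents):
        contenders = random.sample(range(len(population)), k)
        winner = max(contenders, key=lambda i: fitness[i])  # fittest of the k
        parents.append(population[winner])
    return parents

def crossover(parent1, parent2):
    """Single-point crossover: swap tails after a random split point."""
    point = random.randint(1, len(parent1) - 1)  # both segments non-empty
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(individual, mutation_rate):
    """Bit-flip mutation: flip each gene independently with prob p_m."""
    return [1 - gene if random.random() < mutation_rate else gene
            for gene in individual]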

2.4.4.4 Warehouse Application: Robot Task Assignment

Suppose we have 10 robots and 50 pickup tasks, and want to assign tasks to robots to minimize total completion time.

  • Individual: A sequence of 50 task IDs, implicitly partitioned among robots (first \(n_1\) tasks to robot 1, next \(n_2\) to robot 2, etc.)
  • Fitness: Negative of makespan (maximize fitness = minimize makespan)
  • Crossover: Combine two task sequences, preserving some structure
  • Mutation: Swap two tasks or reassign one task to a different robot

GAs excel here because crossover can combine good sub-assignments from two parents (e.g., one parent efficiently assigns tasks in zone A, another in zone B).
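A minimal sketch of the fitness evaluation under this encoding, assuming hypothetical task durations and fixed per-robot partition sizes:

def makespan(assignment_order, task_duration, tasks_per_robot):
    """Makespan of a task sequence partitioned among robots in order.

    assignment_order: permutation of task IDs
    task_duration: dict mapping task ID -> processing time
    tasks_per_robot: list of counts [n_1, ..., n_k] summing to len(order)
    """
    loads, start = [], 0
    for count in tasks_per_robot:
        chunk = assignment_order[start:start + count]
        loads.append(sum(task_duration[t] for t in chunk))
        start += count
    return max(loads)   # completion time of the busiest robot

def fitness(assignment_order, task_duration, tasks_per_robot):
    # Negate so that maximizing fitness minimizes makespan
    return -makespan(assignment_order, task_duration, tasks_per_robot)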

2.4.4.5 Strengths and Limitations

Strengths:

  • Population diversity: Less prone to local optima than hill-climbing
  • Crossover exploits structure: Can combine beneficial traits from multiple solutions
  • Parallelizable: Fitness evaluation is independent for each individual

Limitations:

  • Slow convergence: Requires many generations and fitness evaluations
  • Encoding sensitivity: Performance depends heavily on how solutions are represented
  • Hyperparameter tuning: Population size, mutation rate, crossover rate must be tuned
  • No theoretical guarantees: Heuristic, not guaranteed to find global optimum

When to use GAs: Large combinatorial optimization problems where domain knowledge for heuristics is limited, or where solutions have natural modular structure (e.g., scheduling, routing, design).

2.4.5 Comparison of Local Search Methods

Algorithm             Memory                      Escape Local Optima?   Parallelizable?   Best For
Hill-climbing         \(O(1)\)                    No                     No                Convex objectives, quick solutions
Simulated annealing   \(O(1)\)                    Yes (probabilistic)    No                Non-convex, moderate size
Genetic algorithm     \(O(n \cdot \text{pop})\)   Yes (population)       Yes               Large combinatorial, structure

Use local search when:

  • The state space is enormous (intractable for A*, BFS)
  • The path to a solution is irrelevant (optimization, not pathfinding)
  • Any solution can be iteratively improved (e.g., scheduling, layout, tuning)
  • Memory is limited

Use systematic search (A*, UCS) when:

  • You need the optimal solution with guarantees
  • The state space is tractable (millions, not billions of states)
  • The path itself matters (robot navigation, action sequence)
  • Completeness is essential (must find a solution if one exists)

Example contrast:

  • Robot pathfinding: Use A* to find collision-free path from depot to rack. Optimality and path matter.
  • Warehouse layout: Use local search (simulated annealing, GA) to arrange racks. Only final layout matters, state space is huge (\(20!\) rack permutations).

Local search algorithms—hill-climbing, simulated annealing, and genetic algorithms—trade completeness and optimality guarantees for memory efficiency and scalability. They operate on a current state (or population) and iteratively improve via local moves. Hill-climbing is fast but gets stuck; simulated annealing escapes via probabilistic bad moves; genetic algorithms leverage population diversity and crossover.

These techniques are essential for large-scale optimization in engineering: warehouse layouts, manufacturing schedules, robot task assignment, and design optimization. They complement systematic search (A*, UCS) by handling problems where the state space is too large and only the final solution matters.

The next section, Constraints and Scheduling, introduces Constraint Satisfaction Problems (CSPs), which combine search with constraint propagation to efficiently solve assignment and scheduling problems. Local search can also be applied to CSPs (e.g., min-conflicts algorithm), creating a bridge between the two frameworks.