Given a knowledge base in first-order logic, how do we derive conclusions? This section presents two practical inference strategies: forward chaining (data-driven) and backward chaining (goal-driven) (Russell and Norvig, 2020, ch. 9). Both build on a fundamental operation called unification that matches logical expressions.
One approach to FOL inference is to convert the knowledge base to propositional logic and use propositional methods.
A ground term contains no variables. A ground sentence is a sentence with no variables—equivalent to a propositional sentence.
Propositionalization replaces universally quantified sentences with all their ground instances:
From: \(\forall l\ \text{Damaged}(l) \Rightarrow \neg\text{Safe}(l)\)
Generate for each constant in the domain: \[\text{Damaged}(L_{1,1}) \Rightarrow \neg\text{Safe}(L_{1,1})\] \[\text{Damaged}(L_{1,2}) \Rightarrow \neg\text{Safe}(L_{1,2})\] \[\vdots\]
This reduces FOL to propositional logic, where resolution or other methods apply.
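As a concrete illustration, here is a minimal sketch of grounding a quantified rule over a finite set of constants. The `constants` list and the string template are illustrative stand-ins chosen for this example, not part of any library:

```python
from itertools import product

# Hypothetical domain of location constants for illustration.
constants = ["L11", "L12", "L21", "L22"]

def ground_instances(template, n_vars):
    """Yield every ground instance of a rule template with n_vars variables."""
    for combo in product(constants, repeat=n_vars):
        yield template.format(*combo)

for sentence in ground_instances("Damaged({0}) => ~Safe({0})", 1):
    print(sentence)
# Damaged(L11) => ~Safe(L11)
# Damaged(L12) => ~Safe(L12)
# ...
```

With \(n\) constants and \(k\) variables, this loop emits \(n^k\) sentences, the blow-up quantified in Comment 3.7.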
Practical FOL inference avoids full propositionalization by working with variables directly.
The key to variable-based inference is unification: finding substitutions that make two expressions identical.
A substitution \(\theta\) is a mapping from variables to terms. Applying \(\theta\) to an expression replaces each variable with its assigned term.
Example: Let \(\theta = \{x/L_{2,1}, y/L_{1,1}\}\).
Applying to \(\text{Adjacent}(x, y)\) yields \(\text{Adjacent}(L_{2,1}, L_{1,1})\).
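Substitutions are easy to implement once expressions have a concrete representation. The sketch below assumes terms are nested tuples and that strings beginning with `?` are variables; both choices are ours for these examples, not a standard:

```python
def substitute(theta, term):
    """Apply substitution theta (a dict from variables to terms) recursively."""
    if isinstance(term, tuple):
        return tuple(substitute(theta, t) for t in term)
    return theta.get(term, term)

theta = {"?x": "L21", "?y": "L11"}
print(substitute(theta, ("Adjacent", "?x", "?y")))
# ('Adjacent', 'L21', 'L11')
```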
Two expressions unify if there exists a substitution \(\theta\) that makes them identical. The most general unifier (MGU) is the least constraining such substitution: every other unifier is a special case of it.
Examples:
| Expression 1 | Expression 2 | MGU |
|---|---|---|
| \(\text{Damaged}(x)\) | \(\text{Damaged}(L_{2,1})\) | \(\{x/L_{2,1}\}\) |
| \(\text{Adjacent}(x, L_{1,1})\) | \(\text{Adjacent}(L_{2,1}, y)\) | \(\{x/L_{2,1}, y/L_{1,1}\}\) |
| \(\text{At}(x, L_{1,1})\) | \(\text{At}(y, y)\) | \(\{x/L_{1,1}, y/L_{1,1}\}\) |
| \(\text{At}(x, x)\) | \(\text{At}(L_{1,1}, L_{2,1})\) | fail (\(x\) cannot be both \(L_{1,1}\) and \(L_{2,1}\)) |
The unification algorithm systematically finds the MGU or reports failure. It handles nested terms and performs the occurs check: a variable cannot unify with a term containing it (e.g., \(x\) cannot unify with \(f(x)\), since the substitution would create an infinite term).
The core idea is recursive decomposition:
\(\text{UNIFY}(\alpha, \beta)\) returns the most general unifier of \(\alpha\) and \(\beta\), or \(\text{failure}\) if they don't unify.
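The following is a minimal sketch of this recursive decomposition, using the tuple representation from the substitution example above; it is a teaching aid, not an optimized unifier:

```python
def is_var(t):
    """Variables are strings beginning with '?' (our representation)."""
    return isinstance(t, str) and t.startswith("?")

def occurs(v, t, theta):
    """Occurs check: does variable v appear inside term t (under theta)?"""
    if v == t:
        return True
    if is_var(t) and t in theta:
        return occurs(v, theta[t], theta)
    if isinstance(t, tuple):
        return any(occurs(v, s, theta) for s in t)
    return False

def unify(a, b, theta=None):
    """Return the MGU of a and b extending theta, or None on failure."""
    if theta is None:
        theta = {}
    if a == b:
        return theta
    if is_var(a):
        return unify_var(a, b, theta)
    if is_var(b):
        return unify_var(b, a, theta)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):           # decompose compound terms pairwise
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None                           # mismatched constants or arities

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    if is_var(t) and t in theta:
        return unify(v, theta[t], theta)
    if occurs(v, t, theta):               # ?x cannot unify with f(?x)
        return None
    return {**theta, v: t}                # bind without mutating theta

print(unify(("Adjacent", "?x", "L11"), ("Adjacent", "L21", "?y")))
# {'?x': 'L21', '?y': 'L11'}
print(unify(("At", "?x", "?x"), ("At", "L11", "L21")))
# None (fail)
```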
With unification, we can generalize modus ponens to handle variables.
For atomic sentences \(p_1, p_2, \ldots, p_n\), \(p_1', p_2', \ldots, p_n'\), and \(q\):
From sentences \(p_1', p_2', \ldots, p_n'\) and \((p_1 \land p_2 \land \cdots \land p_n) \Rightarrow q\),
if a single substitution \(\theta\) satisfies \(p_i\theta = p_i'\theta\) for all \(i\), infer \(q\theta\) (apply \(\theta\) to \(q\)).
Example:
Given:
- \(\forall x\,\forall y\ \text{Adjacent}(x, y) \land \text{Damaged}(y) \Rightarrow \text{Creaking}(x)\)
- \(\text{Adjacent}(L_{2,1}, L_{3,1})\)
- \(\text{Damaged}(L_{3,1})\)
To use generalized modus ponens: unify each premise of the rule with the corresponding fact, giving \(\theta = \{x/L_{2,1}, y/L_{3,1}\}\), then apply \(\theta\) to the rule's conclusion.
Conclusion: \(\text{Creaking}(L_{2,1})\)
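Using the `unify` and `substitute` sketches above (assumed in scope), generalized modus ponens takes only a few lines; the rule and facts mirror the worked example:

```python
# Rule: Adjacent(?x, ?y) AND Damaged(?y) => Creaking(?x)
rule_body = [("Adjacent", "?x", "?y"), ("Damaged", "?y")]
rule_head = ("Creaking", "?x")
facts = [("Adjacent", "L21", "L31"), ("Damaged", "L31")]

theta = {}
for premise, fact in zip(rule_body, facts):
    theta = unify(premise, fact, theta)   # one theta must fit all premises
    assert theta is not None

print(substitute(theta, rule_head))
# ('Creaking', 'L21')
```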
Forward chaining (data-driven inference) starts from known facts and repeatedly applies rules to derive new facts until the goal is reached or no new facts can be derived.
```
function FORWARD-CHAIN(KB, goal):
    repeat:
        for each rule (body => head) in KB:
            for each substitution θ that unifies body with known facts:
                new_fact ← SUBST(θ, head)
                if new_fact not in KB:
                    add new_fact to KB
                    if new_fact unifies with goal:
                        return the unifying substitution
    until no new facts were added
    return failure
```
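Here is a runnable sketch of this loop over ground facts, again reusing the `unify` and `substitute` helpers from above (assumed in scope); rules are (body, head) pairs in our tuple representation:

```python
from itertools import product

def forward_chain(initial_facts, rules):
    """Fire rules until fixpoint; return the set of derived ground facts."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Try every assignment of known facts to the body's premises.
            for candidates in product(facts, repeat=len(body)):
                theta = {}
                for premise, fact in zip(body, candidates):
                    theta = unify(premise, fact, theta)
                    if theta is None:
                        break
                if theta is None:
                    continue
                new_fact = substitute(theta, head)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

rules = [([("Adjacent", "?x", "?y"), ("Damaged", "?y")], ("Creaking", "?x"))]
facts = [("Adjacent", "L21", "L31"), ("Damaged", "L31")]
print(forward_chain(facts, rules))
# {..., ('Creaking', 'L21')}
```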
Knowledge Base (abbreviated):
- Adjacency facts, e.g., \(\text{Adjacent}(L_{1,1}, L_{2,1})\), \(\text{Adjacent}(L_{2,1}, L_{3,1})\), \(\text{Adjacent}(L_{2,1}, L_{2,2})\)
- Percepts: \(\neg\text{Creaking}(L_{1,1})\), \(\text{Creaking}(L_{2,1})\)
- Creaking rule (biconditional): \(\forall l\ \text{Creaking}(l) \Leftrightarrow \exists l'\ \text{Adjacent}(l, l') \land \text{Damaged}(l')\)
- \(\neg\text{Damaged}(L_{1,1})\) (the robot's current square)
Goal: What do we know about damaged floors?
Forward chaining:
Iteration 1: The creaking rule would derive a \(\text{Creaking}\) fact from an adjacency fact plus a \(\text{Damaged}\) fact, but forward chaining applies only to known facts, and the KB contains no \(\text{Damaged}\) facts yet, so the rule cannot fire.
From the absence of creaking at \(L_{1,1}\), together with the full biconditional form of the creaking rule, we can deduce that no adjacent square has a damaged floor: \(\neg\text{Damaged}(L_{2,1})\) and \(\neg\text{Damaged}(L_{1,2})\). Note that this reasoning uses the contrapositive of the \(\Leftarrow\) direction of the biconditional, which goes beyond strict definite-clause forward chaining (see the discussion of definite clauses below).
Iteration 2: From \(\text{Creaking}(L_{2,1})\) and the \(\Rightarrow\) direction of the biconditional, some square adjacent to \(L_{2,1}\) has a damaged floor. The candidates are \(L_{1,1}\), \(L_{3,1}\), and \(L_{2,2}\). Since \(\neg\text{Damaged}(L_{1,1})\) is already in the KB, we conclude: \[\text{Damaged}(L_{3,1}) \lor \text{Damaged}(L_{2,2})\]
Forward chaining continues until no new facts are derivable.
Forward chaining is complete for definite clauses (rules with exactly one positive conclusion). For general FOL it may fail to terminate, and restricting to definite clauses can miss conclusions that require disjunctive or contrapositive reasoning, as in the iterations above.
Advantages:
- Data-driven: derives consequences as soon as new facts arrive, a natural fit for monitoring and alerting
- Sound and complete for definite clauses
Disadvantages:
- Not goal-directed: may derive many facts irrelevant to any particular query
- Naive implementations re-check every rule on every iteration, which is wasteful for large knowledge bases
Backward chaining (goal-driven inference) starts from the goal and works backward, finding rules whose conclusions match the goal and recursively proving their premises.
```
function BACKWARD-CHAIN(KB, goal, θ):
    if goal unifies with a known fact via θ':
        return COMPOSE(θ, θ')
    for each rule (body => head) in KB:
        if UNIFY(head, goal) = θ' (not failure):
            combined ← COMPOSE(θ, θ')
            proved ← true
            for each conjunct g in body:
                result ← BACKWARD-CHAIN(KB, SUBST(combined, g), combined)
                if result = failure:
                    proved ← false
                    break            # this rule failed; try the next one
                combined ← result
            if proved:
                return combined
    return failure
```
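And a matching runnable sketch of the recursive procedure, depth-first like Prolog, again reusing `unify` and `substitute` (assumed in scope). For brevity it omits standardizing variables apart and loop detection, both of which a real system needs:

```python
def backward_chain(goal, facts, rules, theta=None):
    """Prove goal from facts and rules; return an extended substitution
    or None on failure. Depth-first, no cycle checking."""
    if theta is None:
        theta = {}
    goal = substitute(theta, goal)
    for fact in facts:                    # base case: goal matches a fact
        t = unify(goal, fact, theta)
        if t is not None:
            return t
    for body, head in rules:              # recursive case: reduce via a rule
        t = unify(head, goal, theta)
        if t is None:
            continue
        for premise in body:
            t = backward_chain(premise, facts, rules, t)
            if t is None:
                break                     # this rule failed; try the next
        if t is not None:
            return t
    return None

rules = [([("Adjacent", "?x", "?y"), ("Damaged", "?y")], ("Creaking", "?x"))]
facts = [("Adjacent", "L21", "L31"), ("Damaged", "L31")]
print(backward_chain(("Creaking", "L21"), facts, rules))
# {'?x': 'L21', '?y': 'L31'}
```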
Goal: Is \(L_{2,1}\) safe?
Query: \(\text{Safe}(L_{2,1})\)
Rules and facts:
- \(\forall l\ \neg\text{Damaged}(l) \land \neg\text{Forklift}(l) \Rightarrow \text{Safe}(l)\)
- \(\neg\text{Damaged}(L_{2,1})\) (derived from \(\neg\text{Creaking}(L_{1,1})\))
- \(\neg\text{Forklift}(L_{2,1})\) (derived from \(\neg\text{Rumbling}(L_{1,1})\))
Backward chaining:
Goal: \(\text{Safe}(L_{2,1})\)
Find a rule whose head matches: \(\text{Safe}(l)\) unifies with the goal under \(\theta = \{l/L_{2,1}\}\)
Subgoals: \(\neg\text{Damaged}(L_{2,1})\) and \(\neg\text{Forklift}(L_{2,1})\)
Subgoal 1: \(\neg\text{Damaged}(L_{2,1})\) matches the fact derived from \(\neg\text{Creaking}(L_{1,1})\)
Subgoal 2: \(\neg\text{Forklift}(L_{2,1})\) matches the fact derived from \(\neg\text{Rumbling}(L_{1,1})\)
Both subgoals satisfied: \(\text{Safe}(L_{2,1})\) is proved
Advantages:
- Goal-directed: explores only the facts and rules relevant to the query
- Space-efficient depth-first search; the execution model of Prolog
Disadvantages:
- Can loop forever on recursive rules without cycle checking
- May re-prove the same subgoal many times unless results are memoized
The programming language Prolog is the classic embodiment of backward chaining: a Prolog program is a set of facts and rules, and queries are answered by backward chaining through them. In this chapter, we use Z3 instead—an SMT solver that checks entailment via refutation (see Section 3.6: Building a Knowledge-Based Agent for a full introduction). Z3 handles both forward and backward reasoning strategies internally, freeing us from choosing one.
Here is how Z3 answers the same "Is \(L_{2,1}\) safe?" query. Given a solver loaded with the quantified physics rules, adjacency facts, and percepts (\(\neg\text{Creaking}(L_{1,1})\), \(\neg\text{Rumbling}(L_{1,1})\)):
```python
from z3 import Not, unsat

def z3_entails(solver, query):
    """Does the KB entail query? (Proof by refutation.)"""
    solver.push()
    solver.add(Not(query))
    result = solver.check() == unsat
    solver.pop()
    return result

# ASK: Is L_{2,1} safe?
z3_entails(solver, Safe_fn(loc[(2, 1)]))  # True
```

If check() returns unsat, no interpretation can satisfy the KB while making \(\text{Safe}(L_{2,1})\) false, so the KB entails it. This is proof by refutation, equivalent in power to resolution (Section 3.3.18: Resolution and Completeness). The full Z3-based agents in Section 3.6: Building a Knowledge-Based Agent and Section 3.7: Building a FOL Agent with Z3 use this pattern to classify every square as safe, dangerous, or unknown before the robot moves.
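That three-way classification can be layered on `z3_entails` directly; this is a sketch, with `Safe_fn` and `loc` being the symbols from the surrounding agent code:

```python
def classify(solver, query):
    """'safe' if the KB entails query, 'dangerous' if it entails the
    negation, 'unknown' if neither (built on z3_entails above)."""
    if z3_entails(solver, query):
        return "safe"
    if z3_entails(solver, Not(query)):
        return "dangerous"
    return "unknown"

classify(solver, Safe_fn(loc[(2, 1)]))  # "safe"
```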
| Aspect | Forward Chaining | Backward Chaining |
|---|---|---|
| Direction | Facts → Conclusions | Goal → Subgoals → Facts |
| Triggered by | New data arriving | Specific queries |
| Focus | Derives everything derivable | Only relevant to goal |
| Efficiency | Good for many conclusions | Good for specific queries |
| Use case | Monitoring, alerting | Question answering, planning |
In the Hazardous Warehouse, a practical agent might use both: forward chain to update beliefs after each percept, then backward chain to answer specific planning queries.
For completeness, we mention that resolution generalizes to FOL.
First-order resolution unifies complementary literals from two clauses and derives the resolvent. Combined with CNF conversion and skolemization (eliminating \(\exists\) by introducing Skolem constants or functions), it provides a complete refutation procedure for FOL.
Example:
Clauses:
1. \(\neg\text{Adjacent}(x, y) \lor \neg\text{Damaged}(y) \lor \text{Creaking}(x)\) (the creaking rule in CNF)
2. \(\text{Adjacent}(L_{2,1}, L_{3,1})\)
3. \(\text{Damaged}(L_{3,1})\)
Resolve 1 and 2 with \(\{x/L_{2,1}, y/L_{3,1}\}\): \[\neg\text{Damaged}(L_{3,1}) \lor \text{Creaking}(L_{2,1})\]
Resolve with clause 3: \[\text{Creaking}(L_{2,1})\]
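Rather than chaining resolution steps by hand, we can confirm the same entailment mechanically by refutation; the Boolean names below are illustrative stand-ins for the ground atoms:

```python
from z3 import Bools, Not, Or, Solver

adj, dam, creak = Bools("Adjacent_21_31 Damaged_31 Creaking_21")

s = Solver()
s.add(Or(Not(adj), Not(dam), creak))  # clause 1 (ground instance of the rule)
s.add(adj)                            # clause 2
s.add(dam)                            # clause 3
s.add(Not(creak))                     # negate the conclusion for refutation
print(s.check())                      # unsat: the clauses entail Creaking(L21)
```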
Full first-order resolution is powerful but can be slow due to the large search space. Modern theorem provers use more sophisticated strategies. For many applications, forward/backward chaining on restricted rule forms is more practical.
Putting it together, a knowledge-based agent for the Hazardous Warehouse operates as:
Initialize KB with background knowledge (adjacency, percept rules, safety definitions)
Perceive: Receive percepts (creaking, rumbling, beacon, bump, beep)
TELL: Add percept facts to KB
Forward chain: Derive new conclusions about safe/dangerous squares
ASK (backward chain): Query for safe moves
Act: Move, grab package, use shutdown device, or exit
TELL: Record action taken (e.g., \(\text{Visited}(\mathit{CurrentLoc})\))
Repeat until goal achieved or failure
This loop integrates logical reasoning with action selection, enabling the robot to navigate safely through partial information.
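A sketch of one pass through this loop, assuming hypothetical helpers `get_percepts`, `percept_to_facts`, `choose_safe_move`, and `Visited_fn` alongside the `z3_entails` and `Safe_fn` names used earlier; this is illustrative scaffolding, not a full agent:

```python
def agent_step(solver, robot):
    """One perceive-tell-ask-act cycle (illustrative scaffolding only)."""
    percepts = robot.get_percepts()                # Perceive
    for fact in percept_to_facts(percepts):        # TELL: add percept facts
        solver.add(fact)
    safe_moves = [sq for sq in robot.neighbors()   # ASK: which moves are safe?
                  if z3_entails(solver, Safe_fn(sq))]
    action = choose_safe_move(safe_moves)          # Act (or exit if none)
    solver.add(Visited_fn(robot.location))         # TELL: record the visit
    return action
```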
First-order entailment is semi-decidable: if \(KB \models \alpha\), a complete procedure will eventually prove it. But if \(KB \not\models \alpha\), the procedure may run forever.
This contrasts with propositional logic, which is decidable (though NP-complete for satisfiability).
For tractable inference, we often restrict to:
- Definite clauses (Horn clauses with exactly one positive literal), for which forward and backward chaining are complete
- Datalog: definite clauses with no function symbols, for which entailment is decidable
The Hazardous Warehouse can be encoded in Datalog (finite locations, no functions), making inference decidable and efficient.
Inference in first-order logic extends propositional methods:
- Propositionalization reduces FOL to propositional logic but can generate enormous numbers of ground sentences
- Unification finds the substitutions that match expressions, enabling inference directly over variables
- Generalized modus ponens, forward chaining, and backward chaining lift propositional rule application to FOL
- First-order resolution, with CNF conversion and skolemization, gives a complete refutation procedure
For the Hazardous Warehouse agent:
- Forward chaining updates beliefs about damaged floors and forklifts after each percept
- Backward chaining (or Z3's refutation check) answers specific queries such as \(\text{Safe}(L_{2,1})\)
- A Datalog-style encoding keeps inference decidable
These techniques form the foundation of rule-based expert systems, Prolog programming, and knowledge representation systems used throughout AI and software engineering.
Comment 3.7 Propositionalization Cost
Propositionalization is complete: if \(KB \models \alpha\), the propositionalized version entails it too. But the number of ground sentences can be huge. For \(n\) constants and predicates of arity \(k\), we may generate \(O(n^k)\) ground atoms. Function symbols make it worse: the set of ground terms becomes potentially infinite.