3.5  Inference in First-Order Logic

Given a knowledge base in first-order logic, how do we derive conclusions? This section presents two practical inference strategies: forward chaining (data-driven) and backward chaining (goal-driven) (Russell and Norvig, 2020, ch. 9). Both build on a fundamental operation called unification that matches logical expressions.

3.5.1 From FOL to Propositional Inference

One approach to FOL inference is to convert it to propositional logic and use propositional methods.

Definition 3.30  Ground Term and Sentence

A ground term contains no variables. A ground sentence is a sentence with no variables—equivalent to a propositional sentence.

Propositionalization replaces universally quantified sentences with all their ground instances:

From: \(\forall l\ \text{Damaged}(l) \Rightarrow \neg\text{Safe}(l)\)

Generate for each constant in the domain: \[\text{Damaged}(L_{1,1}) \Rightarrow \neg\text{Safe}(L_{1,1})\] \[\text{Damaged}(L_{1,2}) \Rightarrow \neg\text{Safe}(L_{1,2})\] \[\vdots\]

This reduces FOL to propositional logic, where resolution or other methods apply.
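To make the blow-up concrete, here is a minimal Python sketch of propositionalization over a finite domain (the constant names and the template helper are illustrative assumptions, not a standard API):

from itertools import product

# Hypothetical finite domain of location constants.
constants = ["L11", "L12", "L21", "L22"]

def propositionalize(template, domain, arity=1):
    """Instantiate a quantified sentence template for every tuple of
    constants; the number of instances grows as O(n^arity)."""
    return [template(*args) for args in product(domain, repeat=arity)]

# Ground every instance of: forall l, Damaged(l) => ~Safe(l)
propositionalize(lambda l: f"Damaged({l}) => ~Safe({l})", constants)
# ['Damaged(L11) => ~Safe(L11)', 'Damaged(L12) => ~Safe(L12)', ...]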

Comment 3.7  Propositionalization Cost

Propositionalization is complete: if KB \(\models \alpha\), the propositionalized version entails it too. But the number of ground sentences can be huge. For \(n\) constants and predicates of arity \(k\), we may generate \(O(n^k)\) ground atoms. Function symbols make it worse—potentially infinite.

Practical FOL inference avoids full propositionalization by working with variables directly.

3.5.2 Unification

The key to variable-based inference is unification: finding substitutions that make two expressions identical.

Definition 3.31  Substitution

A substitution \(\theta\) is a mapping from variables to terms. Applying \(\theta\) to an expression replaces each variable with its assigned term.

Example: Let \(\theta = \{x/L_{2,1}, y/L_{1,1}\}\).

Applying to \(\text{Adjacent}(x, y)\) yields \(\text{Adjacent}(L_{2,1}, L_{1,1})\).
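For the rest of this section it helps to have a concrete encoding to point at. Here is one minimal Python sketch (the tuple representation and the lowercase-variables convention are our own assumptions): atoms are tuples whose first element is the predicate name, variables are lowercase strings, constants are capitalized strings, and a substitution is a plain dict.

def is_variable(t):
    """Variables are lowercase strings; constants are capitalized."""
    return isinstance(t, str) and t[:1].islower()

def substitute(t, theta):
    """Apply substitution theta to term t, following binding chains."""
    while is_variable(t) and t in theta:
        t = theta[t]
    if isinstance(t, tuple):  # compound term: keep functor, map arguments
        return (t[0],) + tuple(substitute(arg, theta) for arg in t[1:])
    return t

substitute(("Adjacent", "x", "y"), {"x": "L21", "y": "L11"})
# -> ('Adjacent', 'L21', 'L11')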

Definition 3.32  Unification

Two expressions unify if there exists a substitution \(\theta\) that makes them identical. The most general unifier (MGU) is the least constrained such substitution: every other unifier can be obtained from it by applying further substitutions.

Examples:

Unification examples

| Expression 1 | Expression 2 | MGU |
| --- | --- | --- |
| \(\text{Damaged}(x)\) | \(\text{Damaged}(L_{2,1})\) | \(\{x/L_{2,1}\}\) |
| \(\text{Adjacent}(x, L_{1,1})\) | \(\text{Adjacent}(L_{2,1}, y)\) | \(\{x/L_{2,1}, y/L_{1,1}\}\) |
| \(\text{At}(x, L_{1,1})\) | \(\text{At}(y, y)\) | \(\{x/L_{1,1}, y/L_{1,1}\}\) |
| \(\text{At}(x, x)\) | \(\text{At}(L_{1,1}, L_{2,1})\) | fail (\(x\) cannot be both \(L_{1,1}\) and \(L_{2,1}\)) |

The unification algorithm systematically finds the MGU or reports failure. It handles nested terms and performs the occurs check, which rejects unsound matches such as unifying \(x\) with \(f(x)\) (since \(x\) occurs inside \(f(x)\)).

3.5.2.1 Unification Algorithm

The core idea is recursive decomposition:

  1. If both expressions are identical constants or variables, succeed with the empty substitution
  2. If one is a variable \(v\), bind \(v\) to the other expression (after checking that \(v\) does not occur in it)
  3. If both are compound expressions with the same predicate or function symbol, unify their arguments pairwise
  4. Otherwise, fail

Definition 3.33  UNIFY Function

\(\text{UNIFY}(\alpha, \beta)\) returns the most general unifier of \(\alpha\) and \(\beta\), or \(\text{failure}\) if they don't unify.
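Under the tuple-term encoding sketched earlier, the whole recursive decomposition fits in a few functions. This is a minimal illustrative rendering of UNIFY, not an optimized implementation:

def occurs(v, t, theta):
    """Occurs check: does variable v appear in term t under theta?"""
    t = substitute(t, theta)
    if v == t:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, theta) for arg in t[1:])
    return False

def unify_var(v, t, theta):
    if occurs(v, t, theta):
        return None  # e.g., x against f(x) must fail
    return {**theta, v: substitute(t, theta)}

def unify(x, y, theta):
    """Return an MGU extending theta, or None on failure."""
    if theta is None:
        return None
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta                 # identical constants or variables
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):     # same shape: unify pairwise
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None                      # mismatched constants or structure

unify(("Adjacent", "x", "L11"), ("Adjacent", "L21", "y"), {})
# -> {'x': 'L21', 'y': 'L11'}
unify(("At", "x", "x"), ("At", "L11", "L21"), {})  # -> None (failure)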

3.5.3 Generalized Modus Ponens

With unification, we can generalize modus ponens to handle variables.

Definition 3.34  Generalized Modus Ponens

For atomic sentences \(p_1, p_2, \ldots, p_n\) and \(q\), and substitution \(\theta\):

From sentences \(p_1', p_2', \ldots, p_n'\) and \((p_1 \land p_2 \land \cdots \land p_n) \Rightarrow q\),

if there is a single substitution \(\theta\) such that \(p_i'\theta = p_i\theta\) for all \(i\), infer \(q\theta\) (the result of applying \(\theta\) to \(q\)).

Example:

Given:

  • Fact: \(\text{Damaged}(L_{3,1})\)
  • Fact: \(\text{Adjacent}(L_{2,1}, L_{3,1})\)
  • Rule: \(\forall l, l'\ (\text{Adjacent}(l, l') \land \text{Damaged}(l')) \Rightarrow \text{Creaking}(l)\)

To use generalized modus ponens:

  1. Match \(\text{Adjacent}(l, l')\) with \(\text{Adjacent}(L_{2,1}, L_{3,1})\): \(\theta_1 = \{l/L_{2,1}, l'/L_{3,1}\}\)
  2. Match \(\text{Damaged}(l')\) under \(\theta_1\) with \(\text{Damaged}(L_{3,1})\): consistent
  3. Apply \(\theta_1\) to consequent: \(\text{Creaking}(L_{2,1})\)

Conclusion: \(\text{Creaking}(L_{2,1})\)
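With unify and substitute from the sketches above, Generalized Modus Ponens reduces to matching each premise against some known fact under one accumulating substitution. A minimal illustrative version (backtracking over fact choices):

def gmp(premises, conclusion, facts):
    """Return the instantiated conclusion if every premise matches a
    known fact under a single substitution, else None."""
    def match(remaining, theta):
        if not remaining:
            return theta
        for fact in facts:  # try each fact; backtrack on failure
            t = unify(remaining[0], fact, dict(theta))
            if t is not None:
                result = match(remaining[1:], t)
                if result is not None:
                    return result
        return None

    theta = match(list(premises), {})
    return substitute(conclusion, theta) if theta is not None else None

facts = [("Damaged", "L31"), ("Adjacent", "L21", "L31")]
body = [("Adjacent", "l", "lp"), ("Damaged", "lp")]  # lp stands for l'
gmp(body, ("Creaking", "l"), facts)  # -> ('Creaking', 'L21')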

3.5.4 Forward Chaining

Definition 3.35  Forward Chaining

Forward chaining (data-driven inference) starts from known facts and repeatedly applies rules to derive new facts until the goal is reached or no new facts can be derived.

3.5.4.1 Algorithm Sketch

function FORWARD-CHAIN(KB, goal):
    repeat:
        for each rule (body => head) in KB:
            for each substitution θ that unifies body with known facts:
                new_fact = head with θ applied
                if new_fact not in KB:
                    add new_fact to KB
                    if new_fact matches goal:
                        return θ
    until no new facts added
    return failure
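The sketch translates directly into runnable Python on top of the earlier helpers. This version (our own minimal rendering, restricted to definite clauses over ground facts) enumerates all rule matches until the goal appears or a fixed point is reached:

def all_matches(premises, facts, theta):
    """Yield every substitution matching all premises against facts."""
    if not premises:
        yield theta
        return
    for fact in facts:
        t = unify(premises[0], fact, dict(theta))
        if t is not None:
            yield from all_matches(premises[1:], facts, t)

def forward_chain(facts, rules, goal):
    """Derive facts until goal unifies with one, or nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for theta in all_matches(list(body), facts, {}):
                derived = substitute(head, theta)
                if derived not in facts and derived not in new:
                    answer = unify(goal, derived, {})
                    if answer is not None:
                        return answer      # goal derived
                    new.add(derived)
        if not new:
            return None                    # fixed point: no proof found
        facts |= new

facts = [("Adjacent", "L21", "L31"), ("Damaged", "L31")]
rules = [([("Adjacent", "l", "lp"), ("Damaged", "lp")], ("Creaking", "l"))]
forward_chain(facts, rules, ("Creaking", "w"))  # -> {'w': 'L21'}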

3.5.4.2 Forward Chaining Example

Knowledge Base:

  1. \(\text{Adjacent}(L_{1,1}, L_{2,1})\)
  2. \(\text{Adjacent}(L_{1,1}, L_{1,2})\)
  3. \(\text{Adjacent}(L_{2,1}, L_{1,1})\)
  4. \(\text{Adjacent}(L_{2,1}, L_{3,1})\)
  5. \(\text{Adjacent}(L_{2,1}, L_{2,2})\)
  6. \(\neg\text{Creaking}(L_{1,1})\)
  7. \(\text{Creaking}(L_{2,1})\)
  8. \(\forall l, l'\ \text{Adjacent}(l, l') \land \text{Damaged}(l') \Rightarrow \text{Creaking}(l)\)
  9. \(\forall l\ \text{Safe}(l) \Leftrightarrow \neg\text{Damaged}(l) \land \neg\text{Forklift}(l)\)
  10. \(\neg\text{Damaged}(L_{1,1})\) (starting square is safe)

Goal: What do we know about damaged floors?

Forward chaining:

Iteration 1: Rule 8 would need a known \(\text{Damaged}\) fact to fire (for instance, with fact 4 it would require \(\text{Damaged}(L_{3,1})\)). The KB contains no such fact, so rule 8 cannot fire yet.

From the absence of creaking at \(L_{1,1}\) (fact 6) and the contrapositive of the creaking rule, no square adjacent to \(L_{1,1}\) has a damaged floor: \(\neg\text{Damaged}(L_{2,1})\) and \(\neg\text{Damaged}(L_{1,2})\). Note that this contrapositive reasoning goes beyond strict definite-clause forward chaining (see Comment 3.8 below).

Iteration 2: From \(\text{Creaking}(L_{2,1})\) (fact 7) and the other direction of the creaking rule read as a biconditional (creaking holds iff some adjacent square is damaged): some square adjacent to \(L_{2,1}\) has a damaged floor. The candidates are \(L_{1,1}\), \(L_{3,1}\), and \(L_{2,2}\). Since \(\neg\text{Damaged}(L_{1,1})\) (fact 10), we conclude: \[\text{Damaged}(L_{3,1}) \lor \text{Damaged}(L_{2,2})\]

Forward chaining continues until no new facts are derivable.

Comment 3.8  Forward Chaining Completeness

Forward chaining with Generalized Modus Ponens is complete for knowledge bases of definite clauses (rules with exactly one positive literal in the conclusion). For general FOL it is not: the contrapositive and disjunctive steps in the example above fall outside definite-clause inference, and with function symbols the procedure may not terminate.

3.5.4.3 Characteristics of Forward Chaining

Advantages:

  • Natural for updating beliefs as new percepts arrive
  • Derives all consequences systematically
  • Good when many facts are known and we want all conclusions

Disadvantages:

  • May derive irrelevant facts (unfocused)
  • Can be inefficient if we only need one specific conclusion

3.5.5 Backward Chaining

Definition 3.36  Backward Chaining

Backward chaining (goal-driven inference) starts from the goal and works backward, finding rules whose conclusions match the goal and recursively proving their premises.

3.5.5.1 Algorithm Sketch

function BACKWARD-CHAIN(KB, goal, θ):
    if goal unifies with a known fact via θ':
        return COMPOSE(θ, θ')
    for each rule (body => head) in KB:
        θ' = UNIFY(head, goal)
        if θ' ≠ failure:
            combined = COMPOSE(θ, θ')
            for each conjunct g in body:
                combined = BACKWARD-CHAIN(KB, SUBST(combined, g), combined)
                if combined = failure:
                    break  # this rule fails; try the next one
            if combined ≠ failure:
                return combined
    return failure
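A runnable counterpart, reusing the facts and rules from the forward-chaining sketch: written as a generator, backtracking over alternative facts and rules comes for free. For brevity this omits standardizing rule variables apart and loop detection, both of which a real implementation needs:

def backward_chain(facts, rules, goals, theta):
    """Prove the list of goals depth-first; yield each successful theta."""
    if not goals:
        yield theta
        return
    goal, rest = substitute(goals[0], theta), goals[1:]
    for fact in facts:               # base case: goal matches a known fact
        t = unify(goal, fact, dict(theta))
        if t is not None:
            yield from backward_chain(facts, rules, rest, t)
    for body, head in rules:         # recursive case: prove the rule body
        t = unify(goal, head, dict(theta))
        if t is not None:
            yield from backward_chain(facts, rules, list(body) + rest, t)

theta = next(backward_chain(facts, rules, [("Creaking", "w")], {}), None)
substitute("w", theta)  # -> 'L21'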

3.5.5.2 Backward Chaining Example

Goal: Is \(L_{2,1}\) safe?

Query: \(\text{Safe}(L_{2,1})\)

Rules and facts:

  • \(\forall l\ \text{Safe}(l) \Leftrightarrow \neg\text{Damaged}(l) \land \neg\text{Forklift}(l)\)
  • \(\forall l\ \text{Rumbling}(l) \Leftrightarrow \exists l'\ \text{Adjacent}(l, l') \land \text{Forklift}(l')\)
  • \(\neg\text{Creaking}(L_{1,1})\), \(\neg\text{Rumbling}(L_{1,1})\)
  • \(\text{Adjacent}(L_{1,1}, L_{2,1})\)

Backward chaining:

  1. Goal: \(\text{Safe}(L_{2,1})\)

  2. Find rule with matching head: \(\text{Safe}(l)\) unifies with \(\theta = \{l/L_{2,1}\}\)

  3. Subgoals: \(\neg\text{Damaged}(L_{2,1})\) and \(\neg\text{Forklift}(L_{2,1})\)

  4. Subgoal 1: \(\neg\text{Damaged}(L_{2,1})\)

    • From \(\neg\text{Creaking}(L_{1,1})\) and the contrapositive of the creaking rule: no square adjacent to \(L_{1,1}\) has a damaged floor
    • \(\text{Adjacent}(L_{1,1}, L_{2,1})\), so \(\neg\text{Damaged}(L_{2,1})\): success
  5. Subgoal 2: \(\neg\text{Forklift}(L_{2,1})\)

    • From \(\neg\text{Rumbling}(L_{1,1})\) and the rumbling rule (contrapositive): no adjacent square has the forklift
    • \(\text{Adjacent}(L_{1,1}, L_{2,1})\), so \(\neg\text{Forklift}(L_{2,1})\): success
  6. Both subgoals satisfied: \(\text{Safe}(L_{2,1})\) is proved

3.5.5.3 Characteristics of Backward Chaining

Advantages:

  • Goal-directed: only explores relevant rules
  • Efficient when the goal is specific
  • Natural for question-answering ("Is X true?")

Disadvantages:

  • Can loop forever on recursive rules unless repeated subgoals are detected
  • May redo work when the same subgoal arises multiple times (memoization, also called tabling, helps)

3.5.5.4 Entailment Checking with Z3

The programming language Prolog is the classic embodiment of backward chaining: a Prolog program is a set of facts and rules, and queries are answered by backward chaining through them. In this chapter, we use Z3 instead—an SMT solver that checks entailment via refutation (see Section 3.6: Building a Knowledge-Based Agent for a full introduction). Z3 handles both forward and backward reasoning strategies internally, freeing us from choosing one.

Here is how Z3 answers the same "Is \(L_{2,1}\) safe?" query. Given a solver loaded with the quantified physics rules, adjacency facts, and percepts (\(\neg\text{Creaking}(L_{1,1})\), \(\neg\text{Rumbling}(L_{1,1})\)):

from z3 import Not, unsat

def z3_entails(solver, query):
    """Does the KB entail query? (Proof by refutation.)"""
    solver.push()
    solver.add(Not(query))
    result = solver.check() == unsat
    solver.pop()
    return result

# ASK: Is L_{2,1} safe?
z3_entails(solver, Safe_fn(loc[(2, 1)]))  # True

If check() returns unsat, no interpretation can satisfy the KB while making \(\text{Safe}(L_{2,1})\) false—so the KB entails it. This is proof by refutation, equivalent in power to resolution (Section 3.3.18: Resolution and Completeness). The full Z3-based agents in Section 3.6: Building a Knowledge-Based Agent and Section 3.7: Building a FOL Agent with Z3 use this pattern to classify every square as safe, dangerous, or unknown before the robot moves.

3.5.6 Comparing Forward and Backward Chaining

Comparison of forward and backward chaining

| Aspect | Forward Chaining | Backward Chaining |
| --- | --- | --- |
| Direction | Facts → Conclusions | Goal → Subgoals → Facts |
| Triggered by | New data arriving | Specific queries |
| Focus | Derives everything derivable | Only facts relevant to the goal |
| Efficiency | Good for many conclusions | Good for specific queries |
| Use case | Monitoring, alerting | Question answering, planning |

In the Hazardous Warehouse:

  • Forward chaining: As the robot moves and perceives creaking/rumbling, update the KB with new facts and derive all consequences (which squares are safe/dangerous)
  • Backward chaining: When deciding whether to enter a square, query "Is \(L_{x,y}\) safe?" and prove it

A practical agent might use both: forward chain to update beliefs after each percept, then backward chain to answer specific planning queries.

3.5.7 Resolution in First-Order Logic

For completeness, we mention that resolution generalizes to FOL.

Definition 3.37  First-Order Resolution

First-order resolution unifies complementary literals from two clauses and derives the resolvent. Combined with CNF conversion and skolemization (eliminating \(\exists\) by introducing Skolem constants or, inside universal quantifiers, Skolem functions), it provides a complete refutation procedure for FOL.

Example:

Clauses:

  1. \(\neg\text{Adjacent}(x, y) \lor \neg\text{Damaged}(y) \lor \text{Creaking}(x)\)
  2. \(\text{Adjacent}(L_{2,1}, L_{3,1})\)
  3. \(\text{Damaged}(L_{3,1})\)

Resolve 1 and 2 with \(\{x/L_{2,1}, y/L_{3,1}\}\): \[\neg\text{Damaged}(L_{3,1}) \lor \text{Creaking}(L_{2,1})\]

Resolve with clause 3: \[\text{Creaking}(L_{2,1})\]
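The same machinery supports a toy binary resolution step. Here a literal is an atom or a ('Not', atom) pair (capitalized so the tag is not mistaken for a variable), a clause is a list of literals, and resolve reproduces the two steps above; this is an illustration, not a prover API:

def negate(lit):
    """Complement a literal; 'Not' is our negation tag."""
    return lit[1] if lit[0] == "Not" else ("Not", lit)

def resolve(c1, c2):
    """Yield resolvents of clauses c1 and c2 (lists of literals)."""
    for l1 in c1:
        for l2 in c2:
            theta = unify(negate(l1), l2, {})
            if theta is not None:  # complementary literals unify
                yield ([substitute(l, theta) for l in c1 if l is not l1]
                       + [substitute(l, theta) for l in c2 if l is not l2])

c1 = [("Not", ("Adjacent", "x", "y")), ("Not", ("Damaged", "y")),
      ("Creaking", "x")]
r1 = next(resolve(c1, [("Adjacent", "L21", "L31")]))
# r1 == [('Not', ('Damaged', 'L31')), ('Creaking', 'L21')]
next(resolve(r1, [("Damaged", "L31")]))  # -> [('Creaking', 'L21')]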

Comment 3.9  Resolution in Practice

Full first-order resolution is powerful but can be slow due to the large search space. Modern theorem provers use more sophisticated strategies. For many applications, forward/backward chaining on restricted rule forms is more practical.

3.5.8 The Knowledge-Based Agent Loop

Putting it together, a knowledge-based agent for the Hazardous Warehouse operates as:

  1. Initialize KB with background knowledge (adjacency, percept rules, safety definitions)

  2. Perceive: Receive percepts (creaking, rumbling, beacon, bump, beep)

  3. TELL: Add percept facts to KB

    • \(\text{Creaking}(\mathit{CurrentLoc})\) or \(\neg\text{Creaking}(\mathit{CurrentLoc})\)
    • Similarly for rumbling, beacon, bump, beep
  4. Forward chain: Derive new conclusions about safe/dangerous squares

  5. ASK (backward chain): Query for safe moves

    • "Is \(L_{x,y}\) safe?" for each adjacent square
    • Select a safe unexplored square, or if none, backtrack
  6. Act: Move, grab package, use shutdown device, or exit

  7. TELL: Record action taken (e.g., \(\text{Visited}(\mathit{CurrentLoc})\))

  8. Repeat until goal achieved or failure

This loop integrates logical reasoning with action selection, enabling the robot to navigate safely through partial information.
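As a skeleton, one iteration of the loop is a thin wrapper around TELL and ASK. Everything below is hypothetical glue: kb_tell, kb_ask, perceive, act, and adjacent are assumed callbacks (kb_ask could be backed by the z3_entails pattern from Section 3.5.5.4, which folds step 4's forward derivation into the solver):

def agent_step(kb_tell, kb_ask, perceive, act, adjacent, location, visited):
    """One iteration of the knowledge-based agent loop (sketch)."""
    for sentence in perceive(location):          # steps 2-3: TELL percepts
        kb_tell(sentence)
    safe = [sq for sq in adjacent(location)      # step 5: ASK each neighbor
            if sq not in visited and kb_ask(("Safe", sq))]
    if safe:
        target = safe[0]                         # step 6: move somewhere safe
        act("Move", target)
        visited.add(target)
        kb_tell(("Visited", target))             # step 7: record the action
        return target
    act("Backtrack", None)                       # no safe unexplored square
    return location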

3.5.9 Computational Considerations

3.5.9.1 Decidability

Theorem 3.2  FOL Semi-Decidability

First-order entailment is semi-decidable: if \(KB \models \alpha\), a complete procedure will eventually prove it. But if \(KB \not\models \alpha\), the procedure may run forever.

This contrasts with propositional logic, which is decidable (though NP-complete for satisfiability).

3.5.9.2 Practical Restrictions

For tractable inference, we often restrict to:

  • Definite clauses: Horn clauses with exactly one positive literal, which serves as the head of the rule
  • Datalog: No function symbols, finite domains
  • Description logics: Restricted quantification patterns (used in ontologies)

The Hazardous Warehouse can be encoded in Datalog (finite locations, no functions), making inference decidable and efficient.

3.5.10 Summary

Inference in first-order logic extends propositional methods:

  • Unification matches expressions with variables, enabling general rules to apply to specific facts
  • Generalized Modus Ponens derives conclusions from rules and facts using unification
  • Forward chaining derives all consequences from facts—good for updating beliefs
  • Backward chaining proves specific goals by working backward—good for queries
  • Resolution provides completeness but is computationally expensive

For the Hazardous Warehouse agent:

  • Forward chain when percepts arrive to update beliefs about safe/dangerous squares
  • Backward chain to answer "Is this square safe?" before moving
  • The combination enables safe navigation with incomplete information

These techniques form the foundation of rule-based expert systems, Prolog programming, and knowledge representation systems used throughout AI and software engineering.

Bibliography

  1. [AI] Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th ed. Pearson, 2020. http://aima.cs.berkeley.edu/