Cognitive Reasoning Agents and the Extended Information Filter
Teaching AI Agents to Guess Smarter, Not Harder
Probabilistic reasoning plays a pivotal role in advancing intelligent systems by enabling cognitive agents to operate effectively in dynamic and uncertain contexts.
In other words: the real world is analog and continuous, and sensors produce noise. Noise is a problem because it obscures the signal. When a GPS position jumps ever so slightly across the map, that randomness has crept into the measurement, and probabilistic methods become preferable to deterministic ones because they allow for more robust decision-making and, above all, adaptability. A Gaussian filter is an example of such a method that cognitive agents can use to generalize their reasoning across diverse tasks and environments, reducing the need for task-specific customization. This approach not only accelerates learning but also enhances an agent's ability to generalize, making it suitable for a wide range of applications, from decision support to adaptive learning systems.
But before we dive in, let’s start with some definitions.
Key Terms and Definitions
Gaussian Distribution: A well-established type of probability distribution that describes how likely values are to occur around a central average (mean) and how spread out those values are (variance or covariance). Its moments are the quantities that describe the distribution's shape: the mean (μ) and the covariance (Σ).
Moments-Based Representation: Describing a Gaussian by its first two moments, the mean and the covariance. In multi-dimensional space the mean becomes a vector and the covariance a matrix. Intuitively, the mean tells us the center of our probability mass, while the covariance tells us how uncertain we are and in which directions.
Information Matrix: The inverse of the covariance matrix (Ω = Σ⁻¹), often making it easier to work with large systems because it can be sparse (mostly zeros). Its diagonal elements represent precisions (inverses of variances) and its off-diagonal elements the strength of the relationships between variables; larger values indicate higher certainty or stronger relationships.
Information Vector: The mean transformed into information space by pre-multiplying it with the information matrix (ξ = Ω μ = Σ⁻¹ μ); it pairs with the information matrix in the canonical representation.
Canonical Parameterization: Another way to represent a Gaussian, using the aforementioned information matrix and information vector instead of the mean and covariance. This can make certain calculations, in particular incorporating new evidence, faster and easier.
Marginalization: The process of simplifying a model by integrating out (removing) variables we're not interested in, so we can focus only on the relevant ones.
Kalman Filter (KF): A recursive estimation algorithm that uses Gaussian distributions to track the state of a system over time. It handles both prediction steps (using a motion model) and measurement updates efficiently for linear systems, working with means and covariances.
Information Filter (IF): The canonical form equivalent of the Kalman filter, representing uncertainty using information matrices and vectors instead of means and covariances. It excels at measurement updates (which become simple additions) but requires more computation for prediction steps.
Extended Kalman Filter (EKF): An adaptation of the Kalman filter for non-linear systems. It handles non-linear motion and measurement models by linearizing them around the current state estimate using Taylor series expansion while maintaining the mean and covariance representation.
Extended Information Filter (EIF): The non-linear version of the Information Filter, combining the measurement update benefits of the information representation with linearization techniques for handling non-linear systems. Like the EKF, it linearizes around the current estimate but operates in the information space.
Taylor Approximation (also called Taylor Expansion): A mathematical method that approximates a non-linear function near a specific point using a series of increasingly complex polynomial terms. In the context of filtering, we typically use just the first-order terms, creating a linear approximation of the function based on its value and slope at that point.
Jacobian Matrix: A matrix containing all first-order partial derivatives of a vector-valued function. In filtering, Jacobians are used to linearize non-linear motion and measurement models at the current state estimate.
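To make the last two definitions more tangible, here is a tiny sketch of a non-linear measurement function and its Jacobian. The range-bearing model and the numbers are purely illustrative; the point is that this Jacobian is what the extended filters evaluate at the current estimate in place of a fixed linear measurement matrix.

```python
import numpy as np

def h(x):
    """Non-linear measurement model: range and bearing to a landmark at the origin."""
    px, py = x
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

def jacobian_h(x):
    """First-order Taylor linearization of h: its Jacobian evaluated at x."""
    px, py = x
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    return np.array([[ px / r,   py / r],
                     [-py / r2,  px / r2]])

# Linearize around the current state estimate (made-up numbers)
x_est = np.array([2.0, 1.0])
H = jacobian_h(x_est)  # plays the role of the measurement matrix in the EKF/EIF
```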
Gaussians in Probabilistic Reasoning
Gaussian distributions are central to probabilistic reasoning, representing uncertainty in evidence and inferences. In 1D, the mean represents the central value, while the variance indicates the spread. For multi-dimensional systems, such as those encountered in cognitive reasoning, the mean becomes a vector, and the covariance becomes a matrix, capturing relationships between different variables. These properties form the foundation for state estimation and inference algorithms.
However, we can also represent the same distribution through the canonical parameterization, using the information matrix (the inverse of the covariance matrix) and the information vector (a transformation of the mean). In my experience, this representation often simplifies the computations involved in updating a probabilistic estimate, because it avoids direct manipulation of covariance matrices.
Canonical parameterizations are more computationally efficient for certain operations, which makes them preferable when scaling constraints matter. Reformulating Gaussian distributions this way simplifies updating a distribution with new evidence (conditioning on observations), although marginalization, in turn, becomes the more expensive operation in this form. That makes it a good candidate for dynamic learning, where we frequently have to update the state estimate.
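For concreteness, converting between the two parameterizations is just a matrix inversion and a multiplication. A minimal sketch with made-up numbers:

```python
import numpy as np

# A toy 2D Gaussian in moments form (made-up numbers)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Canonical form: information matrix and information vector
Omega = np.linalg.inv(Sigma)   # information matrix
xi = Omega @ mu                # information vector

# And back to moments form
Sigma_back = np.linalg.inv(Omega)
mu_back = Sigma_back @ xi      # recovers the original mean
```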
From Kalman Filter to Information Filter
A Gaussian filter helps AI agents handle uncertainty in reasoning by predicting likely outcomes based on prior knowledge and updating those predictions as new information arrives. For example, an agent estimating a user's intent in a conversation can use a Gaussian filter to combine past context with new inputs, refining its understanding. Marginalization lets the agent focus only on the relevant variables, like the current user intent, while discarding irrelevant past states, keeping the reasoning efficient and consistent. However, the Kalman filter, a specific type of Gaussian filter, is designed for linear systems with Gaussian noise, and that is the problem. Cognitive reasoning involves complex relationships between variables such as context, prior knowledge, and new information, and these relationships are rarely linear because the impact of new evidence or a change in context often depends on multiple, interdependent factors. For such systems the Extended Kalman Filter is usually the better approach.
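As a toy illustration of that combining step, here is a one-dimensional sketch with made-up numbers (think of the state as a scalar "intent" score): fusing a Gaussian prior with a noisy new observation boils down to a precision-weighted average, which is exactly what a Gaussian filter's correction does in one dimension.

```python
# Fusing a prior belief with a noisy new observation (hypothetical numbers)
prior_mean, prior_var = 0.0, 4.0   # what the agent believed so far
obs, obs_var = 1.2, 1.0            # new, noisy evidence

fused_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)             # precisions add
fused_mean = fused_var * (prior_mean / prior_var + obs / obs_var)

print(fused_mean, fused_var)  # pulled toward the observation, and less uncertain
```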
In general, a Kalman filter is a two-step process:
Prediction: Propagates the state estimate and uncertainty forward using the agent's inference model and process noise.
Correction: Updates the state estimate with new evidence, accounting for uncertainty in observations.
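To make the two steps concrete, here is a minimal sketch of a linear Kalman filter in NumPy. The helper names are mine, and A (motion model), C (measurement model), R (process noise), and Q (measurement noise) are placeholders for whatever your system defines.

```python
import numpy as np

def kf_predict(mu, Sigma, A, R):
    """Prediction: push the estimate through the linear motion model, add process noise."""
    mu_bar = A @ mu
    Sigma_bar = A @ Sigma @ A.T + R
    return mu_bar, Sigma_bar

def kf_correct(mu_bar, Sigma_bar, z, C, Q):
    """Correction: fold in measurement z via measurement model C and measurement noise Q."""
    S = C @ Sigma_bar @ C.T + Q                 # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)      # Kalman gain
    mu = mu_bar + K @ (z - C @ mu_bar)          # corrected mean
    Sigma = (np.eye(len(mu_bar)) - K @ C) @ Sigma_bar
    return mu, Sigma
```

The correction pulls the prediction toward the measurement in proportion to the Kalman gain, which balances the uncertainty of the prediction against the uncertainty of the observation.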
An Information Filter reformulates these steps using the information matrix and vector. While the correction step is simplified in the information filter, the prediction step involves computationally intensive transformations: to propagate the state forward it effectively has to invert the information matrix back to a covariance. In other words, the information filter gets an efficient correction, thanks to sparse, additive updates, at the price of a costly prediction step.
In large-scale reasoning systems, the information filter often proves more efficient, particularly when observations are frequent.
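A rough sketch of that trade-off in code, using the same placeholder matrices as above: the correction is a pure addition of information, while the prediction has to drop back into covariance form first.

```python
import numpy as np

def if_correct(Omega_bar, xi_bar, z, C, Q):
    """Correction in information form: new evidence is simply added on."""
    Q_inv = np.linalg.inv(Q)
    Omega = Omega_bar + C.T @ Q_inv @ C
    xi = xi_bar + C.T @ Q_inv @ z
    return Omega, xi

def if_predict(Omega, xi, A, R):
    """Prediction in information form: requires recovering the covariance first."""
    Sigma = np.linalg.inv(Omega)                # the costly inversion the text mentions
    mu = Sigma @ xi
    Sigma_bar = A @ Sigma @ A.T + R
    Omega_bar = np.linalg.inv(Sigma_bar)
    xi_bar = Omega_bar @ (A @ mu)
    return Omega_bar, xi_bar
```

Notice that if several independent observations arrive, their contributions simply stack onto Omega and xi, which is why frequent observations favor this form.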
Extended Information Filter
The EIF extends the information filter to handle non-linear systems, similar to how the Extended Kalman Filter (EKF) generalizes the Kalman filter. The EIF achieves this by linearizing non-linear reasoning and observation models using Taylor series approximations. This process involves calculating Jacobians (matrices of partial derivatives) to approximate the system’s behavior around the current state estimate. By operating in the information space, the EIF retains computational advantages, particularly in handling large, sparse systems.
The EIF modifies the EKF's prediction and correction steps to operate in the information form:
Prediction: Substitutes the EKF’s moments-based prediction equations with their canonical equivalents.
Correction: Updates the information matrix and vector using linearized reasoning models. This ensures the EIF maintains consistency with the underlying non-linear models.
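Sketching just the correction step under those assumptions (h is the non-linear measurement model, H a function returning its Jacobian; the code is my illustration, not a reference implementation):

```python
import numpy as np

def eif_correct(Omega_bar, xi_bar, z, h, H, Q):
    """EIF correction: the information-filter update, with the non-linear
    measurement model h linearized at the predicted mean via its Jacobian."""
    mu_bar = np.linalg.inv(Omega_bar) @ xi_bar   # recover the linearization point
    Hj = H(mu_bar)                               # Jacobian evaluated at mu_bar
    Q_inv = np.linalg.inv(Q)
    # The update stays additive; the non-linearity enters only through Hj and h(mu_bar).
    Omega = Omega_bar + Hj.T @ Q_inv @ Hj
    xi = xi_bar + Hj.T @ Q_inv @ (z - h(mu_bar) + Hj @ mu_bar)
    return Omega, xi
```

Compared to the linear information filter correction above, the only real change is that the fixed measurement matrix is replaced by the Jacobian evaluated at the current estimate.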
Hence, the EIF is essentially the EKF reformulated in the information space. While both are mathematically equivalent, the EIF is often reported to be more numerically stable, particularly for systems with sparse information matrices. That said, the EKF remains more popular due to its simpler implementation and widespread support. The choice between these methods depends on the specific requirements of the application, including computational resources and the nature of the system dynamics.
In closing
When building reasoning agents that guide real-world applications like autonomous cars, unmanned aerial vehicles, or robots, being able to correctly and reliably update state estimates from noisy sensor data is essential.
My takeaways here are:
Gaussian distributions can be represented using moments or canonical parameterizations, each offering unique computational advantages.
The Kalman and Information filters address different computational trade-offs, with the choice depending on the application’s prediction and update needs.
The EIF extends the Information filter to non-linear systems, providing a robust alternative to the EKF for reasoning applications.
If you want to learn more about Kalman/Bayesian filtering, I think this repository is really nice.
However, the suitability of a filtering approach depends on your framework’s characteristics and computational constraints, particularly in complex tasks like cognitive reasoning and adaptive learning.