linearization everywhere
Batch optimization methods are ubiquitous in robotics: we want to solve for some robot states given some measurements. To do this, we construct an error that quantifies the difference between the measurement we expect (which depends on the state) and the measurement we receive (a given value from a sensor, which does not depend on the robot state).
Consider a single received measurement $\mathbf{y}$, the sensor model for which is $\mathbf{y}=\mathbf{g}(\mathbf{x})+\mathbf{v}$, where $\mathbf{v}\sim \mathcal{N}(\mathbf{0}, \mathbf{R})$ is Gaussian-distributed noise with covariance $\mathbf{R}$. The robot state estimate is obtained by solving the maximum a posteriori (MAP) problem \begin{equation} \hat{\mathbf{x}}=\text{argmax}_{\mathbf{x}} p(\mathbf{x}|\mathbf{y}). \end{equation}
Since there is no prior on $\mathbf{x}$, applying Bayes' rule and discarding the normalization constant leaves us with \begin{equation} \hat{\mathbf{x}}=\text{argmax}_{\mathbf{x}} p(\mathbf{y}|\mathbf{x}), \end{equation}
which is equivalent to minimizing the negative log-likelihood, \begin{equation} \hat{\mathbf{x}}=\text{argmin}_{\mathbf{x}} -\log p(\mathbf{y}|\mathbf{x}). \end{equation}
The form for $p(\mathbf{y}|\mathbf{x})$ is Gaussian, meaning that \begin{equation} p(\mathbf{y}|\mathbf{x}) = \alpha \exp\left(-\tfrac{1}{2}(\mathbf{y}-\mathbf{g}(\mathbf{x}))^{\text{T}}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{g}(\mathbf{x}))\right), \end{equation}
where $\alpha$ is a normalization constant. Defining $\mathbf{e}(\mathbf{x})=\mathbf{y}-\mathbf{g}(\mathbf{x})$ and taking the negative logarithm (dropping terms that do not depend on $\mathbf{x}$), the minimization becomes \begin{equation} \hat{\mathbf{x}}=\text{argmin}_{\mathbf{x}} \tfrac{1}{2}\mathbf{e}(\mathbf{x})^{\text{T}} \mathbf{R}^{-1} \mathbf{e}(\mathbf{x}). \end{equation}
This is exactly the weighted nonlinear-least-squares form used by solvers such as Ceres. We linearize the error $\mathbf{e}(\mathbf{x})$ to construct successive Taylor-series approximations to our loss and (hopefully) reach the minimum we want, as in the sketch below.
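To make this concrete, here is a minimal Gauss-Newton sketch in NumPy. The toy problem (a 2D position estimated from range measurements to two beacons), the beacon locations, noise covariance, and measurement values are all invented for illustration; this is not the internals of any particular solver.

```python
import numpy as np

# Hypothetical toy problem (all values invented for illustration): estimate a
# 2D position x from range measurements to two known beacons.
beacons = np.array([[0.0, 0.0], [2.0, 0.0]])

def g(x):
    """Predicted range to each beacon."""
    return np.linalg.norm(x - beacons, axis=1)

def G(x):
    """Jacobian of g with respect to x (one row per beacon)."""
    d = x - beacons
    return d / np.linalg.norm(d, axis=1, keepdims=True)

R_inv = np.linalg.inv(np.diag([0.01, 0.01]))  # R^{-1}, the weight matrix
y = np.array([1.43, 1.39])                    # received (noisy) measurements

x = np.array([0.5, 0.5])                      # initial guess
for _ in range(10):
    e = y - g(x)  # error e(x) = y - g(x)
    J = -G(x)     # Jacobian of e(x) with respect to x
    # Gauss-Newton step: minimize the quadratic approximation of the loss by
    # solving the normal equations (J^T R^{-1} J) dx = -J^T R^{-1} e.
    dx = np.linalg.solve(J.T @ R_inv @ J, -J.T @ R_inv @ e)
    x = x + dx
print(x)  # converges to a position consistent with the two ranges
```

Each pass through the loop relinearizes $\mathbf{e}(\mathbf{x})$ at the current estimate; this is the usual iterative approximation made inside the optimizer.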
However, what happens when the noise enters the measurement model nonlinearly? Formally, \begin{equation} \mathbf{y}=\mathbf{g}(\mathbf{x}, \mathbf{v}), \quad \mathbf{v}\sim \mathcal{N}(\mathbf{0}, \mathbf{R}). \end{equation}
The measurement likelihood $p(\mathbf{y}|\mathbf{x})$ is now non-Gaussian, and forming the error \begin{equation} \mathbf{e}(\mathbf{x})=\mathbf{y}-\mathbf{g}(\mathbf{x}, \mathbf{v}) \end{equation} gets us nowhere, since the noise $\mathbf{v}$ inside $\mathbf{g}$ is unknown and cannot simply be subtracted off.
So we linearize about $\mathbf{v}=\mathbf{0}$, giving $\mathbf{g}(\mathbf{x}, \mathbf{v}) \approx \mathbf{g}(\mathbf{x}, \mathbf{0}) + \mathbf{L}\mathbf{v}$, and write \begin{equation} \mathbf{y}=\mathbf{g}(\mathbf{x}, \mathbf{0}) + \mathbf{v}', \quad \mathbf{v}'\sim \mathcal{N}(\mathbf{0}, \mathbf{L} \mathbf{R} \mathbf{L}^{\text{T}}), \end{equation}
where $\mathbf{L}$ is the Jacobian of the measurement model with respect to the noise variable, evaluated at the current state estimate $\bar{\mathbf{x}}$ and at $\mathbf{v}=\mathbf{0}$, \begin{equation} \mathbf{L} = \left. \frac{\partial \mathbf{g}}{\partial \mathbf{v}}\right|_{\bar{\mathbf{x}}, \mathbf{0}}. \end{equation}
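As a sketch of how $\mathbf{L}$ can be computed, here is a NumPy example with an invented model in which the noise enters multiplicatively; the model, the `noise_jacobian` helper, and all values are hypothetical.

```python
import numpy as np

# Hypothetical model with nonlinear (here, multiplicative) noise, invented
# for illustration:  y = (1 + v[0]) * ||x|| + v[1],  v ~ N(0, R).
def g(x, v):
    return np.array([(1.0 + v[0]) * np.linalg.norm(x) + v[1]])

def noise_jacobian(g, x, dim_v, eps=1e-6):
    """Numerical Jacobian L = dg/dv, evaluated at (x, v = 0)."""
    y0 = g(x, np.zeros(dim_v))
    L = np.zeros((y0.size, dim_v))
    for i in range(dim_v):
        dv = np.zeros(dim_v)
        dv[i] = eps
        L[:, i] = (g(x, dv) - y0) / eps  # forward finite difference
    return L

R = np.diag([0.02, 0.01])        # covariance of the raw noise v
x_bar = np.array([1.0, 1.0])     # current state estimate
L = noise_jacobian(g, x_bar, 2)  # here L = [||x_bar||, 1]
R_eff = L @ R @ L.T              # linearized measurement covariance L R L^T
```

In practice $\mathbf{L}$ is usually available analytically; the finite-difference version here is just to keep the sketch self-contained.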
This is different from the linearization approximation we make in methods like Gauss-Newton, where we first pose the problem and then use iterative methods to solve it. Here, _before even going into the optimizer_, we make a linearization approximation to the loss itself. Whether this causes issues depends on the application. We can also get funny situations where $\mathbf{L}$ is not full rank, making the effective covariance $\mathbf{L}\mathbf{R}\mathbf{L}^{\text{T}}$ singular, so its inverse, which the loss needs, does not exist; see the sketch below.
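Here is a contrived example of that failure mode: a single scalar noise source driving a two-dimensional measurement gives a rank-one $\mathbf{L}\mathbf{R}\mathbf{L}^{\text{T}}$. The model is made up purely to exhibit the rank deficiency.

```python
import numpy as np

# Contrived sketch: a 2D measurement driven by a single scalar noise source,
#   y = [x[0] + v, x[1] + v],  v ~ N(0, r).
# Then L = dg/dv = [1, 1]^T, and L r L^T is 2x2 but only rank 1.
r = np.array([[0.01]])
L = np.array([[1.0], [1.0]])
R_eff = L @ r @ L.T
print(np.linalg.matrix_rank(R_eff))  # 1: singular, so R_eff cannot be inverted
```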