The following list identifies theoretical, methodological, and technological aspects of the CDF algorithm that require further investigation and development. These topics represent opportunities for future research. They are numbered for reference only; a suggested prioritization is given after the list.

Research Agenda for CDF Development

  1. A priori constraint strength for fused products

    Develop a principled method to select the optimal a priori profile and variance-covariance matrix for the fused product, ensuring that the constraint strength is neither too weak (leading to poorly constrained solutions) nor too strong (suppressing real atmospheric variability).

  2. A priori constraint strength for generic OE products

    Extend the understanding of a priori constraint selection to individual optimal estimation products, independent of the fusion context, to improve general product quality.

  3. Reformulation of fusion with total columns and profiles

    Develop explicit CDF(2015) and CDF(2022) formulations for scenarios where total-column measurements are fused with vertical profiles, accounting for the mathematical differences between scalar and vector-valued quantities.

  4. Quantification of coincidence errors

    Establish rigorous, data-driven methods to quantify errors arising from imperfect spatial and temporal coincidence of measurements. Current approaches use fixed adjustments based on a priori information; more sophisticated methods that account for actual atmospheric variability patterns are needed.

  5. Interpolation error strategies

    Systematically test and compare different mathematical strategies for quantifying and including vertical interpolation errors in the fusion covariance matrices. Evaluate trade-offs between simplicity and accuracy across different atmospheric conditions and measurement types.

  6. Extended vertical interpolation algorithm

    Thoroughly test and validate the extended vertical interpolation algorithm (which allows fusion on vertical grids that exceed the intersection of individual retrieval grids) before deploying it in operational settings. Evaluate performance with real atmospheric data.

  7. Numerical error analysis

    Conduct detailed studies of truncation and round-off errors in the CDF equations, especially for state vectors with large dynamic range. Develop guidelines for detecting and mitigating numerical instabilities and establish best practices for matrix inversion thresholds.
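As a starting point for such guidelines, the sketch below inverts a symmetric matrix through its eigendecomposition and discards eigenvalues below a relative threshold, reporting the condition number and the number of discarded directions. The function name `safe_symmetric_inverse` and the default threshold of 1e-10 are illustrative assumptions, not values taken from the CDF formulation.

```python
import numpy as np

def safe_symmetric_inverse(matrix, rel_threshold=1e-10):
    """Hypothetical helper: thresholded inverse of a symmetric matrix.

    Eigenvalues whose magnitude is below rel_threshold times the largest
    eigenvalue magnitude are discarded (treated as zero in the inverse).
    Returns (inverse, condition_number, n_discarded).
    """
    m = np.asarray(matrix, dtype=float)
    sym = 0.5 * (m + m.T)                      # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    largest = np.max(np.abs(eigvals))
    smallest = np.min(np.abs(eigvals))
    cond = np.inf if smallest == 0.0 else largest / smallest
    keep = np.abs(eigvals) > rel_threshold * largest
    # Scale each retained eigenvector by 1/eigenvalue, then recombine.
    inverse = (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T
    return inverse, cond, int(np.sum(~keep))

# Example: a covariance-like matrix with a very large dynamic range.
ill_conditioned = np.diag([1.0, 1e-14])
inv, cond, dropped = safe_symmetric_inverse(ill_conditioned)
print(cond, dropped)
```

A large condition number or a non-zero count of discarded eigenvalues is only a warning sign; choosing the threshold itself is part of the open question described above.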

  8. Auto-consistency test criteria

    Develop precise, mathematically rigorous criteria to determine whether an input product passes the auto-consistency test. Move beyond current heuristic approaches (e.g., “differences must be within total error”) to objective, quantitative standards.
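One candidate for an objective criterion, shown here only as a sketch, is a chi-square test on the difference vector. It assumes Gaussian errors and that the covariance matrix of the difference is known; which two quantities are differenced and how that covariance is built are not specified here and remain part of the research question. The function name `chi_square_consistency` and the `n_sigma` tolerance are hypothetical.

```python
import numpy as np

def chi_square_consistency(difference, covariance, n_sigma=3.0):
    """Hypothetical check: is d^T S^-1 d compatible with a chi-square law?

    For Gaussian errors, chi2 = d^T S^-1 d follows a chi-square
    distribution with n degrees of freedom (mean n, variance 2n).
    The test passes when chi2 <= n + n_sigma * sqrt(2n).
    Returns (chi2, upper_limit, passed).
    """
    d = np.asarray(difference, dtype=float)
    S = np.asarray(covariance, dtype=float)
    chi2 = float(d @ np.linalg.solve(S, d))   # avoids forming S^-1 explicitly
    n = d.size
    upper_limit = float(n + n_sigma * np.sqrt(2.0 * n))
    return chi2, upper_limit, bool(chi2 <= upper_limit)

# Example: a two-level difference with unit variances.
chi2, limit, passed = chi_square_consistency([1.0, 1.0], np.eye(2))
print(chi2, limit, passed)
```

Unlike a level-by-level comparison against the total error, this statistic uses the off-diagonal correlations of the covariance matrix.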

  9. Mono-type fusion with simulated data

    Design and implement systematic mono-type fusion tests based on single measured profiles used as templates to generate sets of realistic simulated products with controlled noise and systematic differences. This enables controlled validation of the fusion algorithm.
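A minimal sketch of such a generator is given below. It assumes the usual linear characterisation of an optimal estimation retrieval, in which a product equals the a priori profile plus the averaging kernel applied to the departure of the true profile from the a priori, plus retrieval noise. The function name `simulate_products`, its arguments, and the optional constant `bias` term used to model systematic differences are hypothetical.

```python
import numpy as np

def simulate_products(x_template, x_a, avg_kernel, noise_cov,
                      n_products, bias=None, seed=0):
    """Hypothetical generator of simulated products from one template.

    Each product is x_a + A (x_template - x_a) + noise + bias, with the
    noise drawn from a zero-mean Gaussian with covariance noise_cov.
    Returns an array of shape (n_products, n_levels).
    """
    x_template = np.asarray(x_template, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    A = np.asarray(avg_kernel, dtype=float)
    rng = np.random.default_rng(seed)          # fixed seed: reproducible tests
    smoothed = x_a + A @ (x_template - x_a)    # noise-free retrieval
    noise = rng.multivariate_normal(
        np.zeros(x_template.size), noise_cov, size=n_products)
    offset = (np.zeros_like(x_template) if bias is None
              else np.asarray(bias, dtype=float))
    return smoothed + noise + offset

# Example: three levels, ideal averaging kernel, small uncorrelated noise.
template = np.array([1.0, 2.0, 3.0])
products = simulate_products(template, np.zeros(3), np.eye(3),
                             0.01 * np.eye(3), n_products=4)
print(products.shape)
```

Because the template, noise covariance, and bias are all controlled, the fused result can be compared against a known truth.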

  10. Mono-type fusion test criteria

    Establish precise criteria for determining mono-type fusion test success, including quantitative thresholds for acceptable oscillations, error reduction, and DOF improvement. Validate these criteria across diverse datasets and measurement types.

  11. Fusion of state vectors with components on different vertical grids

    Extend CDF to handle products whose state vectors contain different atmospheric species (gases), each defined on its own vertical grid. Current CDF formulations assume all components of the state vector are defined on a common grid. The proposed extension requires developing and implementing separate interpolation matrices for each state vector component (species). This approach, inspired by the multi-target retrieval methodology, must:

    • Define component-specific interpolation matrices \mathbf{H}^{(k)} for species k, mapping each species grid to the fusion grid
    • Generalize the sampling matrices \mathbf{C}^{(i,k)} and \mathbf{C}^{(f,k)} to account for different grids per component
    • Reformulate error covariance modifications to handle species-dependent interpolation errors
    • Ensure consistency of the fused averaging kernel matrices and covariance matrices across species boundaries
    • Validate the extended algorithm with multi-species atmospheric products (e.g., simultaneous temperature, O3, and H2O retrievals)

    This extension is essential for modern atmospheric remote sensing, where multi-species products are increasingly common and often have species-specific vertical resolution limitations.
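The first requirement above can be illustrated with a small sketch that builds one interpolation matrix per species and stacks them block-diagonally, so that a single matrix maps the concatenated state vector onto the fusion grid. Linear interpolation, constant extrapolation beyond the grid ends, and the helper names `linear_interp_matrix` and `block_diagonal` are assumptions made for illustration; the source does not prescribe how \mathbf{H}^{(k)} is constructed.

```python
import numpy as np

def linear_interp_matrix(src_grid, dst_grid):
    """Hypothetical H^(k): linear interpolation from src_grid to dst_grid.

    src_grid must be strictly ascending with at least two points.
    Points of dst_grid outside src_grid take the nearest end value.
    """
    src = np.asarray(src_grid, dtype=float)
    dst = np.asarray(dst_grid, dtype=float)
    H = np.zeros((dst.size, src.size))
    for row, z in enumerate(dst):
        j = int(np.searchsorted(src, z))       # first source level >= z
        j = min(max(j, 1), src.size - 1)       # bracket with levels j-1, j
        w = (z - src[j - 1]) / (src[j] - src[j - 1])
        w = min(max(w, 0.0), 1.0)              # clamp: constant extrapolation
        H[row, j - 1] = 1.0 - w
        H[row, j] = w
    return H

def block_diagonal(blocks):
    """Stack 2-D arrays along the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

# Example: temperature and O3 on different grids, one common fusion grid.
fusion_grid = [0.0, 5.0, 10.0, 15.0, 20.0]
species_grids = {"T": [0.0, 10.0, 20.0], "O3": [0.0, 5.0, 10.0, 20.0]}
H = block_diagonal([linear_interp_matrix(g, fusion_grid)
                    for g in species_grids.values()])
print(H.shape)
```

The zero off-diagonal blocks express the assumption that interpolation never mixes species; cross-species correlations would still enter through the covariance matrices that H is applied to.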

  12. Consistency of characterisation matrix relationships

    A well-posed optimal estimation product must satisfy three fundamental relationships among its characterisation matrices (see Prerequisites section, Eq. P1–P3). The current CDF formulations [Eq. A.1–A.5 for Config A, and generalizations for other configurations] derive the output averaging kernels and covariance matrices directly from the input quantities without explicitly enforcing these relationships. This is both inefficient and risky (errors can propagate unchecked). Develop an alternative CDF(2022) formulation that:

    • Leaves the fused state vector \mathbf{x}_{f} unchanged
    • Modifies the derivation of \mathbf{S}_{nf} and \mathbf{S}_{sf} to guarantee that the three output matrices satisfy P1–P3
    • Provides similar corrections for CDF(2015)

    This reformulation would increase robustness and eliminate a potential source of internal inconsistencies in the fused product.

Implementation Priorities

These research topics are not equally urgent. The following grouping suggests a prioritization:

High priority (topics 8, 9, 10, 11, 12): Define rigorous test criteria (8, 10) and implement the simulation framework (9). Topics 11 (state vector components on different vertical grids) and 12 (matrix consistency) directly improve algorithm robustness and practical applicability.
Medium priority (topics 4, 5, 7): Address coincidence and interpolation errors (4, 5) and understand numerical stability (7). Important for real-world datasets.
Medium-low priority (topics 1, 2, 3, 6): Refine constraint selection (1, 2), extend to total columns (3), and validate extended interpolation (6). Important for advanced applications.