QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction

1 ETIS (UMR 8051), CY Cergy Paris University, ENSEA, CNRS, France
2 AGM (UMR 8088), CY Cergy Paris University, CNRS, France
3 University of Ljubljana, Slovenia
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

*Indicates Equal Contribution

Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, while reducing the number of unrolling iterations required.

Abstract

Inverse problems span diverse fields. In medical contexts, computed tomography (CT) plays a crucial role in reconstructing a patient's internal structure, presenting challenges due to artifacts caused by an inherently ill-posed inverse problem. Previous research has advanced image quality via post-processing and deep unrolling algorithms, but still faces challenges such as extended convergence times with ultra-sparse data. Despite these enhancements, the resulting images often show significant artifacts, limiting their effectiveness for real-world diagnostic applications. We aim to explore deep second-order unrolling algorithms for solving imaging inverse problems, emphasizing their faster convergence and lower time complexity compared to common first-order methods such as gradient descent. In this paper, we introduce QN-Mixer, an algorithm based on the quasi-Newton approach. Its parameters are learned through the BFGS algorithm, and we introduce Incept-Mixer, an efficient neural architecture that serves as a non-local regularization term, capturing long-range dependencies within images. To address the computational demands typically associated with quasi-Newton algorithms, which require full Hessian matrix computations, we present a memory-efficient alternative: our approach intelligently downsamples gradient information, significantly reducing computational requirements while maintaining performance. The approach is validated through experiments on the sparse-view CT problem, involving various datasets and scanning protocols, and is compared with state-of-the-art post-processing and deep unrolling approaches. Our method outperforms existing approaches and achieves state-of-the-art performance in terms of SSIM and PSNR, all while reducing the number of unrolling iterations required.

Sparse-View Reconstruction Challenges


Computed tomography (CT) is a widely used imaging modality in medical diagnosis and treatment planning, delivering intricate anatomical details of the human body with precision. Despite its success, CT is associated with high radiation doses, which can increase the risk of cancer induction. Adhering to the ALARA principle (As Low As Reasonably Achievable), the medical community emphasizes minimizing radiation exposure to the lowest level necessary for accurate diagnosis. Numerous approaches have been proposed to reduce radiation doses while maintaining image quality. Among these, sparse-view CT emerges as a promising solution, effectively lowering radiation doses by subsampling the projection data, often referred to as the sinogram. Nonetheless, reconstructed images using the well-known Filtered Back Projection (FBP) algorithm suffer from pronounced streaking artifacts, which can lead to misdiagnosis. The challenge of effectively reconstructing high-quality CT images from sparse-view data is gaining increasing attention in both the computer vision and medical imaging communities.
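To make the setting concrete, the following is a minimal sketch of sparse-view acquisition and FBP reconstruction using scikit-image; the phantom, number of views, and subsampling factor are illustrative assumptions and not the datasets or scanning geometry used in the paper.

    import numpy as np
    from skimage.data import shepp_logan_phantom
    from skimage.transform import radon, iradon, resize

    # Ground-truth slice and a dense set of projection angles.
    image = resize(shepp_logan_phantom(), (256, 256))
    full_angles = np.linspace(0.0, 180.0, 720, endpoint=False)

    # Sparse-view acquisition: keep only 1 view out of 12 (illustrative factor).
    sparse_angles = full_angles[::12]
    sinogram = radon(image, theta=sparse_angles)

    # FBP reconstruction from the subsampled sinogram exhibits streaking artifacts.
    fbp_recon = iradon(sinogram, theta=sparse_angles, filter_name="ramp")
    print("views used:", len(sparse_angles), "reconstruction shape:", fbp_recon.shape)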

Proposed method

Reconstructed images obtained with the FBP algorithm suffer from pronounced streaking artifacts; the approaches below address this problem.

  1. Initial deep learning techniques, applied to FBP-reconstructed images as a post-processing step, show promise in artifact removal and structure preservation, but their constrained receptive fields lead to suboptimal results. These methods are, however, computationally efficient.
  2. Deep unrolling algorithms, such as RegFormer, have been introduced as iterative reconstruction methods, but they suffer from slow convergence and high computational costs. Given the difficulty of capturing long-range dependencies and the growing computational demands of modern neural networks, more efficient alternatives are needed.
  3. The paper introduces a second-order unrolling network, QN-Mixer, which employs a latent BFGS algorithm to approximate the inverse Hessian matrix together with a deep-net learned regularization term. It outperforms state-of-the-art methods on quantitative metrics while requiring fewer unrolling iterations than first-order networks (see the update-rule sketch after this list).
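As a rough illustration of the difference between these two families (not the authors' implementation), the sketch below contrasts one first-order unrolled update with one quasi-Newton update in which the gradient is preconditioned by an approximate inverse Hessian; A, y, and reg_grad are placeholder names for the projection operator, the sinogram, and the gradient of a learned regularizer.

    import numpy as np

    def first_order_step(x, A, y, reg_grad, step=1e-2):
        """One unrolled gradient-descent iteration: x <- x - step * (A^T (A x - y) + reg_grad(x))."""
        return x - step * (A.T @ (A @ x - y) + reg_grad(x))

    def quasi_newton_step(x, A, y, reg_grad, H_inv, step=1.0):
        """One quasi-Newton iteration: the same gradient, preconditioned by an
        approximate inverse Hessian H_inv (built by a latent BFGS update in QN-Mixer)."""
        g = A.T @ (A @ x - y) + reg_grad(x)
        return x - step * (H_inv @ g)

    # Toy usage on a random, well-posed problem.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20))
    y = A @ rng.standard_normal(20)
    no_prior = lambda x: np.zeros_like(x)       # placeholder for a learned prior
    x = first_order_step(np.zeros(20), A, y, no_prior)
    x = quasi_newton_step(x, A, y, no_prior, np.eye(20))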

Methodology

Figure: overview of the QN-Mixer architecture.

Our method is a new type of unrolling network, drawing inspiration from the quasi-Newton method. The figure above illustrates the QN-Mixer architecture: it approximates the inverse Hessian matrix with a latent BFGS algorithm, which keeps the otherwise prohibitive cost of full inverse Hessian approximation tractable, and incorporates Incept-Mixer, a non-local regularization term designed to capture long-range dependencies within images.
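The sketch below, written under stated assumptions and without the learned components of QN-Mixer, shows the classical BFGS inverse-Hessian update applied to gradients that are first downsampled into a low-dimensional latent space, which is the memory-saving idea behind the latent BFGS step; the downsample helper and dimensions are illustrative only.

    import numpy as np

    def downsample(g, factor=4):
        """Average-pool a flattened gradient into a latent vector (illustrative downsampling)."""
        return g.reshape(-1, factor).mean(axis=1)

    def bfgs_inverse_hessian_update(H, s, y, eps=1e-10):
        """Standard BFGS update of the inverse Hessian approximation H from the
        latent step s = z_{k+1} - z_k and latent gradient difference y = g_{k+1} - g_k."""
        rho = 1.0 / max(float(y @ s), eps)      # crude guard against violated curvature condition
        I = np.eye(H.shape[0])
        V = I - rho * np.outer(s, y)
        return V @ H @ V.T + rho * np.outer(s, s)

    # Toy usage: the latent dimension d is 4x smaller than the full gradient.
    d, rng = 16, np.random.default_rng(1)
    H = np.eye(d)
    g_prev, g_curr = downsample(rng.standard_normal(4 * d)), downsample(rng.standard_normal(4 * d))
    z_prev, z_curr = rng.standard_normal(d), rng.standard_normal(d)
    H = bfgs_inverse_hessian_update(H, z_curr - z_prev, g_curr - g_prev)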

Figure: the Incept-Mixer block.

The Incept-Mixer block is crafted by drawing inspiration from both the multi-layer perceptron mixer (MLP-Mixer) and the Inception architecture, leveraging the strengths of each: it captures long-range interactions through the attention-like token-mixing mechanism of MLP-Mixer and extracts local invariant features with the Inception block.
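As a hedged PyTorch sketch (layer sizes and composition are assumptions for illustration, not the paper's exact design), a block in this spirit can be read as an inception stem for local features followed by an MLP-Mixer block for token and channel mixing:

    import torch
    import torch.nn as nn

    class InceptionStem(nn.Module):
        """Inception-style parallel convolutions extracting local invariant features."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            branch_ch = out_ch // 4
            self.branches = nn.ModuleList([
                nn.Conv2d(in_ch, branch_ch, kernel_size=1),
                nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1),
                nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2),
                nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                              nn.Conv2d(in_ch, branch_ch, kernel_size=1)),
            ])

        def forward(self, x):
            return torch.cat([branch(x) for branch in self.branches], dim=1)

    class MixerBlock(nn.Module):
        """MLP-Mixer block: token mixing (long-range interactions) then channel mixing."""
        def __init__(self, num_tokens, channels, hidden=256):
            super().__init__()
            self.norm1 = nn.LayerNorm(channels)
            self.token_mlp = nn.Sequential(nn.Linear(num_tokens, hidden), nn.GELU(),
                                           nn.Linear(hidden, num_tokens))
            self.norm2 = nn.LayerNorm(channels)
            self.channel_mlp = nn.Sequential(nn.Linear(channels, hidden), nn.GELU(),
                                             nn.Linear(hidden, channels))

        def forward(self, x):                       # x: (batch, tokens, channels)
            x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
            return x + self.channel_mlp(self.norm2(x))

    # Toy usage: a 16x16 feature map becomes 256 tokens of 32 channels.
    feat = InceptionStem(in_ch=1, out_ch=32)(torch.randn(2, 1, 16, 16))
    tokens = feat.flatten(2).transpose(1, 2)        # shape (2, 256, 32)
    out = MixerBlock(num_tokens=tokens.shape[1], channels=32)(tokens)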

Visual Comparison

BibTeX


    @inproceedings{ayad2024,
      title={QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction},
      author={Ayad, Ishak and Larue, Nicolas and Nguyen, Maï K.},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2024}
    }
    

Acknowledgements

This work was granted access to the HPC resources of IDRIS under the allocation 2021-[AD011012741] / 2022-[AD011013915] provided by GENCI and supported by DIM Math Innov funding.