Team-based matrix-free unsteady advection-diffusion with SUPG stabilization, with the fused-arithmetic optimizations transferred from EpsilonDivDivKerngen and DivergenceKerngen/GradientKerngen.

#include "../../quadrature/quadrature.hpp"
#include "communication/shell/communication.hpp"
#include "communication/shell/communication_plan.hpp"
#include "fe/wedge/operators/shell/unsteady_advection_diffusion_supg.hpp"
#include "dense/vec.hpp"
#include "fe/wedge/integrands.hpp"
#include "fe/wedge/kernel_helpers.hpp"
#include "grid/shell/spherical_shell.hpp"
#include "linalg/operator.hpp"
#include "linalg/vector.hpp"
#include "linalg/vector_q1.hpp"
#include "util/timer.hpp"

Classes
class	terra::fe::wedge::operators::shell::UnsteadyAdvectionDiffusionSUPGKerngen< ScalarT, VelocityVecDim >

Namespaces
namespace	terra

namespace	terra::fe

namespace	terra::fe::wedge
	Features for wedge elements.

namespace	terra::fe::wedge::operators

namespace	terra::fe::wedge::operators::shell

Detailed Description

Team-based matrix-free unsteady advection-diffusion with SUPG stabilization, with the fused-arithmetic optimizations transferred from EpsilonDivDivKerngen and DivergenceKerngen/GradientKerngen.

Math identical to UnsteadyAdvectionDiffusionSUPG, but the 6×6 local mass matrix M and operator matrix A (advection + diffusion + streamline-diffusion) are never materialised. The standard bilinear form

    dst_i = Σⱼ (M_ij + dt·A_ij) · src_j

with

    M_ij = Σ_q w_q |det J_q| · m · φ_i(q) · φ_j(q)
    A_ij = Σ_q w_q |det J_q| · [ κ · ∇φ_i(q)·∇φ_j(q)
                               + φ_i(q) · u(q)·∇φ_j(q)
                               + τ · (u(q)·∇φ_i(q)) · (u(q)·∇φ_j(q)) ]

collapses to

    dst_i = Σ_q w_q |det J_q| · { φ_i(q) · A_scalar(q) + ∇φ_i(q) · B_vec(q) }

where, for each quadrature point,

    T̂(q)        = Σⱼ φ_j(q) · T_j
    ∇T(q)       = Σⱼ ∇φ_j(q) · T_j
    u(q)        = Σⱼ φ_j(q) · u_j
    A_scalar(q) = m · T̂(q) + dt · (u(q)·∇T(q))
    B_vec(q)    = dt·κ·∇T(q) + dt·τ·(u(q)·∇T(q)) · u(q)
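
The collapse from the matrix form to the fused form can be checked numerically. The following is a minimal host-side sketch, not the kernel itself: every name in it (ElementData, apply_assembled, apply_fused, make_synthetic_element) is hypothetical, the basis values and gradients are synthetic numbers rather than real wedge shape functions, and τ is treated as a single per-element constant. Because the identity is purely algebraic, any input values should give matching results up to round-off.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// All names here are hypothetical; this is not the kernel's real interface.
constexpr std::size_t NN = 6; // nodes per wedge element
constexpr std::size_t NQ = 6; // quadrature points per element

using Vec3  = std::array<double, 3>;
using Nodal = std::array<double, NN>;

struct ElementData {
    std::array<double, NQ> wdet;                // w_q * |det J_q|
    std::array<std::array<double, NN>, NQ> phi; // phi_i(q)
    std::array<std::array<Vec3, NN>, NQ> grad;  // grad phi_i(q)
    std::array<Vec3, NN> u_node;                // nodal velocities u_j
    double m, kappa, tau, dt;
};

inline double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// u(q) = sum_j phi_j(q) * u_j
inline Vec3 velocity_at(const ElementData& e, std::size_t q) {
    assert(q < NQ);
    Vec3 u{0.0, 0.0, 0.0};
    for (std::size_t j = 0; j < NN; ++j)
        for (std::size_t d = 0; d < 3; ++d) u[d] += e.phi[q][j] * e.u_node[j][d];
    return u;
}

// Reference path: form every (M_ij + dt*A_ij) entry and multiply by T_j.
inline Nodal apply_assembled(const ElementData& e, const Nodal& T) {
    Nodal dst{};
    for (std::size_t q = 0; q < NQ; ++q) {
        const Vec3 u = velocity_at(e, q);
        for (std::size_t i = 0; i < NN; ++i)
            for (std::size_t j = 0; j < NN; ++j) {
                const double mij = e.m * e.phi[q][i] * e.phi[q][j];
                const double aij = e.kappa * dot(e.grad[q][i], e.grad[q][j])
                                 + e.phi[q][i] * dot(u, e.grad[q][j])
                                 + e.tau * dot(u, e.grad[q][i]) * dot(u, e.grad[q][j]);
                dst[i] += e.wdet[q] * (mij + e.dt * aij) * T[j];
            }
    }
    return dst;
}

// Fused path: interpolate once per quadrature point, then test against phi_i.
inline Nodal apply_fused(const ElementData& e, const Nodal& T) {
    Nodal dst{};
    for (std::size_t q = 0; q < NQ; ++q) {
        const Vec3 u = velocity_at(e, q);
        double t_hat = 0.0;
        Vec3 grad_t{0.0, 0.0, 0.0};
        for (std::size_t j = 0; j < NN; ++j) {
            t_hat += e.phi[q][j] * T[j];
            for (std::size_t d = 0; d < 3; ++d) grad_t[d] += e.grad[q][j][d] * T[j];
        }
        const double u_grad_t = dot(u, grad_t);
        const double a_scalar = e.m * t_hat + e.dt * u_grad_t;
        Vec3 b_vec{};
        for (std::size_t d = 0; d < 3; ++d)
            b_vec[d] = e.dt * e.kappa * grad_t[d] + e.dt * e.tau * u_grad_t * u[d];
        for (std::size_t i = 0; i < NN; ++i)
            dst[i] += e.wdet[q] * (e.phi[q][i] * a_scalar + dot(e.grad[q][i], b_vec));
    }
    return dst;
}

// Deterministic synthetic data (not real wedge shape functions).
inline ElementData make_synthetic_element() {
    ElementData e{};
    e.m = 1.0; e.kappa = 0.3; e.tau = 0.05; e.dt = 0.1;
    for (std::size_t q = 0; q < NQ; ++q) {
        e.wdet[q] = 0.1 + 0.01 * q;
        for (std::size_t j = 0; j < NN; ++j) {
            e.phi[q][j] = std::sin(1.0 + q + 2.0 * j);
            for (std::size_t d = 0; d < 3; ++d)
                e.grad[q][j][d] = std::cos(0.5 * q + j + 3.0 * d);
        }
    }
    for (std::size_t j = 0; j < NN; ++j)
        for (std::size_t d = 0; d < 3; ++d) e.u_node[j][d] = 0.2 * (j + 1) - 0.3 * d;
    return e;
}

inline double max_abs_diff(const Nodal& a, const Nodal& b) {
    double worst = 0.0;
    for (std::size_t i = 0; i < NN; ++i) worst = std::fmax(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

The assembled path does O(36) work per quadrature point inside the i/j loop, while the fused path does two O(6) passes, which is the arithmetic saving the description refers to.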

τ is kept exact: volume-averaged over all 6 quadrature points (matching the legacy code) via a pre-pass that reuses the u(q) values.

Dirichlet boundary handling:

Column elimination: when accumulating T̂(q) and ∇T(q), contributions from boundary-node T_j are skipped.
Row elimination: for a boundary node i, only the diagonal term survives; a separate inline diagonal computation handles this.
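
The effect of the two elimination rules can be illustrated on a small dense matrix. This is only a sketch of the semantics, assuming a materialised 6×6 local matrix K standing in for (M + dt·A); the real operator never forms K, and the names apply_with_dirichlet, DenseLocal, NodalVec and BoundaryMask are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLocalNodes = 6;
using DenseLocal   = std::array<std::array<double, kLocalNodes>, kLocalNodes>;
using NodalVec     = std::array<double, kLocalNodes>;
using BoundaryMask = std::array<bool, kLocalNodes>;

// Hypothetical reference semantics of Dirichlet row/column elimination.
NodalVec apply_with_dirichlet(const DenseLocal& K, const NodalVec& src,
                              const BoundaryMask& is_boundary) {
    // Column elimination: boundary-node inputs do not contribute to any sum.
    NodalVec masked = src;
    for (std::size_t j = 0; j < kLocalNodes; ++j)
        if (is_boundary[j]) masked[j] = 0.0;

    NodalVec dst{};
    for (std::size_t i = 0; i < kLocalNodes; ++i) {
        if (is_boundary[i]) {
            // Row elimination: only the diagonal term survives for boundary rows.
            dst[i] = K[i][i] * src[i];
            continue;
        }
        for (std::size_t j = 0; j < kLocalNodes; ++j) dst[i] += K[i][j] * masked[j];
    }
    return dst;
}
```

In the fused kernel the same result would come from skipping boundary T_j while accumulating T̂(q) and ∇T(q), then overwriting boundary rows with the separately computed diagonal.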

Lumped mass: mass term replaced by row-sum diagonal M_ii T_i, where M_ii = m · Σ_q w_q |det J_q| · φ_i(q) (because Σⱼ φ_j ≡ 1 for a partition-of-unity basis).
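
The row-sum identity behind the lumped mass can be sketched as follows. The helper names (consistent_row_sum, lumped_diagonal, PhiTable, QuadWeights) are hypothetical, and the functions take precomputed tables of φ_i(q) and w_q·|det J_q| rather than real element geometry. The two results should agree whenever the φ values at each quadrature point sum to 1.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kNodes = 6;
constexpr std::size_t kQuad  = 6;
using PhiTable    = std::array<std::array<double, kNodes>, kQuad>; // phi[q][i]
using QuadWeights = std::array<double, kQuad>;                     // w_q * |det J_q|

// Row sum of the consistent mass matrix: sum_j M_ij.
double consistent_row_sum(const PhiTable& phi, const QuadWeights& wdet,
                          double m, std::size_t i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < kNodes; ++j)
        for (std::size_t q = 0; q < kQuad; ++q)
            sum += wdet[q] * m * phi[q][i] * phi[q][j];
    return sum;
}

// Lumped diagonal: M_ii = m * sum_q w_q |det J_q| * phi_i(q).
// Valid as a row sum only if sum_j phi_j(q) == 1 at every quadrature point.
double lumped_diagonal(const PhiTable& phi, const QuadWeights& wdet,
                       double m, std::size_t i) {
    double sum = 0.0;
    for (std::size_t q = 0; q < kQuad; ++q) sum += wdet[q] * phi[q][i];
    return m * sum;
}
```

The lumped form needs one loop over quadrature points per node instead of a double loop, and the resulting mass term is simply M_ii · T_i.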

Diagonal mode: only the diagonal of (M + dt·A) is computed.
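
Setting j = i in the definitions of M_ij and A_ij gives the per-quadrature-point diagonal contribution. The sketch below assumes the consistent (non-lumped) mass term and ignores boundary handling; diagonal_contribution, Vec3d and dot3 are hypothetical names, not part of the operator's interface.

```cpp
#include <array>
#include <cassert>

using Vec3d = std::array<double, 3>;

inline double dot3(const Vec3d& a, const Vec3d& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Contribution of one quadrature point to (M + dt*A)_ii:
//   w|detJ| * [ m*phi_i^2
//             + dt*( kappa*|grad phi_i|^2 + phi_i*(u.grad phi_i) + tau*(u.grad phi_i)^2 ) ]
inline double diagonal_contribution(double wdet, double phi_i, const Vec3d& grad_i,
                                    const Vec3d& u, double m, double kappa,
                                    double tau, double dt) {
    assert(wdet >= 0.0); // quadrature weight times |det J| is non-negative
    const double u_dot_grad = dot3(u, grad_i);
    const double mass       = m * phi_i * phi_i;
    const double diffusion  = kappa * dot3(grad_i, grad_i);
    const double advection  = phi_i * u_dot_grad;
    const double streamline = tau * u_dot_grad * u_dot_grad;
    return wdet * (mass + dt * (diffusion + advection + streamline));
}
```

Summing this value over all quadrature points of an element yields that element's contribution to diagonal entry i.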

Transferred structural optimisations:

Kokkos::TeamPolicy with backend-aware tiling (4,4,8) × r_passes=2 on CUDA.
Host-side KernelPath dispatch (Slow / Fast) plus template<bool LumpedMass, bool Diagonal, bool TreatBoundary>, so the compiler can eliminate unused branches as dead code.
LaunchBounds<128, 5>.
Shared-memory staging: coords, radii, T, velocity.
ShellBoundaryCommPlan for halo exchange.
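
The host-side dispatch onto compile-time boolean flags can be sketched in plain C++17 without Kokkos. Everything here is illustrative: Flags, kernel_variant_id and the dispatch helpers are hypothetical names, and the "kernel" only returns a bitmask identifying which instantiation ran. The point is that each runtime flag is converted once, on the host, into a template argument, so the branches inside the kernel body are resolved with if constexpr.

```cpp
// Hypothetical runtime configuration, as it might arrive from the caller.
struct Flags {
    bool lumped_mass;
    bool diagonal;
    bool treat_boundary;
};

// Stand-in for the kernel body: branches on the flags are compile-time,
// so untaken paths are not instantiated in a given variant.
template <bool LumpedMass, bool Diagonal, bool TreatBoundary>
int kernel_variant_id() {
    int id = 0;
    if constexpr (LumpedMass) id |= 1;
    if constexpr (Diagonal) id |= 2;
    if constexpr (TreatBoundary) id |= 4;
    return id;
}

// Peel off one runtime flag per level and turn it into a template argument.
template <bool LumpedMass, bool Diagonal>
int dispatch_boundary(bool treat_boundary) {
    return treat_boundary ? kernel_variant_id<LumpedMass, Diagonal, true>()
                          : kernel_variant_id<LumpedMass, Diagonal, false>();
}

template <bool LumpedMass>
int dispatch_diagonal(bool diagonal, bool treat_boundary) {
    return diagonal ? dispatch_boundary<LumpedMass, true>(treat_boundary)
                    : dispatch_boundary<LumpedMass, false>(treat_boundary);
}

inline int dispatch(const Flags& f) {
    return f.lumped_mass ? dispatch_diagonal<true>(f.diagonal, f.treat_boundary)
                         : dispatch_diagonal<false>(f.diagonal, f.treat_boundary);
}
```

This pattern instantiates all eight variants, trading some compile time and binary size for branch-free kernel bodies.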
