unsteady_advection_diffusion_supg_kerngen.hpp File Reference

Team-based matrix-free unsteady advection-diffusion with SUPG stabilization, with the fused-arithmetic optimizations transferred from EpsilonDivDivKerngen and DivergenceKerngen/GradientKerngen.

Classes

class  terra::fe::wedge::operators::shell::UnsteadyAdvectionDiffusionSUPGKerngen< ScalarT, VelocityVecDim >
 

Namespaces

namespace  terra
 
namespace  terra::fe
 
namespace  terra::fe::wedge
 Features for wedge elements.
 
namespace  terra::fe::wedge::operators
 
namespace  terra::fe::wedge::operators::shell
 

Detailed Description

The math is identical to UnsteadyAdvectionDiffusionSUPG, but the 6×6 local mass matrix M and operator matrix A (advection + diffusion + streamline diffusion) are never materialised. The standard bilinear form

    dst_i = Σⱼ (M_ij + dt·A_ij) · src_j

with

    M_ij = Σ_q w_q |det J_q| · m · φ_i(q) · φ_j(q)
    A_ij = Σ_q w_q |det J_q| · [ κ · ∇φ_i(q)·∇φ_j(q)
                               + φ_i(q) · u(q)·∇φ_j(q)
                               + τ · (u(q)·∇φ_i(q)) · (u(q)·∇φ_j(q)) ]

collapses to

    dst_i = Σ_q w_q |det J_q| · { φ_i(q) · A_scalar(q) + ∇φ_i(q) · B_vec(q) }

where, for each quadrature point,

    T̂(q)        = Σⱼ φ_j(q) · T_j
    ∇T(q)       = Σⱼ ∇φ_j(q) · T_j
    u(q)        = Σⱼ φ_j(q) · u_j
    A_scalar(q) = m · T̂(q) + dt · (u(q)·∇T(q))
    B_vec(q)    = dt·κ·∇T(q) + dt·τ·(u(q)·∇T(q)) · u(q)
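The collapse from the matrix form to the per-quadrature-point form can be checked numerically. The following is a minimal sketch, not the kernel itself: it uses plain host C++ with no Kokkos, and the names QuadPoint, apply_materialised, apply_fused and make_synthetic_points are hypothetical. The basis values and gradients are synthetic numbers rather than real wedge data, because the identity is purely algebraic and holds for any values.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array< double, 3 >;
using Vec6 = std::array< double, 6 >;

// Hypothetical per-quadrature-point data for one 6-node element.
// These names are illustrative and are not taken from the terra sources.
struct QuadPoint
{
    double                wdet; // w_q * |det J_q|
    Vec6                  phi;  // phi_j(q)
    std::array< Vec3, 6 > grad; // grad phi_j(q)
    Vec3                  u;    // u(q), assumed already interpolated
};

inline double dot3( const Vec3& a, const Vec3& b )
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Reference path: form every entry of M + dt*A, then multiply by T.
inline Vec6 apply_materialised( const std::array< QuadPoint, 6 >& qps, const Vec6& T,
                                double m, double kappa, double tau, double dt )
{
    Vec6 dst{};
    for ( std::size_t i = 0; i < 6; ++i )
    {
        for ( std::size_t j = 0; j < 6; ++j )
        {
            double entry = 0.0;
            for ( const QuadPoint& q : qps )
            {
                const double M_ij = m * q.phi[i] * q.phi[j];
                const double A_ij = kappa * dot3( q.grad[i], q.grad[j] ) +
                                    q.phi[i] * dot3( q.u, q.grad[j] ) +
                                    tau * dot3( q.u, q.grad[i] ) * dot3( q.u, q.grad[j] );
                entry += q.wdet * ( M_ij + dt * A_ij );
            }
            dst[i] += entry * T[j];
        }
    }
    return dst;
}

// Fused path: interpolate T_hat and grad T once per point, then scatter.
inline Vec6 apply_fused( const std::array< QuadPoint, 6 >& qps, const Vec6& T,
                         double m, double kappa, double tau, double dt )
{
    Vec6 dst{};
    for ( const QuadPoint& q : qps )
    {
        double T_hat = 0.0;
        Vec3   grad_T{};
        for ( std::size_t j = 0; j < 6; ++j )
        {
            T_hat += q.phi[j] * T[j];
            for ( std::size_t d = 0; d < 3; ++d )
                grad_T[d] += q.grad[j][d] * T[j];
        }
        const double u_grad_T = dot3( q.u, grad_T );
        const double A_scalar = m * T_hat + dt * u_grad_T;
        Vec3         B_vec{};
        for ( std::size_t d = 0; d < 3; ++d )
            B_vec[d] = dt * kappa * grad_T[d] + dt * tau * u_grad_T * q.u[d];
        for ( std::size_t i = 0; i < 6; ++i )
            dst[i] += q.wdet * ( q.phi[i] * A_scalar + dot3( q.grad[i], B_vec ) );
    }
    return dst;
}

// Deterministic synthetic data, used only to compare the two paths.
inline std::array< QuadPoint, 6 > make_synthetic_points()
{
    std::array< QuadPoint, 6 > qps{};
    for ( std::size_t q = 0; q < 6; ++q )
    {
        const double s = static_cast< double >( q );
        qps[q].wdet    = 0.1 + 0.02 * s;
        for ( std::size_t j = 0; j < 6; ++j )
        {
            const double t = static_cast< double >( j );
            qps[q].phi[j]  = std::sin( 1.0 + s + 2.0 * t );
            for ( std::size_t d = 0; d < 3; ++d )
                qps[q].grad[j][d] = std::cos( 0.5 * s + t + 3.0 * static_cast< double >( d ) );
        }
        for ( std::size_t d = 0; d < 3; ++d )
            qps[q].u[d] = 0.3 * static_cast< double >( d + 1 ) - 0.1 * s;
    }
    return qps;
}
```

Calling apply_materialised and apply_fused on the same input should agree to rounding error. The fused path needs one interpolation and one scatter per quadrature point instead of 36 matrix entries per element.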

τ is kept exact: volume-averaged over all 6 quadrature points (matching the legacy code) via a pre-pass that reuses the u(q) values.

Dirichlet boundary handling:

  • column elimination: when accumulating T̂(q) and ∇T(q), boundary-node T_j contributions are skipped.
  • row elimination: for a boundary node i, only the diagonal term survives; a separate inline diagonal computation handles this.
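The two eliminations can be folded directly into the quadrature loop. The sketch below illustrates this under simplifying assumptions: only the mass term is kept, the basis values are synthetic, and the names MassPoint, apply_mass_fused_dirichlet, apply_mass_dense_dirichlet and make_mass_points are hypothetical. A dense reference with off-diagonal boundary rows and columns zeroed is included for comparison.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Nodal6 = std::array< double, 6 >;
using Mask6  = std::array< bool, 6 >;

// Hypothetical quadrature data; only the mass term is kept for brevity.
struct MassPoint
{
    double wdet; // w_q * |det J_q|
    Nodal6 phi;  // phi_j(q)
};

// Fused apply with both eliminations folded into the quadrature loop.
inline Nodal6 apply_mass_fused_dirichlet( const std::array< MassPoint, 6 >& qps, const Nodal6& T,
                                          const Mask6& is_boundary, double m )
{
    Nodal6 dst{};
    for ( const MassPoint& q : qps )
    {
        // Column elimination: boundary nodes do not contribute to T_hat.
        double T_hat = 0.0;
        for ( std::size_t j = 0; j < 6; ++j )
            if ( !is_boundary[j] )
                T_hat += q.phi[j] * T[j];

        for ( std::size_t i = 0; i < 6; ++i )
        {
            if ( is_boundary[i] )
                // Row elimination: only the diagonal entry M_ii survives.
                dst[i] += q.wdet * m * q.phi[i] * q.phi[i] * T[i];
            else
                dst[i] += q.wdet * m * q.phi[i] * T_hat;
        }
    }
    return dst;
}

// Reference: assemble M, drop off-diagonal boundary rows and columns, multiply.
inline Nodal6 apply_mass_dense_dirichlet( const std::array< MassPoint, 6 >& qps, const Nodal6& T,
                                          const Mask6& is_boundary, double m )
{
    std::array< Nodal6, 6 > M{};
    for ( const MassPoint& q : qps )
        for ( std::size_t i = 0; i < 6; ++i )
            for ( std::size_t j = 0; j < 6; ++j )
                M[i][j] += q.wdet * m * q.phi[i] * q.phi[j];

    Nodal6 dst{};
    for ( std::size_t i = 0; i < 6; ++i )
        for ( std::size_t j = 0; j < 6; ++j )
        {
            const bool eliminated = ( i != j ) && ( is_boundary[i] || is_boundary[j] );
            if ( !eliminated )
                dst[i] += M[i][j] * T[j];
        }
    return dst;
}

// Deterministic synthetic data, used only to compare the two paths.
inline std::array< MassPoint, 6 > make_mass_points()
{
    std::array< MassPoint, 6 > qps{};
    for ( std::size_t q = 0; q < 6; ++q )
    {
        qps[q].wdet = 0.2 + 0.05 * static_cast< double >( q );
        for ( std::size_t j = 0; j < 6; ++j )
            qps[q].phi[j] = std::cos( 0.7 * static_cast< double >( q ) + static_cast< double >( j ) );
    }
    return qps;
}
```

In the sketch the boundary branch is taken at run time. The TreatBoundary template parameter suggests the kernel resolves this at compile time, but how it does so is not shown here.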

Lumped mass: mass term replaced by row-sum diagonal M_ii T_i, where M_ii = m · Σ_q w_q |det J_q| · φ_i(q) (because Σⱼ φ_j ≡ 1 for a partition-of-unity basis).
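The row-sum identity behind the lumped mass can be verified with a linear wedge basis. The sketch below assumes a reference prism with triangle coordinates (ξ, η) and ζ ∈ [-1, 1], with the bottom triangle numbered first. This ordering, the 3 × 2 sample rule, and the names wedge_basis, SamplePoint, lumped_mass_diagonal and consistent_mass_row_sums are assumptions for illustration and may differ from what the terra sources use.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Basis6 = std::array< double, 6 >;

// Linear wedge basis on the reference prism (assumed node ordering:
// bottom triangle 0..2, top triangle 3..5).
inline Basis6 wedge_basis( double xi, double eta, double zeta )
{
    const double l0  = 1.0 - xi - eta;
    const double l1  = xi;
    const double l2  = eta;
    const double bot = 0.5 * ( 1.0 - zeta );
    const double top = 0.5 * ( 1.0 + zeta );
    return { l0 * bot, l1 * bot, l2 * bot, l0 * top, l1 * top, l2 * top };
}

// Hypothetical sample point: reference coordinates plus w_q * |det J_q|.
struct SamplePoint
{
    double xi, eta, zeta, wdet;
};

// Lumped diagonal: M_ii = m * sum_q wdet_q * phi_i(q).
inline Basis6 lumped_mass_diagonal( const std::array< SamplePoint, 6 >& pts, double m )
{
    Basis6 diag{};
    for ( const SamplePoint& p : pts )
    {
        const Basis6 phi = wedge_basis( p.xi, p.eta, p.zeta );
        for ( std::size_t i = 0; i < 6; ++i )
            diag[i] += m * p.wdet * phi[i];
    }
    return diag;
}

// Row sums of the consistent mass matrix, computed the long way.
inline Basis6 consistent_mass_row_sums( const std::array< SamplePoint, 6 >& pts, double m )
{
    Basis6 rows{};
    for ( const SamplePoint& p : pts )
    {
        const Basis6 phi = wedge_basis( p.xi, p.eta, p.zeta );
        for ( std::size_t i = 0; i < 6; ++i )
            for ( std::size_t j = 0; j < 6; ++j )
                rows[i] += m * p.wdet * phi[i] * phi[j];
    }
    return rows;
}

// A 3-point triangle rule times a 2-point Gauss rule in zeta, with an
// arbitrary, non-constant Jacobian factor folded into wdet.
inline std::array< SamplePoint, 6 > make_sample_points()
{
    const double a = 1.0 / 6.0;
    const double b = 2.0 / 3.0;
    const double z = 1.0 / std::sqrt( 3.0 );
    std::array< SamplePoint, 6 > pts = { {
        { a, a, -z, a * 1.0 }, { b, a, -z, a * 1.1 }, { a, b, -z, a * 1.2 },
        { a, a, +z, a * 1.3 }, { b, a, +z, a * 1.4 }, { a, b, +z, a * 1.5 } } };
    return pts;
}
```

Because the six basis values sum to one at every point, the two functions return the same vector up to rounding. The lumped variant needs no inner loop over j.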

Diagonal mode: computes only the diagonal of the full operator (M + dt·A).

Transferred structural optimisations:

  • Kokkos::TeamPolicy with backend-aware tiling (4,4,8) × r_passes=2 on CUDA.
  • Host-side KernelPath dispatch (Slow / Fast) + template<bool LumpedMass, bool Diagonal, bool TreatBoundary> so the compiler eliminates unused branches as dead code.
  • LaunchBounds<128, 5>.
  • Shared-memory staging: coords, radii, T, velocity.
  • ShellBoundaryCommPlan for halo exchange.
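
The host-side dispatch onto boolean template parameters can be illustrated in plain C++17 without Kokkos. This is a sketch of the general pattern only: KernelOptions, run_kernel and dispatch are hypothetical names, the mapping from KernelPath (Slow / Fast) to template arguments is not shown because it is not documented here, and the kernel body is replaced by a bitmask that records which branches were compiled in.

```cpp
#include <cassert>

// Hypothetical runtime options selected on the host.
struct KernelOptions
{
    bool lumped_mass;
    bool diagonal;
    bool treat_boundary;
};

// Stand-in for the kernel body. Each `if constexpr` branch is discarded at
// compile time when its flag is false, so every instantiation contains only
// the code it needs. The return value records which branches were kept.
template < bool LumpedMass, bool Diagonal, bool TreatBoundary >
int run_kernel()
{
    int compiled_in = 0;
    if constexpr ( LumpedMass )
        compiled_in |= 1;
    if constexpr ( Diagonal )
        compiled_in |= 2;
    if constexpr ( TreatBoundary )
        compiled_in |= 4;
    return compiled_in;
}

// Peel off one runtime flag per level and turn it into a template argument.
template < bool LumpedMass, bool Diagonal >
int dispatch_boundary( bool treat_boundary )
{
    return treat_boundary ? run_kernel< LumpedMass, Diagonal, true >()
                          : run_kernel< LumpedMass, Diagonal, false >();
}

template < bool LumpedMass >
int dispatch_diagonal( bool diagonal, bool treat_boundary )
{
    return diagonal ? dispatch_boundary< LumpedMass, true >( treat_boundary )
                    : dispatch_boundary< LumpedMass, false >( treat_boundary );
}

// Host-side entry point: one runtime decision, eight possible instantiations.
inline int dispatch( const KernelOptions& opts )
{
    return opts.lumped_mass ? dispatch_diagonal< true >( opts.diagonal, opts.treat_boundary )
                            : dispatch_diagonal< false >( opts.diagonal, opts.treat_boundary );
}
```

The runtime branching happens once per launch on the host, so the per-element code carries no flag checks. The cost is up to eight instantiations of the kernel.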