
Meeting: 2025 07 08

Rich Hornung edited this page Jul 10, 2025 · 1 revision
  • What would be most useful for applications to get from vendors w.r.t. RAJAPerf?

  • Use RAJAPerf as a performance benchmark AND also for compiler testing/verification?

    • For performance, we should allow vendors to modify RAJA and RAJAPerf under the following rules:
      • No RAJA API changes
      • Must provide RAJA variants (base variants are optional) and cannot remove RAJA from the code
      • Must use the same RAJA version for each kernel (tied to the RAJAPerf version via submodule)
    • For compiler testing:
      • Ensure the compiler supports all C++ language features used in RAJA and RAJAPerf
      • Ensure the performance of the RAJA and base variants of each kernel is within some bound of each other
  • How to prioritize which kernels to include in benchmarking exercises?

    • Cover gaps as much as possible in our small set of proxy-apps
    • Add a kernel to represent a MARBL case that we don't cover currently (slight modification of MASS3DEA)?
    • Stress shared memory usage by increasing order of one of the MARBL-based kernels
    • We have a shared memory version of LTIMES in RAJA examples. No shared memory is used in Kripke?
    • Add high-dimensional tensor contraction (Arturo is working on this)
    • Vendor-supplied BLAS1/2 (sparse) batched versions -- should this be captured in the contract or represented in a benchmark?
  • What metrics do we want to require (figure of merit, FOM)?

    • Throughput plots

  • Questions for Olga/benchmarking team

    • When do we have to freeze/release the code?
    • When do we have to have benchmark data?
    • How many kernels can we have in Tier 1? Tier 2?
  • Tier 1 Kernels

    • FEMSWEEP (Apps) -- lots of things need to be done to make it Tier 1 ready...
    • Others if we can have more than one?
  • Tier 2 Kernels

    • REDUCE_STRUCT (multiple reductions are an important case to cover -- atomics)
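
A minimal sketch of the compiler-testing check discussed above (RAJA and base variants of each kernel within some bound), assuming per-kernel timings have already been collected. The function names, the 10% slowdown threshold, and the sample timings are hypothetical placeholders; none of them come from RAJAPerf or were agreed in the meeting.

```python
def within_bounds(base_time, raja_time, max_slowdown=1.10):
    """Return True if the RAJA variant is at most max_slowdown times slower than base.

    max_slowdown=1.10 is an illustrative threshold, not an agreed value.
    """
    if base_time <= 0 or raja_time <= 0:
        raise ValueError("timings must be positive")
    return raja_time / base_time <= max_slowdown


def check_kernels(timings, max_slowdown=1.10):
    """Return the sorted names of kernels whose RAJA variant exceeds the bound.

    timings maps kernel name -> (base_time_seconds, raja_time_seconds).
    """
    return [
        name
        for name, (base, raja) in sorted(timings.items())
        if not within_bounds(base, raja, max_slowdown)
    ]


if __name__ == "__main__":
    # Made-up timings for illustration only; not measured data.
    sample = {
        "KERNEL_A": (1.00, 1.05),
        "KERNEL_B": (2.00, 3.00),
    }
    print("Kernels outside bound:", check_kernels(sample))
```

A ratio-based bound keeps the check independent of absolute kernel runtime; the actual threshold (and whether it should be per-kernel) is still an open question.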
