Changes between Version 4 and Version 5 of Public/User_Guide/SDV_KNLs


Timestamp:
Jan 13, 2017, 11:30:14 AM (8 years ago)
Author:
Anke Kreuzer
  • Public/User_Guide/SDV_KNLs

    v4 v5  
    1818= Compiling =
    1919
    20 Use the -xMIC-AVX512 flag instead of -mmic.
     20Use the -xMIC-AVX512 flag instead of -mmic.\\
     21To check which loops were actually vectorised, compile with -qopt-report=5 -qopt-report-phase=vec; the report is written to *.optrpt files.
     22
    2123
    2224
    2325= Using MPI over EXTOLL =
    2426set PSP_RENDEZVOUS_VELO = <max. message size>
     27\\
     28\\
     29
     30= 5 things to consider when using KNL =
     311. Make sure to use the fast MCDRAM:
     32  * When MCDRAM is in cache mode:
     33    * No changes are needed.
     34  * When MCDRAM is in flat mode:
     35    * If the total memory footprint of the application is smaller than the size of MCDRAM: numactl -m 1 ./my_application.out (Allocations that don’t fit into MCDRAM make the application fail.)
     36    * If the total memory footprint of the application is larger than the size of MCDRAM: numactl -p 1 ./my_application.out (Allocations that don’t fit into MCDRAM spill over to DDR.)
     37    * To make a manual choice of what should be allocated in the MCDRAM: Use the memkind library.\\
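For the manual choice in flat mode, a minimal sketch using the hbwmalloc interface of the memkind library might look as follows. The helper names `alloc_hot_array` and `free_hot_array` and the build switch `USE_MEMKIND` are hypothetical and only serve the illustration; `hbw_check_available`, `hbw_malloc` and `hbw_free` come from memkind's `hbwmalloc.h`. Without the switch the code falls back to plain `malloc`, so it also builds on machines without memkind.

```c
#include <stdlib.h>

#ifdef USE_MEMKIND          /* hypothetical build switch, not from this guide */
#include <hbwmalloc.h>      /* hbw_malloc / hbw_free from the memkind library */
#endif

/* Allocate n doubles, preferring MCDRAM when memkind is compiled in and
 * high-bandwidth memory is present; otherwise fall back to plain malloc.
 * *in_hbw is set to 1 if the buffer lives in high-bandwidth memory. */
double *alloc_hot_array(size_t n, int *in_hbw)
{
    *in_hbw = 0;
#ifdef USE_MEMKIND
    if (hbw_check_available() == 0) {   /* 0 means HBW memory is usable */
        double *p = hbw_malloc(n * sizeof *p);
        if (p != NULL) {
            *in_hbw = 1;
            return p;
        }
    }
#endif
    return malloc(n * sizeof(double));
}

/* Release a buffer with the allocator that produced it. */
void free_hot_array(double *p, int in_hbw)
{
#ifdef USE_MEMKIND
    if (in_hbw) {
        hbw_free(p);
        return;
    }
#endif
    (void)in_hbw;   /* unused when memkind is not compiled in */
    free(p);
}
```

To place only the bandwidth-critical arrays in MCDRAM, call `alloc_hot_array` for those and keep ordinary `malloc` for everything else. A build with memkind would presumably add `-DUSE_MEMKIND` and link with `-lmemkind`.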
     38
     392. Verify that the pinning is as you wish:
     40  * Start job on KNL node(s).
     41  * Log in on KNL.
     42  * Invoke htop.
     43  * Check the load distribution.
     44  * Remark: Each core can execute 1, 2 or 4 threads. Unlike on KNC, a single thread per core can already give optimal performance on KNL.\\
     45
     463. Use VTune/Advisor to analyse the performance:
     47  * Start job on KNL node(s).
     48  * Log in on KNL.
     49  * 'module load VTune / Advisor'.
     50  * Run amplxe-gui / advixe-gui.
     51  * Follow instructions.
     52  * Remark: If you run into errors of the sort “sepdk not available”, please contact the administrator. Both tools rely on a kernel module to access the hardware counters.\\
     53
     544. Provide hints to the compiler:
     55  * Check *.optrpt for info on vectorisation.
     56  * If you find “unaligned...” -> make sure the data is aligned, then tell the compiler by adding "#pragma vector aligned" before the loop.
     57  * If a loop does not vectorise although it clearly should, you can add "#pragma simd" before the loop.
     58  * Re-check *.optrpt.
     59  * Re-check in VTune / Advisor\\
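The two hints above can be sketched together on a simple loop. This is an illustration under assumptions, not code from this guide: the function names and the 64-byte alignment constant are chosen for the example, and both pragmas are specific to the Intel compiler (other compilers ignore them, possibly with a warning). Note that "#pragma vector aligned" only asserts alignment; the arrays themselves must be allocated on an aligned boundary, here via C11 `aligned_alloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64   /* bytes; one 512-bit AVX-512 vector */

/* Allocate n floats on a 64-byte boundary. aligned_alloc (C11) requires the
 * size to be a multiple of the alignment, so round the byte count up. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    float *p = aligned_alloc(ALIGNMENT, padded);
    assert(p == NULL || (uintptr_t)p % ALIGNMENT == 0);
    return p;
}

/* y[i] += a * x[i]; x and y must both come from alloc_aligned_floats(),
 * otherwise the alignment promise below is false. */
void saxpy_aligned(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Intel-specific: promises that all accesses in the loop are aligned. */
#pragma vector aligned
    /* Intel-specific: asks the compiler to vectorise this loop. */
#pragma simd
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

After adding such hints, the *.optrpt file for the loop should no longer report unaligned accesses; if it still does, the promise and the actual allocation likely disagree.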
     60
     615. Verify the performance via benchmarks:
     62  * Set up JUBE for your code.
     63  * Benchmark the various versions with proper timing.
     64  * Be aware: VTune / Advisor estimates are sometimes a little off, so always confirm them against the measured performance.
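As one possible reading of "proper timing", the sketch below times a kernel with a monotonic clock and keeps the fastest of several repetitions. It is not part of JUBE; `wall_seconds`, `time_best_of`, `sum_job` and `sum_kernel` are hypothetical names for this example, and `clock_gettime` is the POSIX call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime() under strict -std=c11 */
#endif
#include <stddef.h>
#include <time.h>

/* Seconds from a monotonic clock; unaffected by wall-clock adjustments. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run kernel(arg) `repeats` times and return the fastest run in seconds.
 * Taking the minimum filters out one-off disturbances such as OS noise.
 * Returns a negative value if repeats <= 0. */
double time_best_of(void (*kernel)(void *), void *arg, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        double t0 = wall_seconds();
        kernel(arg);
        double dt = wall_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Example kernel (hypothetical): sums n doubles and stores the result,
 * so the compiler cannot discard the loop. */
struct sum_job { const double *data; size_t n; double result; };

void sum_kernel(void *arg)
{
    struct sum_job *job = arg;
    double s = 0.0;
    for (size_t i = 0; i < job->n; ++i)
        s += job->data[i];
    job->result = s;
}
```

Timing each code version with the same harness, for instance `time_best_of(sum_kernel, &job, 5)`, gives numbers that can be compared directly with what VTune / Advisor predicted.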
     65