Tal Ben-Nun
Computer Scientist
Research staff in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory.
Former member of the Scalable Parallel Computing Laboratory at ETH Zurich.
Former member of the Distributed Computing and the X-ray scattering labs at the Hebrew University of Jerusalem.
Former member of the Parallel Systems Lab at the Hebrew University of Jerusalem.
Research
Data movement minimization has become the most important factor in performance optimization. We make performance programming easier by rethinking existing paradigms, via new representations and workflows that expose optimization opportunities and utilize hardware efficiently.
Machine learning on code can be challenging if it is treated as text. We create new intermediate representations of code that can be used for automatic comprehension and performance optimization.
- ProGraML (ICML'21, Code)
- Neural Code Comprehension, inst2vec (NeurIPS'18, Code)
Scaling machine learning to large clusters poses challenges, from I/O to optimizers. We tackle communication, neural network architectures, and reproducibility, from theory to practice.
We use deep neural networks and create datasets to improve scientific computing applications. Examples include improving weather uncertainty quantification and deforestation prediction.
- Ensemble weather forecasts (MLPS'19, RSTA'21, Code)
- The MAELSTROM project
Software
- DaCe - Data-Centric parallel programming framework for CPUs, GPUs, and FPGAs.
- Deep500 - An HPC Deep Learning benchmark, competition, and meta-framework.
- MAPS - Device-level GPU memory abstraction and code optimization library.
- CUDNN Training - A CUDNN-based minimal deep learning training code sample using LeNet.
- MGBench - Multi-GPU computing benchmark suite.
- ceres-windows - Windows port of the ceres-solver nonlinear optimization library.
- Klogger - A Linux Kernel Logging Framework.
- X+ - Numerical Analysis Tool for Solution and Powder Scattering Structure Factor of Macromolecular Systems.
Selected Publications
(full list at Google Scholar)
Productive Performance Engineering for Weather and Climate Modeling with Python
Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), November 2022.
Boosting Performance Optimization with Interactive Data Movement Visualization
Philipp Schaad, Tal Ben-Nun, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), November 2022.
Deinsum: Practically I/O Optimal Multilinear Algebra
Alexandros Nikolaos Ziogas, Grzegorz Kwasniewski, Tal Ben-Nun, Timo Schneider, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), November 2022.
A Data-Centric Optimization Framework for Machine Learning
Oliver Rausch*, Tal Ben-Nun*, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler.
In the ACM International Conference on Supercomputing (ICS), June 2022.
Lifting C semantics for dataflow optimization
Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler.
In the ACM International Conference on Supercomputing (ICS), June 2022.
Productivity, Portability, Performance: Data-Centric Python
Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), November 2021.
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Grzegorz Kwasniewski, Marko Kabic, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), November 2021.
Clairvoyant Prefetching for Distributed Machine Learning I/O
Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler.
In the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), November 2021.
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste.
In the Journal of Machine Learning Research (JMLR), September 2021.
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs
Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler.
In the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '21), July 2021.
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Chris Cummins, Zacharias Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O'Boyle, Hugh Leather.
In the Thirty-eighth International Conference on Machine Learning (ICML), July 2021.
NPBench: A Benchmarking Suite for High-Performance NumPy
Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler.
In the ACM International Conference on Supercomputing (ICS), June 2021.
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler.
In the Fourth Conference on Machine Learning and Systems (MLSys), April 2021.
Outstanding paper award.
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler.
In the International Symposium on Code Generation and Optimization (CGO), February 2021.
Deep Learning for Post-Processing Ensemble Weather Forecasts
Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler.
In Philosophical Transactions of The Royal Society A, February 2021.
Workflows are the New Applications: Challenges in Performance, Portability, and Productivity
Tal Ben-Nun, Todd Gamblin, D. S. Hollman, Hari Krishnan, Chris J. Newburn.
In the IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), November 2020.
Augment Your Batch: Improving Generalization Through Instance Repetition
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
Groute: Asynchronous multi-GPU programming model with applications to large-scale graph processing
Tal Ben-Nun, Michael Sutton, Sreepathi Pai, Keshav Pingali.
In ACM Transactions on Parallel Computing (TOPC), June 2020.
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler.
In Proceedings of the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2019), May 2019.
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun and Torsten Hoefler.
In ACM Computing Surveys (CSUR), March 2019.
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs
Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler.
In Supercomputing (SC'19), November 2019.
A Data-Centric Approach to Extreme-Scale Ab Initio Dissipative Quantum Transport Simulations
Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernandez, Timo Schneider, Torsten Hoefler.
In Supercomputing (SC'19), November 2019.
ACM Gordon Bell Prize.
Substream-Centric Maximum Matchings on FPGA
Maciej Besta, Marc Fischer, Tal Ben-Nun, Johannes De Fine Licht, Torsten Hoefler.
In Proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2019), February 2019.
Best paper nominee.
Neural Code Comprehension: A Learnable Representation of Code Semantics
Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler.
In the Conference on Neural Information Processing Systems (NeurIPS 2018), December 2018.
Accelerating Deep Learning Frameworks with Micro-batches
Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka.
In IEEE Cluster 2018, September 2018.
Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling
Michael Sutton, Tal Ben-Nun, Amnon Barak.
In Proceedings of the 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2018), May 2018.
Big Data Causing Big (TLB) Problems: Taming Random Memory Accesses on the GPU
Tomas Karnagel, Tal Ben-Nun, Matthias Werner, Dirk Habich, Wolfgang Lehner.
In Proceedings of the 13th International Workshop on Data Management on New Hardware (ACM DaMoN), May 2017.
Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations
Tal Ben-Nun, Michael Sutton, Sreepathi Pai, Keshav Pingali.
In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017), February 2017.
Best paper nominee.
Reciprocal grids: a hierarchical algorithm for computing solution x-ray scattering curves from supramolecular complexes at high resolution
Avi Ginsburg, Tal Ben-Nun, Roi Asor, Asaf Shemesh, Israel Ringel, Uri Raviv.
In ACS Journal of Chemical Information and Modeling (JCIM), August 2016.
Spline-Based Parallel Nonlinear Optimization of Function Sequences
Tal Ben-Nun, Amnon Barak, Uri Raviv.
In Elsevier Journal of Parallel and Distributed Computing (JPDC), April 2016.
Memory Access Patterns: The Missing Piece of the Multi-GPU Puzzle
Tal Ben-Nun, Ely Levy, Amnon Barak, Eri Rubin.
In IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), November 2015.
Solution X-Ray Scattering Form-Factors with Arbitrary Electron Density Profiles and Polydispersity Distributions
Tal Ben-Nun, Roi Asor, Avi Ginsburg, Uri Raviv.
In Israel Journal of Chemistry (IJC), November 2015.
MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction
Eri Rubin, Ely Levy, Amnon Barak, Tal Ben-Nun.
In ACM Transactions on Architecture and Code Optimization (TACO), January 2015.
X+: A Comprehensive Computationally Accelerated Structure Analysis Tool for Solution X-ray Scattering from Supramolecular Self-assemblies
Tal Ben-Nun, Avi Ginsburg, Pablo Szekely, Uri Raviv.
In Journal of Applied Crystallography. Volume 43 (6), December 2010.
A Package for OpenCL Based Heterogeneous Computing on Clusters with Many GPU Devices
Amnon Barak, Tal Ben-Nun, Ely Levy, Amnon Shiloh.
In the PPAAC workshop, IEEE Cluster 2010, September 2010.
Solution X-ray Scattering Form Factors of Supramolecular Self-Assembled Structures
Tal Ben-Nun, Pablo Szekely, Avi Ginsburg, Uri Raviv.
In the Langmuir journal. Volume 26 (16), July 2010.
Design and Implementation of a Generic Resource Sharing Virtual Time Dispatcher
Tal Ben-Nun, Yoav Etsion, Dror G. Feitelson.
In SYSTOR 2010, May 2010.
A Global Scheduling Framework for Virtualization Environments
Yoav Etsion, Tal Ben-Nun, Dror G. Feitelson.
In 5th Intl. Workshop System Management Techniques, Processes, and Services, May 2009.
Best student paper.
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler.
In Proceedings of the 25th Symposium on Principles and Practice of Parallel Programming (PPoPP'20), February 2020.
Best paper nominee.
Contact
E-Mail Address: talbn (at) llnl (dot) gov