Florian Mayer

Florian Mayer, M. Sc.

Computer Science Department
Programming Systems Group (Informatik 2)

Room: 05.156
Martensstr. 3
91058 Erlangen
Germany

Student contact hour

Prearrange an appointment by email.

Projects

  • OpenMP for reconfigurable heterogeneous architectures

    (Third Party Funds Group – Sub project)

    Overall project: OpenMP für rekonfigurierbare heterogene Architekturen
    Term: 01.11.2017 - 31.12.2023
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)
    URL: https://www2.cs.fau.de/research/ORKA/
    High-Performance Computing (HPC) is an important component of Europe's capacity for innovation and it is also seen as a building block of the digitization of the European industry. Reconfigurable technologies such as Field Programmable Gate Array (FPGA) modules are gaining in importance due to their energy efficiency, performance, and flexibility.
    There is also a trend towards heterogeneous systems with accelerators utilizing FPGAs. The great flexibility of FPGAs allows a large class of HPC applications to be realized with them. However, FPGA programming has mainly been reserved for specialists because it is very time-consuming. For that reason, the use of FPGAs in scientific HPC is still rare today.
    In the HPC environment, there are various programming models for heterogeneous systems offering certain types of accelerators. Common models include OpenCL (http://www.opencl.org), OpenACC (https://www.openacc.org) and OpenMP (https://www.OpenMP.org). These standards, however, are not yet available for use with FPGAs.

    Goals of the ORKA project are:

    1. Development of an OpenMP 4.0 compiler targeting heterogeneous computing platforms with FPGA accelerators in order to simplify the usage of such systems.
    2. Design and implementation of a source-to-source framework transforming C/C++ code with OpenMP 4.0 directives into executable programs utilizing both the host CPU and an FPGA.
    3. Utilization (and improvement) of existing algorithms mapping program code to FPGA hardware.
    4. Development of new (possibly heuristic) methods to optimize programs for inherently parallel architectures.
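As a rough illustration of the kind of input that goal 2 refers to, the sketch below shows a small C function with OpenMP 4.0 `target` and `map` directives. The kernel and the name `saxpy_offload` are assumptions made for this example and are not taken from ORKA-HPC. A compiler without offloading support ignores the pragmas or runs the region on the host, so the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not part of ORKA-HPC): y[i] = a * x[i] + y[i].
   The target directive asks the OpenMP implementation to run the loop on an
   accelerator; the map clauses describe the host/device data transfers. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* x is only read on the device; y is copied in and copied back. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A source-to-source framework of the kind described above would take such a region, outline the loop into an accelerator kernel, and generate the host code that moves `x` and `y`; how ORKA-HPC does this in detail is not shown here.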

    In 2018, the following important contributions were made:

    • Development of a source-to-source compiler prototype for the rewriting of OpenMP C source code (cf. goal 2).
    • Development of an HLS compiler prototype capable of translating C code into hardware. This prototype later served as the starting point for the work towards goals 3 and 4.
    • Development of several experimental FPGA infrastructures for the execution of accelerator cores (necessary for goals 1 and 2).

    In 2019, the following significant contributions were achieved:

    • Publication of two peer-reviewed papers: "OpenMP on FPGAs - A Survey" and "OpenMP to FPGA Offloading Prototype using OpenCL SDK".
    • Improvement of the source-to-source compiler in order to properly support OpenMP-target-outlining for FPGA targets (incl. smoke tests).
    • Completion of the first working ORKA-HPC prototype supporting a complete OpenMP-to-FPGA flow.
    • Formulation of a genome for the pragma-based genetic optimization of the high-level synthesis step during the ORKA-HPC flow.
    • Extension of the TaPaSCo composer to allow for hardware synchronization primitives inside of TaPaSCo systems.

    In 2020, the following significant contributions were achieved:

    • Improvement of the genetic optimization.
    • Engineering of a Docker container for reliable reproduction of results.
    • Integration of software components from project partners.
    • Development of a plugin architecture for low-level platforms (LLPs).
    • Implementation and integration of two LLP plugin components.
    • Broadening of the accepted subset of OpenMP.
    • Enhancement of the test suite.

    In 2021, the following significant contributions were achieved:

    • Enhancement of the benchmark suite.
    • Enhancement of the test suite.
    • Successful project completion with live demo for the project sponsor.
    • Publication of the paper "ORKA-HPC - Practical OpenMP for FPGAs".
    • Release of the source code and the reproduction package.
    • Enhancement of the accepted OpenMP subset with new clauses to control the FPGA-related transformations.
    • Improvement of the genetic optimization.
    • Comparison of the performance estimates reported by the HLS with the measured performance.
    • Synthesis of a linear regression model for performance prediction based on that comparison.
    • Implementation of an infrastructure for the translation of OpenMP reduction clauses.
    • Automated translation of the OpenMP pragma `parallel for` into a parallel FPGA system.

    In 2022, the following significant contributions were achieved:

    • Generation and publication of an extensive dataset on HLS area estimates and actual performance.
    • Creation and comparative evaluation of different regression models to predict actual system performance from early (area) estimates.
    • Evaluation of the area estimates generated by the HLS.
    • Publication of the paper "Reducing OpenMP to FPGA Round-trip Times with Predictive Modelling".
    • Development of a method to detect and remove redundant read operations in FPGA stencil codes based on the polyhedral model.
    • Implementation of the method for ORKA-HPC.
    • Quantitative evaluation of that method to show its strengths and when to use it.
    • Publication of the paper "Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil Codes".

    In 2023, the following significant contributions were achieved:

    • Development and implementation of an optimization method for canonical loop shells (e.g. from OpenMP target regions) for FPGA hardware generation using HLS. The core of the method is a loop restructuring based on the polyhedral model that uses loop tiling, pipeline processing, and port widening to avoid unnecessary data transfers from/to the onboard RAM of the FPGA, increase the number of parallel active circuits, maximize data throughput to FPGA board RAM, and hide read/write latencies.
    • Quantitative evaluation of the strengths and application areas of this optimization method using ORKA-HPC.
    • Publication of the method in the conference paper "Employing polyhedral methods to optimize stencils on FPGAs with stencil-specific caches, data reuse, and wide data bursts".
    • Publication of a reproduction package for the optimization method.
    • Presentation of the method at the conference "14th International Workshop on Polyhedral Compilation Techniques" in a half-hour talk.
    • Development of a method for the fully automatic integration of multi-purpose caches into FPGA solutions generated from OpenMP.
    • Evaluation of multi-purpose caches in combination with HLS generated hardware blocks.
    • Publication of the paper "Multipurpose Cacheing to Accelerate OpenMP Target Regions on FPGAs" (Best Paper Award).

Current courses

Optimierungen in Übersetzern

Title: Optimierungen in Übersetzern (Optimizations in Compilers)
Short text: inf2-ue2
Module frequency: summer semester only
Semester hours per week: 2

Successful completion of the exercise assignments is a prerequisite for admission to the exam.

The lecture gives an overview of optimization techniques applicable to procedural programming languages, with a particular focus on techniques that matter for high-performance and parallel computers. It also introduces the techniques and representations needed to compute and manage the information that these optimizations require.

The following list of key words provides an overview of the topics covered in this lecture:

- dependence analysis, dependence graph, array subscript analysis, SSA, control flow graph, dominators.
- loop transformations: strength reduction, induction variable elimination, loop-invariant code motion, loop unswitching.
- loop reordering: loop interchange, loop skewing, loop reversal, strip mining, loop tiling, loop distribution, loop fusion.
- loop restructuring: loop unrolling, loop coalescing, loop replacement (reduction), loop idiom recognition.
- memory access transformations: array padding, cache miss jamming, scalar expansion, array contraction.
- partial evaluation: constant propagation, constant folding, algebraic simplification, strength reduction.
- redundancy elimination: unreachable-code elimination, useless-code elimination, dead-variable elimination, common-subexpression elimination.
- procedure call transformations: leaf procedure optimization, procedure inlining, procedure cloning, function memoization, tail recursion elimination.
- transformations for parallel machines: data decomposition, scalar privatization, array privatization, data partitioning and computation partitioning, guard introduction, message aggregation, message pipelining, prefetch and poststore, synchronization elimination.
- pointer analysis, alias analysis.
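
To give a flavour of the loop reordering topics in this list, the following sketch applies loop tiling to a matrix transpose. It is an illustrative assumption, not course material: the function names and the tile size `TILE` are made up for this example. Both functions compute the same result; the tiled version only changes the iteration order so that each block of the matrices is finished before the next one is started.

```c
#include <assert.h>

#define TILE 4  /* hypothetical tile size, chosen only for illustration */

/* Reference loop nest: out = transpose of in (n x n, row-major). */
void transpose_naive(int n, const double *in, double *out)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
}

/* Same computation after loop tiling: both loops are strip-mined and the
   tile loops (ii, jj) are moved outwards. The bounds checks handle sizes
   that are not a multiple of TILE. */
void transpose_tiled(int n, const double *in, double *out)
{
    assert(n >= 0);
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < ii + TILE && i < n; ++i)
                for (int j = jj; j < jj + TILE && j < n; ++j)
                    out[j * n + i] = in[i * n + j];
}
```

Tiling is legal here because no iteration depends on another; in general a compiler has to establish this with dependence analysis before reordering a loop nest.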

Parallel group 1

Date and time: weekly, Thu, 16:15 - 17:45
Start date - End date: 18.04.2024 - 18.07.2024
Cancellation dates: 09.05.2024, 30.05.2024
Lecturer(s): Tobias Heineken, Florian Mayer
Room: 11301.00.031

Übungen zu Optimierungen in Übersetzern

Title: Übungen zu Optimierungen in Übersetzern (Exercises for Optimizations in Compilers)
Short text: inf2-ueb-uebersetzer
Module frequency: summer semester only
Semester hours per week: 2

The time and place of the exercise sessions will be agreed on in the first lecture.

The exercise sessions review and deepen the concepts and algorithms for compiler-based program optimization that are presented in the lecture.

In the project exercises, participants extend the compiler implemented in Übersetzerbau 1 with a selection of the presented algorithms.

The course materials are provided via StudOn.

Parallel group 1

Date and time: weekly, Fri, 10:15 - 11:45
Start date - End date: 19.04.2024 - 19.07.2024
Cancellation date: 31.05.2024
Room: 11302.02.133

Parallel group 2

Date and time: weekly, Mon, 10:15 - 11:45
Start date - End date: 15.04.2024 - 15.07.2024
Cancellation date: 20.05.2024
Lecturer(s): Tobias Heineken
Room: 11302.02.134

Parallel group 3

Date and time: weekly, Mon, 14:15 - 15:45
Start date - End date: 15.04.2024 - 15.07.2024
Cancellation date: 20.05.2024
Lecturer(s): Florian Mayer
Room: 11302.02.133

Publications

2024

2023

2022

2019

Supervised theses

Sorted alphabetically in UnivIS