Research projects
Ongoing research projects
Past research projects
Automatic Detection of Race-Conditions(Own Funds) Project leader: Abstract:Large software projects with hundreds of developers are difficult to review and often contain many bugs. Automatic tests are a well established technique to test sequential and deterministic software. They test the whole project (system test) or each module by itself (unit test). However, recent software contains more and more parallelism. This introduces several new bug patterns, like deadlocks and concurrent memory accesses, that are harder or even impossible to be detected reliably using conventional test methods. Whether the faulty behavior actually shows at runtime depends on the concrete scheduling of the threads which is indeterministic and varies between individual executions depending on the underlaying system. Due to this unpredictable behavior such bugs do not necessarily manifest in an arbitrary test run or may never arise in the testing environment at all. As a result, conventional tests are not well suited for modern, concurrent software. In 2016 existing approaches were studied regarding their usability as a starting basis for our project. The most promising method uses model checking and predefined assertions to construct thread schedules that trigger the faulty behavior. However, the approach is currently infeasible for larger projects because only very small codes could be analyzed in reasonable time. Therefore, we focused on automatically detecting and removing statements that are unrelated to the parallelism respectively to the potentially faulty code parts in order to decrease the execution time of the preliminary static analysis. In 2017 the work on automatically reducing programs to speed up furher analysis was continued. Furthermore, we evaluated whether the concept of mutation testing can be applied to parallel software as well. The results indicate that this extension is indeed possible to rate tests qualitativly. However, to complete the analysis for larger programs in reasonable time, a few heuristics need to be applied during the process. In 2018 the focus moved to a deterministic execution of test cases. A concept to reproduce results during the execution was developed: In addition to the test case, a schedule specifies the dynamic behaviour of the threads. Instrumenting the code at previously marked positions and other relevant byte code instructions allows a separate control thread to enforce the schedule. When modifying the source code, the marked positions in the code need to be updated as well to keep them consistent with the test cases. A merging technique similar to the ones used in version control systems shall be used to automatically update the positions. Up to 2019, this project was a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi.fau.de/ ). In this context, several improvements for the quality of concurrent software were analyzed. The take-away result was that different approaches are applicable and required, but they also often suffer from long analysis times. Publications:
|
Automatic Code Parallelization at Runtime(Own Funds) Project leader: Abstract:
Our automatic parallelization efforts are currently focused on dynamic parallelization. While a program is running, it is analyzed to find loops where parallelization can help performance. Our current idea is to run long-running loops three times. The first two runs analyze the memory accesses of the loop and can both run in parallel. The first run stores in a shared data structure for every memory address in which loop iteration a write access happens. We do not need any synchronization for this data structure, only the guarantee that one value is written to memory, when two concurrent writes happen. In the second pass we check for every memory access, if it has a dependency to one of the stored write accesses. A write access is part of any data-dependency, so we can find all types of data dependencies. If we do not find any, the loop is actually run in parallel. If we find dependencies, the loop is executed sequentially. We can execute the analyses in parallel to a modified sequential execution of the first loop iterations. In 2014 we enhanced the analysis so that a loop can start running while the remainder of the loop is analyzed to see if it can be run in parallel. To allow the sequential loop to execute while the tail of the loop is analyzed we needed to instrument the sequential loop slightly. The result is that the loop runs only slightly slower if the loop cannot be parallelized, but if the loop is found to be parallelizable, speedup is near to linear. Finally, we also created a new language that uses the above library for run-time parallelization. Any loops that the programmer marked as candidates for run-time parallelization are analyzed for constructs that the library cannot yet handle. If the loop is clean, code is generated that uses the library's macros. The project is a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi-anwendungszentrum.de/) |
Design for Diagnosability(Third Party Funds Single) Project leader: Abstract:Many software systems behave obtrusively during the test phase or even in normal operation. The diagnosis and the therapy of such runtime anomalies is often time consuming and complex, up to being impossible. There are several possible consequences for using the software system: long response times, inexplicable behaviors, and crashes. The longer the consequences remain unresolved, the higher is the accumulated economic damage. Before we have successfully completed the project in July 2016, we have made the following contributions:
Although funding expired in 2016, we made further contributions in 2017:
We continued to make further contributions to the research project in 2018:
External Partners:
|
Adaptive Algorithms for RF-based Locating Systems(Third Party Funds Single) Project leader: Abstract:
The goal of this project is the development of adaptive algorithms for radio-based realtime locating systems. In the scope of this project we cover three essential topics: Automatic configuration of event detectors. In previous research projects we built the basics for the analysis of noisy sensor data streams. However, event Evaluation of machine learning techniques for locating applications. In previous research projects we already developed machine learning algorithms for radio-based locating systems (e.g., evolutionary algorithms to estimate antenna positions and orientations). This work package investigates further approaches that use machine learning to enhance the performance of realtime locating systems. Evaluation of vision-based techniques to support radio-based realtime locating. |
Embedded Realtime Language Development Framework(Own Funds) Project leader: Abstract:
ErLaDeF is our test-bed for new programming language and compiler techniques. Our main focus is on building infrastructure for easier (hard + soft) real-time embedded parallel systems programming. Runtime Parallelization of Programs Our automatic parallelization efforts are currently focused on dynamic parallelization. While a program is running, it is analyzed to find loops where parallelization can help performance. Our current idea is to run long-running loops three times. The first two runs analyze the memory accesses of the loop and can both run in parallel. The first run stores in a shared data structure for every memory address in which loop iteration a write access happens. We do not need any synchronization for this data structure, only the guarantee that one value is written to memory, when two concurrent writes happen. In the second pass we check for every memory access, if it has a dependency to one of the stored write accesses. A write access is part of any data-dependency, so we can find all types of data dependencies. If we do not find any, the loop is actually run in parallel. If we find dependencies, the loop is executed sequentially. We can execute the analyses in parallel to a modified sequential execution of the first loop iterations. In 2013 we have explored alternatives to polymorphism and inheritance that may be easier to analyze. We have also examined alternative thread synchronization methods, for example transactional memory, implicit synchronization, remote procedure calls, etc. In 2014 we enhanced the analysis so that a loop can start running while the remainder of the loop is analyzed to see if it can be run in parallel. To allow the sequential loop to execute while the tail of the loop is analyzed we needed to instrument the sequential loop slightly. The result is that the loop runs only slightly slower if the loop cannot be parallelized, but if the loop is found to be parallelizable, speedup is near to linear. Finally, we also created a new language that uses the above library for run-time parallelization. Any loops that the programmer marked as candidates for run-time parallelization are analyzed for constructs that the library cannot yet handle. If the loop is clean, code is generated that uses the library's macros. Design Patterns for Parallel Programming A library of parallel programming patterns allows a programmer to select well known parallelization and inter-core communication strategies from a well-debugged library. We are performing research into what (communication) patterns actually exist and when they can be applied. We have collected over 30 different patterns for parallel communication. In 2013 we investigated mechanisms to automatically determine the fitting implementation for a given software and hardware environment. We also added a set of distributed channels where cores can send data from one local memory to another. The Script based language for embedded systems (Pylon) Pylon is a language that is close to scripting languages (but is statically typed). A large part of the complexity that a programmer would normally take care of when creating an application, is moved to the compiler (i.e., type inference). The programmer does not have to think about types at all. By analyzing the expressions in the program, types are inferred (duck typing). The language is also implicitly parallel, the programmer does not need to have expert knowledge to parallelize an application. The compiler automatically decides what to run in parallel. Finally, the language is kept simple so that learning the language remains easy for novice programmers. For example, we kept the number of keywords small. Interactive Program Analysis To ensure that program design errors are caught early in the development cycle, it is necessary to find bugs while editing. This requires that any program analysis works at interactive speeds. We are following two approaches to this. The first approach centers around algorithmic changes to program analysis problems. Making analysis problems lazy means that only those parts of a program should be examined that are pertinent to the question that is currently asked by the compiler. For example, if the compiler needs to know which functions access a certain object, it should not examine unrelated functions, classes, or packages. Making program analysis incremental means that a small change in the program should only require small work for the (re-)analysis. To achieve that, a program is split recursively into parts. Then, for each of the parts, it In 2013 our key focuses were twofold: Firstly, we developed data structures that can both precisely and efficiently describe the effects of a part of a program. Secondly, we developed both efficient and precise algorithms to create and use these data structures. In 2014 we expanded and modified this analysis framework in order to support big code bases, to analyze them and keep the analysis results for later use. This enables us to precisely analyze programs that use libraries, by first analyzing the library, and then using the library{'}s analysis result to get precise analysis results for our program. Our second approach to bring compiler analysis to interactive speed is to make the analysis itself parallel. In 2013 we continued to develop data-parallel formulations of basic compiler analyses. We have started to implement a generic data-parallel predicate propagation framework. Its data-parallel forms are then portably executable on many different multi-core architectures. Object-oriented languages offer the possibility to dynamically allcoate objects. The memory required for this is allocated at run-tie. However, in contrast to desktop systems, embedded systems typically have very little memory. If the 'new' operator is used often in an embedded system (that are now starting to be programmed with higher-level languages such as Java and C++ that include 'new'), memory can be exhausted at run-time causing an embedded system to crash. In 2014, we created an analysis to find this problem at compile-time and report this to the developer. If it is statically determined that an object can be removed, the developer needs to insert a 'delete' statement. With explicit memory management, we are now able to statically determine a program's worst-case memory requirements. The whole analysis outlined above is integrated into Pylon and a predicate propagation framework previously reported on. Note that the analysis is language independent, however, and can be applied to other languages as well (Java, C++, etc.). However, we can in that case no guarantee that reference counts are correctly computed as we require Pylon{'}s analyzability for this. The ErLaDeF project is a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi-anwendungszentrum.de/) |
Efficient Software Architectures for Distributed Event Processing Systems(Third Party Funds Single) Project leader: Abstract:
Localization Systems (also known as Realtime Location Systems, or RTLS) become more and more popular in industry sectors such as logistics, automation, and many more. These systems provide valuable information about whereabouts of objects at runtime. Therefore, processes can be traced, analyzed, and optimized. Besides the research activities at the core of localization systems (like resilience and interference-free location technologies or methods for highly accurate positioning), algorithms and techniques emerge that identify meaningful information for further processing steps. Our research focuses on automatic configuration methods for RTLSs as well as on the generation of dynamic motion models and techniques for event processing on position streams at runtime. In 2011, we investigated whether events can be predicted after analyzing and learning event streams from the localization system at runtime. As a result, we are able to deduce models that represent the information buried in the event stream to predict future events. We developed several methods and techniques in 2012 that process and detect events with low latency. Events (composite, complex) can be detected by means of a hierarchical aggregation of sub-events that themselves are detected by (several) event detectors processing sub-information in the event stream. This greatly reduces the complexity of the detection components and renders them fully maintainable. They even can use parallel or distributed cluster architectures more efficiently so that important events can be detected within a few milliseconds. In 2013 we further minimized detection latency in distributed event-based systems: first, a new migration technique modifies and optimizes the allocation of software components in a networked environment at runtime to minimize networking overhead and detection latencies. Second, a speculative event processing technique uses conservative buffering techniques to exploit available system resources. We also created and published a representative data set (consisting of realtime position data and event streams) and a corresponding task description. In 2014 we investigated fundamental approaches zu handle uncertainties (both w.r.t. the definition of event detectors and to the events). We implemented a promising prototype of an event-based system that is no longer deterministic but instead evaluates several possible system states in parallel to achieve a detection with a much higher robustness and correctness. The domain expert can parameterize the event detectors by attaching probabilities or probability functions to the generated events. In 2015 we improved, optimized and published our approach. Furthermore we started to investigate approaches to learn optimal parameter sets for the event detectors. Thus, a manual adjustment and tuning of parameters (like thresholds) becomes unnecessary. The project is a contribution of the Programming Systems Group to the IZ ESI http://www.esi.uni-erlangen.de/ |
Computer Science basics as an essential building block of modern STEM field curricula(Third Party Funds Single) Project leader: Abstract:
The increasing digitalization of all areas of science and life render competencies in the foundations of computer science essential for all tech students and more. For the success of their academic studies, often their courses, especially the introductory ones, are problematic hurdles that may lead to a dropout. |
Graphs and Graph Transformations(Own Funds) Start date: 01.10.2004 Abstract:
Graphs are often used as an intuitive aid for the clarification of complex matters. Examples of outside computer science include, e.g., chemistry where molecules are modeled in a graphical way. In computer science, data or control flow charts are often used as well as entity relationship charts or Petri-nets to visualize software or hardware architectures. Graph grammars and graph transformations combine ideas from the fields of graph theory, algebra, logic, and category theory, to formally describe changes in graphs. Category theory is an attractive tool for the description of different structures in a uniform way, e.g., the different models for asynchronous processes: Petri-Nets are based on standard labeled graphs, state charts use hierarchical graphs, parallel logic programming can be interpreted in a graph-theoretical way using so-called jungles, and the actor systems can be visualized as graphs, whose labeling alphabet is a set of term graphs. Lately, we have concentrated our attention on a theoretical aspect. Our work on graph transformation is based on notions borrowed from category theory. The so-called double-pushout approach represents a production by two morphisms starting at a common interface graph. One pushout glues the left-hand side of the production into the context, the other does with the right-hand side. Effectively constructing a derivation step, however, requires finding a pushout complement on the left-hand side. Some people consider this disadvantageous. In 1984, Raoult has proposed to model graph rewriting by a single pushout; Loewe has extensively studied this approach, but the discussion was mainly restricted to injective morphisms. Under this assumption, the approaches are equivalent. Some relevant applications such as term graph rewriting, however, lead to non-injective morphisms. We have examined these cases in detail, and we could show that the equivalence also holds for non-injective cases as long as the handle satisfies some reasonable conditions. |
Cooperative Exploration and Analysis of Software in a Virtual/Augmented Reality Appliance(Third Party Funds Group – Sub project) Overall project: Cooperative Exploration and Analysis of Software in a Virtual/Augmented Reality Appliance Abstract:Understanding software has a large share in the programming efforts of a software systems, up to 30% in development projects and up to 80% in maintenance projects. Therefore, an efficient and effective way for comprehending software is necessary in a modern software engineering workplace. Three-dimensional software visualization already boosts comprehension and efficiency, so utilization of the latest virtual reality techniques seems natural. Within the scope of the Holoware project, we create an environment to cooperatively explore and analyze a software project using virtual/augmented reality techniques as well as artificial intelligence algorithms. The software project in question is being visualized in said virtual reality, such that multiple participants can simultaneously explore and analyze the software. They can cooperate by communicating about their findings. Different participants benefit from different perspectives on the software, which is augmented by domain specific additional information. This provides them with intuitive access to the structure and behaviour of the software. Various use cases are possible, for example the cooperative analysis of a run time anomaly in a team of domain experts. The domain experts can see the same static structure, augmented with domain specific and detailed information. In the VR environment, they can share their findings and cooperate using their different expertise. In addition, the static and dynamic properties of the software system are analyzed. Static properties include source code, static call relationships or metrics such as LoC, cyclomatic complexity, etc. Dynamic properties can be grouped into logs, traces, runtime metrics, or configurations that are read in at runtime. The challenge lies in aggregating, analyzing, and correlating this wealth of information. An anomaly and significance detection is developed that automatically detects both structural and runtime anomalies. In addition, a prediction system is set up to make statements about component health. This makes it possible, for example, to predict which components are at risk of failing in the near future. Previously, the log entries were added to the traces, creating a detailed picture of the dynamic call relationships. These dynamic relationships are mapped to the static call graph because they describe calls that do not result from the static analysis (for example, REST calls across several distributed components). In 2018, the following significant contributions have been made:
In 2019 we achieved the following improvements:
Our paper "Towards Collaborative and Dynamic Software Visualization in VR" has been accepted for publication at the International Conference on Computer Graphics Theory and Applications (VISIGRAPP) 2020. It presents the efficiency of our prototype at increasing the software understanding process. In 2020, our paper "A Layered Software City for Dependency Visualization" was accepted at the International Conference on Computer Graphics Theory and Applications (VISIGRAPP) 2021 and later received the "Best Paper Award". We demonstrated that our Layered Layout for Software Cities simplifies the analysis of software architecture and outperforms the standard layout by far. We successfully concluded the research project with a final prototype and the resulting publications. In 2021, after the end of the official project funding we were asked to submit an extended version of the award paper (" Static And Dynamic Dependency Visualization In A Layered Software City") for review to a journal. Here we present a night view of the city that shows dynamic dependencies as arcs. We thus addressed a central, yet remaining issue: the visualization of dynamic dependencies. In the paper "Trace Visualization within the Software. City Metaphor: A Controlled Experiment on Program Comprehension" at the IEEE Working Conference on Software Visualization (VISSOFT), we displayed dynamic dependencies within the Software City by means of light intensities and were able to show that this representation is more helpful than drawing all dependencies. Also for this paper, we were invited to submit an extended journal article "Trace Visualization within the Software City Metaphor: Controlled Experiments on Program Comprehension" for review. This article demonstrates an extended visualization of dynamic dependencies and color arcs based on HTTP status codes. In 2022, both journal papers were accepted: "Static And Dynamic Dependency Visualization in a Layered Software City" is published in Springer Nature Computer Science Journal and "Trace Visualization within the Software City Metaphor: Controlled Experiments on Program Comprehension" was accepted for the Information and Software Technology Journal. For the finalization of Holoware, all extensions were combined into one single visualization. For this purpose, different views were applied, allowing the user to switch between them: in the day view, the software architecture can be analyzed in the novel Holoware layered layout, and in the night view, dynamic dependencies are displayed. As part of a master thesis, Holoware was also implemented as an AR visualization, so that it can easily be used as a showcase or in everyday work. External Partners:
Publications:
|
Incremental Code Analysis(Own Funds) Project leader: Abstract:
To ensure that errors in a program design are caught early in the development process, it is useful to detect mistakes already during the editing of the code. For that the employed analysis has to be fast enough to enable interactive use. One method to achieve this is the use of incremental analysis, which combines analysis results of parts of the program to analyze the whole program. As an advantage, it is then possible to re-use large parts of the analysis results when a small change to the program occurs, namely for the unaffected parts of the program and for libraries. Thus the work required for analysis can be drastically reduced, which makes the analysis useful for interactive use. In 2015 we focused on improving the algorithms and data structures used for the analysis. We were able to significantly improve the runtime and memory requirements for analyzing a given program. Additionally, the analyzed program may now contain more, and more expressive language features. In 2016 we focused on improving the algorithms and data structures used for the analysis. We improved both the scalability of the analysis towards large code bases with more than 1 mio. statements, and the incremental analysis, where we re-used the analysis results for unmodified program parts, drastically speeding up the analysis for typical software projects (i.e. with a large code base and small, incremental changes) In 2017 we continued improving the algorithms and data structures used for the analysis. In addition to further development of the analysis' scalability towards large code bases, and of incremental analysis (where we re-used the analysis results for unmodified program parts), we focused on an easy to grasp documentation of the analysis, in order to understand it, and to lay theoretical basics to verify the analysis' correctness. |
Inter-Thread Testing(Own Funds) Project leader: Abstract:
In order to achieve higher computing performance, microprocessor manufacturers do not try to achieve faster clock speed anymore - on the contrary: the absolute number of cycles has even decreased, while the number of independent processing units (cores) per processor is continually increased. Due to this evolution, developers must learn to think outside the box: The only way to make their applications faster (in terms of efficiency) is to modularize their programs such that independent sections of code execute concurrently. Unfortunately, present-day systems have reached a level of functional complexity, such that even software for sequential execution is significantly error-prone - the parallelization for multiple cores adds yet another dimension to the non-functional complexity. Although research in the field of software engineering emerged several different quality assurance measures, there are still very few effective methods for testing concurrent applications, as the broad emergence of multi-core systems is relatively young. This project aims to fill that gap by providing an automated test system. First of all, a testing criteria hierarchy is needed, which provides different coverage metrics tightly tailored to the concept of concurrency. Whilst for example branch coverage for sequential programs requires the execution of each program branch during the test (i.e. making the condition of an if-statements both true and false - even if there is no explicit else branch), a thorough test completion criterion for concurrent applications must demand for the systematic execution of all relevant thread interleavings (i.e. all possibly occurring orderings of statements, where two threads may modify a shared memory area). A testing criterion defines the properties of the 'final' test set only, but does not provide any support for identifying individual test cases. In contrast to testing sequentially executed code, test scenarios for parallel applications must also comprise control information for deterministically steering the execution of the TUT (Threads Under Test). In 2012, a framework for Java has been developed, which automatically generates such control structures for TUT. The tester must provide the bytecode of his application only; further details such as source code or restrictions of the test scenario selection are optional. The approach uses aspect-oriented programming techniques to enclose memory access statements (reads or writes of variables, responsible for typical race conditions) with automatically generated advices. After weaving the aspects into the SUT (System Under Test), variable accesses are intercepted at runtime, the execution of the corresponding thread is halted until the desired test scenario is reached, and the conflicting threads are reactivated in the order imposed by the given test scenario. In order to demonstrate the functionality, some naive sequence control strategies were implemented, e.g. alternately granting access to shared variables from different threads. In 2013, the prototypical implementation of the InThreaT framework has been reengineered as an Eclipse plugin. This way, our approach can also be applied in a multi-project environment and the required functionality integrates seamlessly into the development environment (IDE) familiar to programmers and testers. In addition, the configuration effort required from the tester could be reduced, as e.g. selecting and persisting the interesting points of interleaving also became intuitively usable parts of the IDE. In order to automatically explore all relevant interleavings, we need an infrastructure to enrich functional test cases with control information, required to systematically (re)execute individual test cases. In 2013, such an approach for JUnit has been evaluated and implemented prototypically, which allows to mark individual test cases or whole test classes with adequate annotations. |
Integrated Tool Chain for Meta-model-based Process Modeling and Execution(Third Party Funds Single) Project leader: Abstract:As demands on the development of complex software systems are continuously increasing, compliance with well-defined software development processes becomes even more important. Especially large and globally distributed software development projects tend to require long-running and dynamically changeable processes spanning multiple organizations. In order to describe and support such processes, there is a strong need for suitable process modeling languages and for powerful support by tools. The results of a preceding cooperation project show that today's tools markets lack integrated tool chains which actually support the fine-grained and precise modeling of software development processes as well as their computer-aided execution, controlling and monitoring. A cooperation project has bridged this gap. This cooperation project was carried out together with develop group as an industrial partner and was funded by BMWi. It started in October 2008 and has been scheduled for three researchers. The project was finished in September 2011. The objective of this cooperation project was to prototype an integrated tool chain by using a rigorous, meta-model based approach that supports modeling, enactment, and execution of industrial software development processes. Bearing the applicability of such a tool in mind, this approach was mainly intended to provide a wide adaptability of process models to different industrial development scenarios, to define a user-friendly concept of process description and to establish an extensive computer-aided process execution support, contributing to the efficiency of development activities. These benefits were achieved by a high grade of formalism, by an integrated, generic concept of process modeling and process enactment and by using commonly accepted industrial standards (UML, SPEM). The integrated tool chain developed in this project is based on an extension of the SPEM standard (eSPEM - enactable SPEM). eSPEM adds a behaviour modeling concept by reusing UML activity and state machine diagrams. In addition, eSPEM provides behaviour modeling concepts that are specific to software development processes, for example, dynamic task creation and scheduling. In 2012, an overview of the tool chain and eSPEM has been presented at the "First Workshop on Academics Modeling with Eclipse" which was held in conjunction with the "8th European Conference on Modeling Foundations and Applications". In addition, practical experiences from modeling SDPs in industrial projects have shown a rising importance of standards and reference models which are subsequently summarized under the term quality standard. These quality standards are used to specify requirements for target-oriented and effective execution of software development projects. These requirements are thereby defined to address different goals related to e.g. quality and efficiency (Automotive SPICE, CMMI) or safety (ISO 26262 Road Vehicles - Functional Safety) aspects of SWDPMs (Software Development Process Models). In other words, these requirements - often described in terms of best practices - are imposed on the software process definition that is typically described by SWDPMs. Tracing these requirements to the process definition is a precondition for supporting efficient assessment activities and process improvement projects. An additional goal of this research project lies therefore in the integration of these quality standards with SWDPMs with a special focus on environments that requires conformance to more than one quality standard (e.g. CMMI, Automotive SPICE and ISO 26262). |
Cluster and Grid computing made easy(Own Funds) Project leader: Abstract:
Jackal is a project to create a distributed-shared-memory system for Java. This means that you can write a multi-threaded program (that you could run using normal JVMs on single machines as well) and deploy it on a cluster connected by a network. Jackal also sports a nice check-pointer so it can periodically write the program state to disk for fault tolerance. |
OpenMP/Java(Third Party Funds Single) Project leader: Abstract:JaMP is an implementation of the well-known OpenMP standard adapted for Java. JaMP allows one to program, for example, a parallel for loop or a barrier without resorting to low-level thread programming. For example: |
JavaParty(Own Funds) Project leader: Abstract:
JavaParty [http://svn.ipd.uni-karlsruhe.de/trac/javaparty/wiki/JavaParty] allows easy porting of multi-threaded Java programs to distributed environments such as e.g. clusters. Regular Java already supports parallel applications with threads and synchronization mechanisms. While multi-threaded Java programs are limited to a single address space, JavaParty extends the capabilities of Java to distributed computing environments. In 2007/08, a complete new version of the JavaParty compiler was implemented. This version supports the current Java Standard 1.5/1.6. The implementation is based on the open and freely available Eclipse compiler framework. Thus, future developments of the Java language and corresponding extensions for the Eclipse compiler will automatically become available for JavaParty as well. In 2009, the JavaParty compiler was extended by a semantics for inner classes. We have reached the following goals in 2010: |
Compiler-supported parallelization for multi-core architectures(Own Funds) Project leader: Abstract:
Several issues significantly retard the development of quicker and more efficient computer architectures. Traditional technologies can no longer contribute to offer more hardware speed. Basic problems are the divergent ratio of the latencies of memory access and CPU speeds as well as the heat and waste of energy caused by increasing clock rates. Homogeneous and heterogeneous multi-core and many-core architectures were presented as a possible answer and offer enormous performance to the programmer. The multi-level cache hierarchy and decreased clock rates help avoid most of the above problems. Potentially, performance can increase even further by specialization of some hardware components. Current target architectures are GPUs with hundreds of arithmetic units and the Intel XeonPhi processor that provides 60 and more cores including hyper threading on a single board. While data parallel problems can be relatively simple accelerated by using the new hardware architectures, the implementation of task parallel problems is our main research focus. The difficulty is often the irregularity of the resulting task tree and thus the different task run times. From the point of view of a programming systems research group, there are - among others - the following open questions: Which core executes which work packet in which order? When do you donate a work packet from one compute node to another? Which data belongs to a work packet, are multiple cores/compute nodes allowed to access the data simultaneously? How do we have to merge data from multiple compute nodes? How can a compiler together with a runtime system create tasks and distribute work packets? In 2011, we have implemented and extended the Cilk programming model for the heterogeneous CellBE architecture (one PowerPC core (PPU) with eight SPU "coprocessors"). The CellBE architecture offers an enormous computing potential on a single chip. To move a work packet in the heterogeneous architecture, we have extended the Cilk programming model by an extra keyword. A source to source transformation then creates code for both, the PPU and SPU cores. Furthermore, we have moved the data along with the work packets in the SPU local stores and used a garbage collection technique to free memory from remote SPUs later. In 2012, we focused on graphic cards (GPUs) as a second target architecture. GPUs offer a lot more performance than ordinary CPUs, however achieving peak performance may be difficult. For data parallel problems, the performance can be achieved using Cuda (NVidia) or OpenCL (AMD) relatively easy. However, it is much more difficult to port task parallel problems with reasonable performance to the GPU, which is one of the goals on our roadmap. Thus, we design, implement and compare various load balancing algorithms. In 2012 we designed a first approach with hierarchical queues under the principle of work donation. In 2013, while further developing the load balancing algorithms for the GPU, we also targeted our work towards the Intel XeonPhi processor. With its many-core architecture and large register sets (and thus the ability to issue vector instructions on multiple data), the XeonPhi processor is a new challenge for load balancing algorithms. In practice, we extended and adopted Cilk for the XeonPhi such that we can automatically merge functions during the source-to-source transformation. This increases the Intel compiler{'}s chances to automatically parallelize. We implemented several analyses that not only increase the number of candidate functions for merging but also avoid (or at least handle) divergence in those merged functions. In 2014, we have extended our existing implementation for XeonPhi processors in a way that we can distribute the work over multiple XeonPhi processors. In contrast to the technique of work stealing that is used to distribute work over the many cores of a single XeonPhi, we use work donation to distribute the work to other XeonPhi processors. With a new source code annotation it is possible to mark the necessary data ranges for a work packet. These data ranges are then distributed along with the work packet and merged at synchronization points, which was the main challenge of the implementation. Furthermore, we have started to extend the clang compiler of the LLVM framework with support for Cilk in order to automatically generate CUDA code for execution on GPUs. Along with the generated CUDA code, we have designed a lightweight but general runtime system that manages execution and execution order of the work packets. We plan to implement analyses to avoid execution divergence as much as possible. In 2015, we evaluated and compared multiple load balancing algorithms to execute Cilk programs on the GPU. Therefore, we implemented queuing algorithms for parallel access and improved the automated generation of the necessary CUDA code. The correct placement of Cilk keywords for synchronization is still a challenge for the programmer. Thus, we generate from plain, recursive C code multiple, "plausible" code variants including synchronisation statements. These code variants will then be executed speculatively and the result from the fastest, correct variant will be used for further computations. Furthermore, the size of the base case of the recursion is crucial for an optimal performance improvement. Consequently, we started to optimize the size of the base case using analysis during compile and run time and will replace recursive calls with function inlining and vectorisation. |
Wireless Localization(Third Party Funds Single) Project leader: Abstract:Localization Systems (also known as Realtime Location Systems, or RTLS) become more and more popular in industry sectors such as logistics, automation and many more. These systems provide valueable information about whereabouts from objects at runtime. Therefore, processes can be traced, analyzed and optimized. Besides the research activities at the core of localization systems (like resilience and interference-free location technologies or methods for highly accurate positioning), there emerge algorithms and techniques to identify meaningful information for further processing steps. In this context, the aim of the wireless localization project is the research on automatic configuration methods for RTLSs as well as the generation of dynamic moving models and techniques for event processing on position streams at runtime. In 2009, we continued the development of our algorithms to estimate the receiving antenna's position (pose) of location systems. The algorithms estimate measuring points which allow a fast and accurate measurement pose. We applied an robot to execute the measurement automatically. The developed algorithm considers obstacles and the receiving characteristics of the location system and can sort out errors contained in the measurement data (e.g. multipath measurements). In 2010, models have been developed that can be used as dynamic moving models. Learning methods are applied to adapt the models at run-time. A formal language called TBL (Trajectory Behavior Language) was developed for describing trajectories. Additional algorithms can shrink the size of that description and hence compress the data size required to store trajectories. We are evaluating methods for learning the moving models online. These are evaluated in a study with respect to motion prediction. Moreover, it is being investigated whether events can be predicted by analyzing and learning event streams from the localisation system at runtime. External Partners:
|
Model Driven Component Composition(Third Party Funds Single) Project leader: Abstract:This project is part of the INI.FAU collaboration between AUDI AG and the University of Erlangen-Nuremberg. It examines model-driven ways to integrate vehicle functions on electronic control units (ECUs). Moreover, the project develops supporting methods and tools for this task. The insights gained in the course of this project will be practically validated by integrating a damper control system into an AUTOSAR ECU. In the automotive industry it is common practice to develop in-car-software on a high level of abstraction and in a model-based way. To eliminate uncertainties concerning resource consumption and runtime it is necessary to test the developed software on the target hardware as early as possible. But due to cost and safety requirements the integration of the software into an ECU is very time-consuming and demands special expertise going beyond that of the function developer. AUTOSAR (AUTomotive Open Systen ARchitecture) is on the way to become a standard for the basic software on ECUs. But due to the novelty of this standard there are neither processes nor tools to support the integration of the developed in-car-software into an ECU. In 2008, we have examined the modeling expressiveness of AUTOSAR with respect to both its applicability and possible conflicts with existing standards and technologies that are currently in use at Audi. Furthermore, the automatic generation of an AUTOSAR software architecture from a single damper control component has been implemented. Since 2009, a model-driven approach that supports the integration of software into an ECU is being implemented and integrated into the tool set used at Audi. In particular we are looking at the automatic configuration of the bus communication by means of a bus database and the automatic task scheduling among the application processes. The model-driven development, which in this case is based on the Eclipse Modeling Framework, will enable easier tayloring of the emerging prototype to changing requirements. In 2010, we exploited the information that is available in an AUTOSAR project to automatically configure local and remote communication between software components. We have also developed a genetic algorithm that uses dependency information to automatically generate task schedulings that minimize communcation latencies between cooperating software components. The existing prototype has been extended with the above-mentioned methods. |
Parallel code analysis on a GPU(Own Funds) Project leader: Abstract:
In compiler construction there are analyses that propagate information along the edges of a graph and modify it, until a fix point is reached and the information no longer changes. In this project we built the ParCAn framework to accelerate such analyses by exploiting the massive parallelism of graphic cards. In 2016, our research focus was on synchronization mechanisms for GPUs. Known synchronization methods for CPUs (e. g. a spin lock) cannot be used without further adjustment on GPUs since their special architectural properties easily lead to dead- and livelocks. Synchronization is required (even for predominantly dataparallel graph implementations) if data dependences occur dynamically. We have therefore developed a novel synchronization mechanism which solves two non-trivial problems related to GPUs: First, we prevent dead- and livelocks. Second, we retain as much parallelism as possible by allowing dataparallel threads to work on disjoint areas of a data structure concurrently. For example, think of threads that modify disjoint locations of a graph without affecting its structural integrity. In our approach, a programmer can provide rules that describe the conditions under which a parallel access is allowed. At runtime, we check these rules and determine how many threads can run in parallel. We currently extend the above synchronisation mechanism with a scheduler that redistributes conflicting data access so that the SIMD execution on a GPU causes less serialization than without the re-ordering. Hence, the degree of parallelism grows. The underlying idea exploits that GPUs organize threads in hierarchical units. If the above synchronization mechanism detects a conflicting access in one of these units, it checks on the next smaller unit whether the conflict can also be found there. If this is not the case, the fewer threads in that unit can run in parallel. This is much better than serializing all threads in the enclosing unit. In this situation it is the scheduler’s task to re-distribute the detected collisions across the units so that as many threads as possible can run in parallel. As the scheduling is performed at run-time it needs to be efficient, must itself run in parallel, and potentially make use of the dynamic thread creation capabilities of modern GPUs. Graphs are fundamental data structures to represent relations between data (e.g., social networks, web link analysis). Graphs can have millions/billions of vertices/edges. GPUs can process graphs with 1000th of threads in parallel very efficiently. Graph Analyses use the Bulk Synchronous Parallel Model (BSP) which divides the analysis into three strictly separated phases: computation, communication and synchronization. The two latter ones require communications with the host system (CPU) that slow down execution. Our GPU-based compiler works after the BSP model too. Internally the code is represented as (control flow-) graph. This graph is transferred to the GPU and gets analyzed. Every code modification triggers this cycle. The Graph has thus to be generated and transferred to the GPU very fast. Publications in the field of graph-analysis focus on optimizing the computation time. The end-to-end execution time (including communication and synchronization) is ignored but has a strong impact on the run-time. Our compiler considers every phase of the BSP. In 2017 we published a paper that significantly reduces the time for synchronization. In 2018 we completed our comparative study on the efficiency of graph data structures on GPUs. To show the effectiveness of our framework we integrated it into the LLVM compiler framework. We picked four LLVM analyses and parallelized them with ParCAn. Ample measurements show that our framework can accelerate LLVM’s compilation process by up to 40%. A publication was accepted at the 28th International Conference on Compiler Construction and will receive the Best-Paper-Award. In 2019, ParCAn was adjusted to the new execution model of NVIDIA’s latest GPU architectures. With the introduction of the Volta architecture, threads can now achieve progress independently of the others. Since Volta every thread has its own program counter and call stack. Previously, a group of threads (called a warp) shared both a common program counter as well as a call stack. The threads either executed the same instruction or were idle (lock-step execution). Applications that are not adjusted to this execution model will compute wrong results. As threads now execute independently of each other, race conditions can occur within warps. Older lock-step fashioned execution models inserted synchronization points to prevent this implicitly. We inspected ParCAn’s source code for code fragments susceptible to causing race conditions on new architectures. These fragments were adjusted to now execute properly on the latest NVIDIA architectures. The use of the GPU as the target architecture raised other research-related questions that were also published. Some analyses store their information in a global data structure that can be modified by all threads simultaneously. Especially the high number of concurrent threads on a GPU demands for efficient synchronization. Thus, as part of the research project we implemented an efficient framework for establishing mutual exclusion, see the LNCS-paper in the references. Previous approaches inevitably resulted in deadlocks when the GPU is fully utilized. Moreover, by using a variant of the inspection-execution paradigm we further improved the efficiency of the framework. Another research topic considered the efficiency of graph structures on GPUs. At its core, ParCAn implements a graph traversal algorithm. The program to be translated is converted into a graph, the control flow graph (CFG), on which the analyses are executed. Due to the large number of parallel accesses, the CFG represents a critical data structure for the performance of ParCAn. For this reason, we conducted an extensive study comparing the performance of graph data structures. We used the results to determine the best possible data structure to represent the CFG. We derived general criteria that allow to make assumptions about the performance of a data structure under certain conditions. Even outside of the context of ParCAN, developers can use these criteria, represented as a decision tree, to choose the most appropriate data structure for their static graph algorithms. The results of the study were presented at the GPGPU workshop, see references. Publications:
|
ParSeMiS - the Parallel and Sequential Graph Mining Suite(Own Funds) Project leader: Abstract:
The ParSeMiS project (Parallel and Sequential Graph Mining Suite) searches for frequent, interesting substructures in graph databases. This task is becoming increasingly popular because science and commerce need to detect, store, and process complex relations in huge graph structures. Aufbauend auf den Ergebnissen des Projekts ParMol2 wurden 2006/2007 folgende Ziele erreicht:
In 2008, the following goals have been achieved:
In 2009, the following goals have been attacked:
In 2010, the distributed stack implementations have also been tested on other algorithms and data structures. |
|
PARSEMIS auf Github |
Parallelization techniques for embedded systems in automation(Own Funds) Project leader: Abstract:
This project was launched in 2009 to address the refactorization and parallelization of applications used in the field of industry automation. These programs are executed on specially designed embedded systems. This hardware forms an industry standard and is used worldwide. As multicore-architectures are increasingly used in embedded systems, existing sequential software must be parallelized for these new architectures in order to improve performance. As these programs are typically used in the industrial domain for the control of processes and factory automation they have a long life cycle. Because of this, the programs often are not being maintained by their original developers any more. Besides that, a lot of effort was spent to guarantee that the programs work reliably. For these reasons the software is only extended in a very reluctant way. Therefore, a migration of these legacy applications to new hardware and a parallelization cannot be done manually, as it is too error prone. Thus, we need tools that perform these tasks automatically or aid the developer with the migration and parallelization. Research on parallelization techniques We developed a special compiler for the parallelization of existing automation programs. First, we examined automation applications with respect to automatic parallelizability. We found that it is hard to perform an efficient automatic parallelization with existing techniques. Therefore this part of the project focuses on two steps to handle this situation. As first step, we developed a data dependence analysis that identifies potential critical sections in a parallel program, presents them to the programmer and adds their protection to the code. We ware able to show that our approach to identify critical sections finds atomic blocks that closely match the atomic blocks that an expert would add to the code. Besides that, we showed in 2014 that the impacts on execution times are negligible if our technique finds atomic blocks that are larger than necessary or that are not necessary at all. As second step we have refined and enhanced existing techniques (software transactional memory (STM) and lock inference) to implement atomic blocks. In our approach, an atomic block uses STM as long as lock inference would lead to coarse-grained synchronization. The atomic block switches from STM to lock inference as soon as a fine-grained synchronization is possible. With this technique, an atomic block always uses fine-grained synchronization while the runtime overhead of STM is minimized at the same time. We showed that (compared to a pure STM or lock inference implementation) our technique speeds up execution times by a factor between 1.1 and 6.3. Although fine-grained synchronization in general leads to better performance than a coarse-grained solution, there are cases where a coarse-grained implementation shows equal performance. We therefore presented a runtime mechanism for an STM that also works together with our combined technique. This runtime mechanism starts with a small number of locks, i.e., a coarse-grained locking, where accesses to different shared variables are protected by the same lock. If this coarse-grained locking leads to too many non-conflicting accesses waiting for the same lock, our approach gradually increases the number of locks. This makes the locking more fine-grained so that non-conflicting accesses can be executed concurrently. Our runtime mechanism that dynamically tunes the locking-granularity makes the programs run up to 3.0 times faster than a fixed coarsegrained synchronization. We completed this project part in 2014. Research on migration techniques Our research on the migration of legacy applications originally consisted of having a tool that automatically replaces suboptimal code constructs with better code. The code sequences that had to be replaced as well as the replacement codes were specified by developers by means of a newly developed pattern description language. However, we found this approach to be too difficult for novice developers. This led us to the development of a new tool that automatically learns and generalizes patterns from source code archives, recognizes them in other projects, and presents recommendations to developers. The foundation of our tool lies in the comparison of two versions of the same program. It extracts the changes that were made between two source codes, derives generalized patterns of suboptimal and better code from these changes, and saves the patterns in a database. Our tool then uses these patterns to suggest similar changes for the source code of different programs. In 2014 we developed a new symbolic code execution engine to minimize the number of wrong recommendations. Depending on the number and the generality of the patterns in the database, it is possible that without the new engine our tool generates some unfitting recommendations. To discard the unfitting ones, we compare the summary of the semantics/behavior of the recommendation with summary of the semantics/behavior of the database pattern. If both differ too severely, our tool drops the recommendation from the results. The distinctive features of our approach are its applicability to isolated code fragments and its automatic configuration that does not require any human interaction. The latest results of our tool SIFE are found online (last update: 2014-05-09). Parts of the project are funded by the "ESI-Anwendungszentrum" [http://www.esi-anwendungszentrum.de/] |
Recurrent Neuronal Networks (RNNs) for Real-Time Estimation of Nonlinear Motion Models(Third Party Funds Single) Project leader: Abstract:With the growing availability of information about an environment (e.g., the geometry of a gymnasium) and about the objects therein (e.g., athletes in the gymnasium), there is an increasing interest in bringing that information together profitably (so-called information fusion) and in processing that information. For example, one would like to reconstruct physically correct animations (e.g., in virtual reality, VR) of complex and highly dynamic movements (e.g., in sports situations) in real-time. Likewise, e.g., manufacturing plants of the industry, which suffer from unfavorable environmental conditions (e.g., magnetic field interference or missing GPS signal), benefit from, e.g., high-precision goods location. Typically, to describe movements, one uses either poses that describe a "snapshot" of a state of motion (e.g., idle state, stoppage), or a motion model that describes movement over time (e.g., walking or running). In addition, human movements may be identified, detected, and sensed by different sensors (e.g., on the body) and mapped in the form of poses and motion models. Different types of modern sensors (e.g., camera, radio, and inertial sensors) provide information of varying quality.In principle, with the help of expensive and highly precise measuring instruments, the extraction of the poses and resp. of the motion model, for example, from positions on small tracking areas is possible without errors. Positions, e.g., of human extremities, can describe or be described by poses and motion models. Camera-based sensors deliver the required high-frequency and high-precision reference measurements on small areas. However, as the size of the tracking surface increases, the usability of camera-based systems decreases (due to inaccuracies or occlusion issues). Likewise, on large areas radio and inertial sensors only provide noisy and inaccurate measurements. Although a combination of radio and inertial sensors based on Bayesian filters achieves greater accuracy, it is still inadequate to precisely sense human motion on large areas, e.g., in sports, as human movement changes abruptly and rapidly. Thus, the resulting motion models are inaccurate.Furthermore, every human movement is highly nonlinear (or unpredictable). We cannot map this nonlinearity correctly with today's motion models. Bayes filters describe these models but these (statistical) methods break down a nonlinear problem into linear subproblems, which in turn cannot physically represent the motion. In addition, current methods produce high latency when they require accuracy.Due to these three problems (inaccurate position data on large areas, nonlinearity, and latency), today's methods are unusable, e.g., for sports applications that require short response times. This project aims to counteract these nonlinearities by using machine learning methods. The project includes research on recurrent neural networks (RNN) to create nonlinear motion models. As modern Bayesian filtering methods (e.g., Kalman and Particle filters) and other statistical methods can only describe the linear portions of nonlinear human movements (e.g., the relative position of the head w.r.t. trunk while walking or running) they are thus physically not completely correct.Therefore, the main goal is to evaluate how machine learning methods can describe complex and nonlinear movements. We therefore examined whether RNNs describe the movements of an object physically correctly and support or replace previous methods. As part of a large-scale parameter study, we simulated physically correct movements and optimized RNN procedures on these simulations. We successfully showed that, with the help of suitable training methods, RNN models can either learn physical relationships or shapes of movement. III. Finally, a demonstration of feasibility shall be tested. The project was successfully completed in 2021. In 2021, as part of a successfully completed dissertation, essential findings from the course of the project were linked and conclusions were drawn, and numerous research questions were addressed and answered. In 2024, we further examine GPT-like models for their uncertainty, explainability, and adaptability. In addition, we analyze the feasibility of these generative methods in relation to various fields of application, e.g., forecasting and anomaly detection, anomaly characterization, anomaly localization, and anomaly mitigation. External Partners:
Publications:
|
Software Project Control Center(Third Party Funds Single) Project leader: Abstract:
Prototypical implementation of a new tool for quality assurance during software development Modern software systems are growing increasingly complex with respect to functional, technical and organizational aspects. Thus, both the number of requirements per system and the degree of their interconnectivity constantly increase. Furthermore the technical parameters, e.g., for distribution and reliability are getting more complex and software is developed by teams that are not only spread around the globe but also suffer from increasing time pressure. Due to this, the functional, technical, and organizational control of software development projects is getting more difficult. The "Software Project Control Center" is a tool that helps the project leader, the software architect, the requirements engineer, or the head of development. Its purpose is to make all aspects of the development process transparent and thus to allow for better project control. To achieve transparence, the tool distills and gathers properties from all artifacts and correlations between them. It presents/visualizes this information in a way suitable for the individual needs of the users. The Software Project Control Center unifies the access to relations between artifacts (traceability) and to their properties (metrics) within software development projects. Thus, their efficiency can be significantly increased. The artifacts, their relations, and related metrics are gathered and integrated in a central data store. This data can be analyzed and visualized, metrics can be computed, and rules can be checked. For the Software Project Control Center project we cooperate with the QAware GmbH, Munich. The AIF ZIM program of the German Federal Ministry of Economics and Technology funded the first 30 months of the project. The Software Project Control Center is divided into two subsystems. The integration pipeline gathers traceability data and metrics from a variety of software engineering tools. The analysis core allows to analyze the integrated data in a holistic way. Each subsystem is developed in a separate subproject. The project partner QAware GmbH implemented the integration pipeline. The first step was to define TraceML, a modeling language for traceability information in conjunction with metrics. The language contains a meta-model and a model library. TraceML allows to define customized traceability models in an efficient way. The integration pipeline is realized using TraceML as lingua franca in all processing steps: From the extraction of traceability information to its transformation and integrated representation. We used the Eclipse Modeling Framework to define the TraceML models on each meta-model level. Furthermore, we used the Modeling Workflow Engine for model transformations and Eclipse CDO as our model repository. A set of wide-spread tools for software engineering are connected to the integration pipeline including Subversion, Eclipse, Jira, Enterprise Architect and Maven. The main contribution of our group to this project is the analysis core, i.e., the design and implementation of a domain-specific language for graph-based traceability analysis. Our Traceability Query Language (TracQL) significantly reduces the effort that is necessary to implement traceability analyses. This is crucial for both industry and the research community as lack of expressiveness and inefficient runtimes of other known approaches used to hinder the implementation of traceability analysis. TracQL eases not only the extraction, but also the analysis of traceability data using graph traversals that are denoted in a concise functional programming style. The language itself is built on top of Scala, a multi-paradigm programming language, and was successfully applied to several real-world industrial projects. In 2014, we improved the modularity of the language to make it both more adaptable and extendable in terms of structure and operations. This not only increases its expressiveness but also improves the reusability of existing traceability analyses. In 2015, we evaluated and documented our approach in order to emphasize its core attributes and to show its effectiveness. The three core attributes are: |
Tapir(Own Funds) Project leader: Abstract:
Tapir is a new programming language to ease systems programming. Systems programming includes networking protocols, operating systems, middlewares, DSM systems, etc. Such systems are critical for the functioning of a system as they supply services that are required by user applications. For example, an operating system supplies an operating environment and abstracts from concrete hardware in doing so. A DSM system simulates a single shared memory machine by abstracting from the single machines inside a cluster so that a (user-level application on a) cluster can be programmed without having to program explicit message passing. Compared to application programming, systems programming has a different set of requirements. The programming 'style' is also very different from the styles used in programming user-level applications. Finally, the performance requirements are usually very strict in systems programs as the complete system's performance greatly relies on the underlying layers of systems programs. Bugs in systems code have also great repercussions on a complete system's reliability. Combined, we can directly conclude the following when using high-level languages for systems programming: The basics of the Tapir language have been created. While Tapir has some similarities to Java, C#, and C++, all unneeded and unwanted functionality of the above have been removed. For example, Tapir has no automatic memory management, no exception handling, and no type-casts. Class and object concepts have been kept, but inheritance has been removed. The resulting Tapir programs can be verified by means of model-checking, even while the programming is still being developed. Verification is made easy as code pieces that are implementation details can be marked as such so that they can be safely ignored by the verifier. Tapir code can be executed in parallel for example also on a graphics card without the possibility of the common programming errors associated with parallelism can occur. Even while the language is still being developed, a prototype DSM protocol has already been implemented in the Tapir language. We have evaluated RDMA-based DSM protocols so that they can be added to the Tapir language. Tapir's semantic checks are implemented by means of model-checking. Model-checking, however, is a very memory intensive analysis. This made us write our own Java Virtual machine, called LVM, which is especially suited for managing large numbers of objects. LVM outperforms standard Java VMs as soon as swapping becomes necessary. 2006/2007 wurde an den grundlegenden Spracheigenschaften von Tapir gearbeitet. Obwohl Tapir an existierende Hochsprachen wie C++ und Java angelehnt ist, wurden alle unnötigen Eigenschaften und Funktionen entfernt. Beispielsweise fehlen Tapir Speicherbereinigung, Ausnahmebehandlung und Typwandlungen; Klassen und Objekte können zwar definiert werden, jedoch ist keine Vererbungsbeziehung zwischen Klassen erlaubt. Das mit Tapir spezifizierte Systemprogramm kann mit Model Checking-Techniken bereits während der Entwicklung auf Fehler überprüft werden. Ein prototypischer Übersetzer und ein Verifikationswerkzeug wurden 2006/2007 implementiert. In 2008, LVM was optimized both for sequential execution and for distributed execution on a cluster of workstations. This allows for faster verification of Tapir programs on clusters and for faster running of scientific Java programs. In 2009, the Tapir language was itself improved to allow both easier automatic program verification and to allow more efficient code to be generated. For example, the language has become easier to verify because code pieces that are implementation details can be marked as such and be safely ignored by the verifier. The efficiency of the language has been improved such that selected parts of programs can now be executed in parallel for example also on a graphics card without the possibility of the common programming errors associated with parallelism can occur. |
Techniques and tools for iterative development and optimization of software for embedded multicore systems(Third Party Funds Single) Project leader: Abstract:Multicore processors are of rising importance in embedded systems as these processors offer high performance while maintaining low power consumption. Developing parallel software for these platforms poses new challenges for many industrial sectors because established tools and software libraries are not multicore enabled. The efficient development, optimization and testing of multicore-software is still open research, especially for reliable real-time embedded systems. In the multi-partner project "WEMUCS" [http://www.multicore-tools.de/] new methods and tools for efficient iterative development, optimization, and testing of multicore software have been created over the past two years. Innovative tools and technologies for modeling, simulation, visualization, tracing, and testing have been developed and integrated into a single tool chain. Using case studies from different industries (automotive, telecommunications, industry automation) these tools were evaluated and improved. Although several well-known methods for test case generation and best practice coverage measures for classical single-core applications exist, no such methods for multi-core software have established themselves yet. Unfortunately, it is the interaction of concurrent threads that can cause faults that cannot be discovered by testing the individual threads in isolation. As part of the WEMUCS project (more precisely: work package AP3 [http://www.multicore-tools.de/de/test.html]) and based on an industrial size case study, we developed a generic technique (called a "testing pipeline" below) that systematically creates test cases to find and analyze the impact of concurrent side effects. To evaluate the new testing pipeline (including the automated parallelization of sequential code) on real world examples, our project partner Siemens created a complete model of a large luggage conveyor belt, including the code to control the belt. Such a luggage conveyor can be found at every airport. The case study's model is used to automatically derive luggage conveyor belt systems of different sizes, i.e., built from an arbitrary number of feeder or outlet belts. The hardware of the conveyor belt is emulated on the SIMIT simulation tool from Siemens and the control software (written in the programming language AWL) runs on a software-based PLC. The first step of the testing pipeline converts the AWL code into a more comprehensible and human-readable programming language called HLL. We have completed this converter during this reporting period. In step two, our tool then transforms previously sequential parts of the AWL code into HLL units that are executed in parallel. When applied to a luggage conveyor belt system built from eight feeders, eight outlets and an inteconnecting circular belt of straight and curved segments, our tool automatically transcoded 11.704 lines of sequential AWL code into 34 KB of parallel HLL code. In step three another tool developed by the chair then analyzes the HLL code and automatically generates a testing model. This model represents the interprocedural control flow of the concurrent subroutines and also holds all the thread switches that might be relevant for testing. The testing model consists of a set of hierarchically organized UML activities (currently encoded as an XMI document that can be imported into Enterprise Architect by Sparx Systems). When applied to the case study outlined above, our tool automatically generates 103 UML activity diagrams (with 1,302 nodes and 2,581 edges). Step four is optional. The tester can manually adapt the testing model as needed (e.g. by changing priorities or inserting additional verification points). Then the completed model can be loaded into the MBTsuite tool, a model-based testing tool developed by our project partner sepp.med GmbH. This tool is highly configurable to generate test cases that cover as many parts of the testing model as possible. We ran MBTsuite on a standard PC and applied it to the testing model of our case study; within six minutes MBTSuite generated a highly optimized test set consisting of only 10 test cases that nevertheless cover 99% of the nodes and 78% of the edges. In cooperation with our project partner sepp.med GmbH we built two export modules for the above-mentioned MBTsuite. One module outputs the generated test cases as a human-readable spreadsheet, the other module outputs an executable test set in the Java language. The exported spreadsheet contains one individual sheet per test case, with one column per thread. The rows visualize the thread interleaving that gets tested. The Java file holds a Java class with a test method per test case. Every method holds a sequence of test steps that discretely control thread interleavings. This way, each test case execution leads to a unique and reproducible execution of the parallel System Under Test (SUT) written in HLL. Each run of a test instructs our HLL emulator to load and initialize the SUT and each subsequent test step instructs the HLL emulator to execute a certain set of instructions from a certain thread. During this fully controlled execution, each test case emits a detailed protocol of its execution for the final visualization step. A plug-in from sepp.med that creates a visualization layer in Enterprise Architect's UML editor visualizes the resulting log file. Colored nodes and edges tell the user which control flow paths a test has covered. If a test case "fails" (if a race condition or a logic error is found in the tested program), its graphical trace ends at the failing statement. The tester can then follow the control flow back in time in order to understand the underlying reason for the failure. We have implemented a prototype of the full testing pipeline and demonstrated its applicability to an industrial size case study. This tool is a major contribution to testing concurrent code for embedded systems. It is a contribution of the Programming Systems Group to the "ESI initiative" [http://www.esi-anwendungszentrum.de]. |