Energy-efficient software helps improve mobile device experiences and reduce the carbon footprint of data centers. However, energy goals are often de-prioritized in order to meet other requirements. We take inspiration from recent work exploring the use of large language models (LLMs) for different software engineering activities. We propose a novel application of LLMs: as code optimizers for energy efficiency. We describe and evaluate a prototype, finding that, across 6 small programs, our system improves energy efficiency in 3 of them, by up to 2x over compiler optimizations alone. From our experience, we identify some of the challenges of energy-efficient LLM code optimization and propose a research agenda.
Background: In this paper, we present our initial efforts to integrate formal methods, with a focus on model-checking specifications written in Temporal Logic of Actions (TLA+), into computer science education, targeting undergraduate juniors/seniors and graduate students. Formal methods can play a key role in ensuring correct behavior of safety-critical systems, yet remain underutilized in educational and industry contexts. Aims: We aim to (1) qualitatively assess the state of formal methods in computer science programs, (2) construct level-appropriate examples that could be included midway into one’s undergraduate studies, (3) demonstrate how to address successive "failures" through progressively stringent safety and liveness requirements, and (4) establish an ongoing framework for assessing interest and relevance among students. Methods: After starting with a refresher on mathematical logic, students specify the rules of simple puzzles in TLA+ and use its included model checker (known as TLC) to find a solution. We gradually escalate to more complex, dynamic, event-driven systems, such as the control logic of a microwave oven, where students study safety and liveness requirements. We subsequently discuss explicit concurrency, along with thread safety and deadlock avoidance, by modeling bounded counters and buffers. Results: Our initial findings suggest that through careful curricular design and choice of examples and tools, it is possible to inspire and cultivate a new generation of software engineers proficient in formal methods. Conclusions: Our initial efforts suggest that 84% of our students had a positive experience in our formal methods course. Future plans include a longitudinal analysis within our own institution and proposals to partner with other institutions to explore the effectiveness of our open-source and open-access modules.
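To make the safety/liveness distinction concrete, here is a hedged illustration in standard temporal-logic notation (ours, not the course's actual TLA+ specification): a safety property asserts that a bad state never occurs, while a liveness property asserts that a good state is eventually reached.

```latex
% Hypothetical microwave-oven requirements (our notation;
% TLA+ writes \Box as [] and \Diamond as <>):
\begin{align*}
  \text{Safety:}   &\quad \Box\,\neg(\mathit{DoorOpen} \land \mathit{Heating}) \\
  \text{Liveness:} &\quad \mathit{Start} \Rightarrow \Diamond\,\mathit{Done}
\end{align*}
```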
Our hands-on course introduces model checking using Temporal Logic of Actions through practical examples like the control logic for a microwave oven. Due to highly positive initial feedback from our own students, we plan to broaden our reach by partnering with other institutions.
One of the key goals in high-performance and distributed software engineering is to leverage the specific capabilities of the target hardware to the extent possible. Today’s systems are typically heterogeneous, with more than one architecture present within a single system, such as conventional CPU cores combined with accelerators such as GPUs and FPGAs. Although parallel computing itself has reached a high level of maturity, as we move toward exascale computing and beyond, challenges similar to those that plagued the earliest days of parallel and distributed computing are beginning to resurface: leveraging heterogeneity while balancing performance, software portability, and developer productivity (P3). This tutorial provides hands-on experience in developing high-performance and embedded software for heterogeneous architectures using Intel’s oneAPI reference implementation of the Khronos SYCL standard in conjunction with state-of-the-art software engineering methods. By raising the abstraction level via its unified application programming interface (API), oneAPI makes it easier to develop portable high-performance software for systems with embedded hardware accelerators, such as GPUs and FPGAs.
New technologies as decision aids for the advancement of ecological risk assessment
Federico Sinche Chele, Priscilla Jimenez‐Pazmino, and Konstantin Läufer
Integrated Environmental Assessment and Management, Aug 2023
Moore’s law states that the number of transistors that can be placed on an integrated circuit doubles every two years (Moore, 1975). This has led to a steady increase in the processing power of computers over time, which in turn has advanced software and scientific applications and enabled computationally intensive methods such as machine learning, data science, modeling, and simulation. The advancement of computers and data-driven algorithms is profoundly impacting people’s lives, changing the way we work, the way we learn, and the way we interact with the world around us. This editorial discusses how scientists can benefit from the latest technological advancements and related tools by incorporating them into ecological risk assessment (ERA) to study ecosystems, creating refined assessments and accelerating turnaround times.
oneAPI is a major initiative by Intel aimed at making it easier to program heterogeneous architectures used in high-performance computing using a unified application programming interface (API). While raising the abstraction level via a unified API represents a promising step for the current generation of students and practitioners to embrace high-performance computing, we argue that a curriculum of well-developed software engineering methods and well-crafted exemplars will be necessary to ensure interest from this audience and those who teach them. We aim to bridge the gap by developing a curriculum—codenamed UnoAPI—that takes a more holistic approach by looking beyond language and framework to include the broader development ecosystem, similar to the experience found in popular HPC languages such as Python. We hope to make parallel programming a more attractive option by making it look more like general application development in modern languages being used by most students and educators today. Our curriculum emanates from the perspective of well-crafted exemplars from the foundations of computer systems—given that most HPC architectures of interest begin from the systems tradition—with an integrated treatment of essential principles of distributed systems, programming languages, and software engineering. We argue that a curriculum should cover the essence of these topics to attract students to HPC and enable them to confidently solve computational problems using oneAPI. As of this submission, we have shared our materials with a small group of undergraduate sophomores, and their responses have been encouraging in terms of self-reported comprehension and ability to reproduce the compilation and execution of exemplars on their personal systems. We plan a follow-up study with a larger cohort by incorporating some of our materials into our existing course on High-Performance Computing.
Software metrics capture information about software development products and processes. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time—longitudinal metrics that give insight about process, not just product. In this work, we present PRIME (PRocess Internal MEtrics), a tool for computing and visualizing process metrics. The currently supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda.
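As a hypothetical illustration of what a longitudinal process metric looks like, the following Scala sketch computes issue density over time; the Snapshot fields and the formula are our assumptions, not PRIME's actual implementation.

```scala
// Hypothetical sketch of one longitudinal process metric: issue density
// (open issues per KLOC) sampled over time. Names and formula are ours.
case class Snapshot(day: Int, kloc: Double, openIssues: Int)

// Density at each sampled point, preserving the time axis.
def issueDensity(history: Seq[Snapshot]): Seq[(Int, Double)] =
  history.map(s => s.day -> s.openIssues / s.kloc)

val history = Seq(Snapshot(0, 10.0, 5), Snapshot(30, 12.0, 3))
issueDensity(history).foreach { case (day, d) =>
  println(f"day $day%3d: $d%.2f issues/KLOC")
}
```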
DriveAware: Generating Actionable Data through Vehicle-Based Citizen Science (Poster)
Álvaro Landaluce, Federico Sinche Chele, Loretta Stalans, and 2 more authors
As we’re driving around Chicago going about our weekly business, we can’t help but notice telltale signs of social problems in our neighborhoods, such as homelessness, drug use, street prostitution, and suspected human trafficking. Some neighborhoods are frozen in time, often with little evidence of new development, and access to opportunities is clustered around the north side of Chicago. It may not be comfortable for us to intervene directly or scientifically sound to draw conclusions based on our isolated observations. Nevertheless, we may have an opportunity to contribute as citizen scientists by recording our observations and sharing them with better-equipped stakeholders. The interdisciplinary DriveAware project aims to use mobile computing technology to make it safe and convenient for citizen scientists in their vehicles or on foot to augment existing observations (e.g., Google Street View) with geotagged annotated images and share them with various stakeholders, such as social scientists, law enforcement, social service providers, and ordinary fellow citizens. The collected data can be used in various ways, including offline data analysis and real-time maps and dashboards, as well as concrete action, such as social scientists going on location to interview the affected individuals, law enforcement investigating and prosecuting potential incidents, and social service providers intervening directly, e.g., by picking up homeless persons during a cold spell and taking them to shelters. In addition to the interdisciplinary stakeholders’ existing research questions, this project gives rise to various computer science research questions, such as system requirements and architecture, system implementation, integration with existing databases in particular fields, and user experience studies to determine the likelihood of reporting observations through in-car voice-based and/or reduced-distraction touch interfaces (Android Auto/Apple CarPlay) versus conventional mobile apps. To support our research questions, we have developed a proof of concept of a mobile application for reporting observations in various pertinent categories, which are stored in a cloud-hosted Firebase (NoSQL) instance and can be accessed through a Python Jupyter notebook and analyzed/visualized in multidimensional ways. Future plans include integration with commercial automotive frameworks, such as Android Auto (and eventually Apple CarPlay), and automatic image collection from dashcams connected to the mobile phone via Bluetooth. The diagram below illustrates the high-level system architecture.
2021
Metrics Pipeline (Codename): An Analytics and Visualization Pipeline for Software Quality Metrics (Poster)
Nicholas Synovic, Emmanuel Amobi, Erik Greve, and 8 more authors
In Undergraduate Research and Engagement Symposium, Oct 2021
The Metrics Pipeline (Codename) focuses on metrics indicative of team progress and project health instead of privileging individual metrics, e.g., number of commits. The Metrics Dashboard allows the user to submit the URL of a hosted repository for batch analysis, whose results are then cached. Upon completion, the user can interactively study various metrics over time (at varying granularity), numerically and visually. The initial version of the system is up and running as a public cloud service (SaaS) and supports project size (KLOC), defect density, defect spoilage, and productivity. While our system is by no means the first to support software metrics, we believe it may be one of the first community-focused extensible resources that can be used by any hosted project.
Trust in open-source software is a cornerstone of scientific progress and a foundation of high-quality public services. Just as standards are integral when judging the efficacy of a novel pharmaceutical compound or determining the spread of a new disease, the software used to make those determinations should be useful, error-free, reliable, performant, and secure. A small bug in an application, library, or framework can lead to economic loss and even loss of life. We rely on software developers to be dynamic and responsive to user review and bug reporting. Our team developed an open-source modular pipeline to perform empirical investigations of software quality. A key innovation of our approach is to look at projects “from a distance,” similar to methods used in climate science, e.g., satellite imagery used to observe environmental impacts on air quality and rain forests. Instead of looking at language-specific source code features, our pipeline uses a language-agnostic, high-level approach to track software quality by focusing on the development process itself, which yields great insight into the processes programmers use to write and maintain their software. Our distributed, modular approach to analytics allows the pipeline to be easily extended to support additional metrics in future work. We store extracted data in an embedded SQLite database, which means that analysis can proceed without complex server setup, let alone hosting the software on dedicated servers. Our analytical modules are designed for efficiency, and future runs of our software only collect missing data, supporting the incremental analysis of known, important open-source projects.
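The incremental-collection idea can be sketched in a few lines of Scala; the following is a hypothetical illustration only (it assumes the org.xerial sqlite-jdbc driver, and the schema and names are ours, not the pipeline's actual design):

```scala
// Hypothetical sketch of "only collect missing data": commits already
// recorded in the embedded SQLite store are skipped on later runs.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sqlite:metrics.db")
conn.createStatement().executeUpdate(
  "CREATE TABLE IF NOT EXISTS commits (hash TEXT PRIMARY KEY, loc INTEGER)")

// INSERT OR IGNORE makes re-runs incremental: rows for already-analyzed
// commits are left untouched.
def recordIfMissing(hash: String, loc: Int): Unit = {
  val stmt = conn.prepareStatement(
    "INSERT OR IGNORE INTO commits (hash, loc) VALUES (?, ?)")
  stmt.setString(1, hash)
  stmt.setInt(2, loc)
  stmt.executeUpdate()
  stmt.close()
}

recordIfMissing("a1b2c3d", 1200) // a later run would skip this commit
conn.close()
```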
In testing stateful abstractions, it is often necessary to record interactions, such as method invocations, and express assertions over these interactions. Following the Test Spy design pattern, we can reify such interactions programmatically through additional mutable state. Alternatively, a mocking framework, such as Mockito, can automatically generate test spies that allow us to record the interactions and express our expectations in a declarative domain-specific language. According to our study of the test code for Scala’s Iterator trait, the latter approach can lead to a significant reduction of test code complexity in terms of metrics such as code size (in some cases over 70% smaller), cyclomatic complexity, and amount of additional mutable state required. In this tools paper, we argue that the resulting test code is not only more maintainable, readable, and intentional, but also a better stylistic match for the Scala community than manually implemented, explicitly stateful test spies.
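To illustrate the contrast, here is a minimal sketch, not the paper's actual test code, of a hand-rolled test spy next to a Mockito-generated one (assuming mockito-core on the classpath):

```scala
import org.mockito.Mockito.{mock, when, verify, times}

// Hand-rolled spy: extra mutable state exists only to record interactions.
class SpyIterator(underlying: Iterator[Int]) extends Iterator[Int] {
  var nextCount = 0
  def hasNext: Boolean = underlying.hasNext
  def next(): Int = { nextCount += 1; underlying.next() }
}

// Mockito spy: interactions are recorded automatically, and expectations
// are expressed declaratively after the fact.
val it = mock(classOf[Iterator[Int]])
when(it.hasNext).thenReturn(true).thenReturn(true).thenReturn(false)
when(it.next()).thenReturn(1).thenReturn(2)

var sum = 0
while (it.hasNext) sum += it.next()
assert(sum == 3)
verify(it, times(2)).next() // the interaction we care about
```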
In this chapter, we explore various parallel and distributed computing topics from a user-centric software engineering perspective. Specifically, in the context of mobile application development, we study the basic building blocks of interactive applications in the form of events, timers, and asynchronous activities, along with related software modeling, architecture, and design topics.
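As a small, self-contained illustration of those building blocks (ours, not the chapter's code), the following Scala fragment combines a recurring timer event with an asynchronous activity using plain JVM facilities:

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val timer = new Timer(true)           // daemon timer thread
timer.scheduleAtFixedRate(new TimerTask {
  def run(): Unit = println("tick")   // recurring, UI-style event
}, 0, 100)                            // fire every 100 ms

val work: Future[Int] = Future {      // asynchronous activity
  Thread.sleep(250)                   // simulate background work
  42
}
println(Await.result(work, 1.second)) // prints 42 while ticks continue
timer.cancel()
```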
Metrics Dashboard: A Hosted Platform for Software Quality Metrics
George K. Thiruvathukal, Shilpika, Nicholas J. Hayward, and 1 more author
There is an emerging consensus in the scientific software community that progress in scientific research is dependent on the "quality and accessibility of software at all levels". This progress depends on embracing the best traditional—and emergent—practices in software engineering, especially agile practices that intersect with the more formal tradition of software engineering. As a first step in our larger exploratory project to study in-process quality metrics for software development projects in Computational Science and Engineering (CSE), we have developed the Metrics Dashboard, a platform for producing and observing metrics by mining open-source software repositories on GitHub. Unlike GitHub and similar systems that provide individual performance metrics (e.g., commits), the Metrics Dashboard focuses on metrics indicative of team progress and project health. The Metrics Dashboard allows the user to submit the URL of a hosted repository for batch analysis, whose results are then cached. Upon completion, the user can interactively study various metrics over time (at varying granularity), numerically and visually. The initial version of the system is up and running as a public cloud service (SaaS) and supports project size (KLOC), defect density, defect spoilage, and productivity. While our system is by no means the first to support software metrics, we believe it may be one of the first community-focused extensible resources that can be used by any hosted project.
2017
Experiences with Scala Across the College-Level Curriculum (Invited Talk)
Mark Lewis, Konstantin Läufer, and George K. Thiruvathukal
As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors uniquely influence the evolution of the individual genes. Nevertheless, signatures of origin can still be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest.
Academic courses focused on individual microcomputers or client/server applications are no longer sufficient for students to develop knowledge in embedded systems. Current and near-term industrial systems employ multiple interacting components and new network and security approaches; hence, academic preparation requires teaching students to develop realistic projects comparable to these real-world products. However, the complexity, breadth, and technical variations of these real-world products are difficult to reproduce in the classroom. This paper outlines preliminary work on a framework architecture suitable for academic teaching of modern embedded systems, including the Internet of Things. It defines four layers, two of which are at the edges of the network and not adequately covered in academia. For each layer of the architecture, specific technology and suitable devices are identified. Desired academic outcomes for courses using projects based on the architecture are identified. Feedback and comparison are sought from real-world embedded systems developers on how effective student course and research activities based on the framework will be.
Scala is one of a new breed of hybrid languages with both object-oriented and functional aspects. It happens to be the most successful of these languages, coming in at #12 on the RedMonk language ranking and leading all languages in its second tier. This workshop will introduce participants to the Scala programming language, how it can be used effectively in introductory CS courses, and the parallel tools that are available for it. We begin with simple examples in the REPL and scripting environment, then look at doing larger, object-oriented projects. We finish off with an exploration of composable futures and the Akka actor library. Participants are strongly recommended to bring a laptop.
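For a taste of the composable-futures portion, here is an illustrative fragment (ours, not the workshop's materials) that runs in the Scala REPL:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureSum: Future[Int] =
  Future(1 to 1000)                      // start an async computation
    .map(_.sum)                          // compose a transformation
    .flatMap(total => Future(total / 2)) // chain a dependent async step

println(Await.result(futureSum, 2.seconds)) // 250250
```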
Various hybrid-paradigm languages, designed to balance compile-time error detection, conciseness, and performance, have emerged. Scala, e.g., is interoperable with Java and has become an early leader in adoption, especially in the start-up and open-source spaces. Workshop participants experience Scala’s value as a teaching language in the CS curriculum through four lecture-lab modules: In CS1, the read-eval-print loop and simple, uniform syntax aid programming in the small. In CS2, higher-order methods allow concise, efficient manipulation of collections. Advanced topics include domain-specific languages, concurrency, web apps/services, and mobile apps. Laptop recommended with Scala installed.
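For example, the CS2 treatment of higher-order methods might look like the following small fragment (our example, not the module's):

```scala
// Concise collection manipulation with higher-order methods.
val words = List("formal", "methods", "and", "functional", "programming")

val histogram = words
  .filter(_.length > 3)                 // keep only the longer words
  .groupBy(_.head)                      // index by first letter
  .map { case (c, ws) => c -> ws.size } // count per letter

println(histogram)                      // e.g., Map(f -> 2, m -> 1, p -> 1)
```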
Early Adopter Report: PDC Modules for Every Level: A Comprehensive Model for Incorporating PDC Topics into the Existing Undergraduate Curriculum (Poster)
Konstantin Läufer, Chandra Sekharan, George K. Thiruvathukal, and 1 more author
In 2nd NSF/IEEE-CS TCPP Workshop on Parallel and Distributed Computing Education (EduPar), Shanghai, China, May 2012
We have designed and implemented RestFS, a software framework that provides a uniform, configurable connector layer for mapping remote web-based resources to local filesystem-based resources, recognizing the similarity between these two types of resources. Such mappings enable programmatic access to a resource, as well as composition of two or more resources, through the local operating system’s standard filesystem application programming interface (API), scriptable file-based command-line utilities, and inter-process communication (IPC) mechanisms. The framework supports automatic and manual authentication. We include several examples intended to show the utility and practicality of our framework.
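The payoff of such a mapping can be sketched in a couple of lines of Scala; the mount point below is hypothetical, and RestFS's actual configuration is not shown:

```scala
// Once a remote web resource is mapped into the local filesystem,
// any program can use the ordinary file API on it.
import scala.io.Source

val source = Source.fromFile("/mnt/restfs/example/status") // hypothetical mount
try println(source.mkString) // read the web resource like a local file
finally source.close()
```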
Initial experience in moving key academic department functions to social networking sites
David B. Dennis, George K. Thiruvathukal, and Konstantin Läufer
In ICSOFT 2011 - Proceedings of the 6th International Conference on Software and Data Technologies, Volume 1, Seville, Spain, 18-21 July, 2011
REST on Routers? Preliminary Lessons for Language Designers, Framework Architects, and App Developers
Joseph P. Kaylor, Konstantin Läufer, and George K. Thiruvathukal
In ICSOFT 2011 - Proceedings of the 6th International Conference on Software and Data Technologies, Volume 1, Seville, Spain, 18-21 July, 2011
We present a novel form of intra-volume directory layering with hierarchical, inheritance-like namespace unification. While each layer of an OLFS volume constitutes a subvolume that can be mounted separately in a fan-in configuration, the entire hierarchy is always accessible (online) and fully navigable through any mounted layer. OLFS uses a relational database to store its layering metadata and either a relational database or any (virtual) host file system as its backing store, along with metadata and block caching for improved performance. Because OLFS runs as a virtual file system in user-space, its capabilities are available to all existing software without modification or special privileges. We have developed a reference implementation of OLFS for FUSE based on MySQL and XFS, and conducted performance benchmarking against XFS by itself. We explore several applications of OLFS, such as enhanced server synchronization, transactional file operations, and versioning.
Processing markup in object-oriented languages often requires the programmer to focus on the objects generating the markup rather than the more pertinent domain objects. The BetterXML framework aims to improve this situation by allowing the programmer to develop a domain-specific object model as usual and later bind this model to preexisting or newly generated markup. To this end, the framework provides two types of object trees, XElement and NaturalXML, for representing XML documents. XElement goes beyond DOM-like automatic parsing of XML by supporting the custom mapping of elements to domain objects; NaturalXML allows the mapping of existing domain objects to XML elements using class metadata. Both types of object trees can be inflated and deflated by means of a common intermediate representation in the form of an event stream. Finally, the framework includes the XML Intermediate Representation (XIR), a lossless record-oriented representation of XML documents for efficient streaming and other types of data exchange.
Combining SOA and BPM Technologies for Cross-System Process Automation
Sebastian Herr, Konstantin Läufer, John Shafaee, and 2 more authors
In Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering (SEKE’2008), San Francisco, CA, USA, July 1-3, 2008
By incorporating automated component, integration, and acceptance testing into the various tiers of a lightweight Java 2 Enterprise Edition (J2EE) Web application architecture, developers can shorten the development cycle and increase the quality of their work.
In Conference Record of POPL’96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, St. Petersburg Beach, Florida, USA, January 21-24, 1996
Design patterns are distilled from many real systems to catalog common programming practice. However, some object-oriented design patterns are distorted or overly complicated because of the lack of supporting programming language constructs or mechanisms. For this paper, we have analyzed several published design patterns looking for idiomatic ways of working around constraints of the implementation language. From this analysis, we lay a groundwork of general-purpose language constructs and mechanisms that, if provided by a statically typed, object-oriented language, would better support the implementation of design patterns and, transitively, benefit the construction of many real systems. In particular, our catalog of language constructs includes subtyping separate from inheritance, lexically scoped closure objects independent of classes, and multimethod dispatch. The proposed constructs and mechanisms are not radically new, but rather are adopted from a variety of languages and programming language research and combined in a new, orthogonal manner. We argue that by describing design patterns in terms of the proposed constructs and mechanisms, pattern descriptions become simpler and, therefore, accessible to a larger number of language communities. Constructs and mechanisms lacking in a particular language can be implemented using paradigmatic idioms.
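As one hedged illustration of the thesis (ours, in Scala rather than the paper's notation), lexically scoped closures collapse the Strategy pattern from a one-method class hierarchy into a plain function parameter:

```scala
// Without closures: an interface plus one class per strategy.
trait Discount { def apply(price: Double): Double }
class HolidayDiscount extends Discount {
  def apply(price: Double): Double = price * 0.8
}

// With first-class closures: the "pattern" becomes a function parameter.
def checkout(prices: List[Double], discount: Double => Double): Double =
  prices.map(discount).sum

val rate = 0.8                                // captured lexically by the closure
println(checkout(List(10.0, 20.0), _ * rate)) // 24.0
```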
Many statically typed programming languages provide an abstract data type construct, such as the package in Ada, the cluster in CLU, and the module in Modula-2. However, in most of these languages, instances of abstract data types are not first-class values. Thus they cannot be assigned to a variable, passed as a function parameter, or returned as a function result. The higher-order functional language ML has a strong and static type system with parametric polymorphism. In addition, ML provides type reconstruction and consequently does not require type declarations for identifiers. Although the ML module system supports abstract data types, their instances cannot be used as first-class values for type-theoretic reasons. In this dissertation, we describe a family of extensions of ML. While retaining ML’s static type discipline, type reconstruction, and most of its syntax, we add significant expressive power to the language by incorporating first-class abstract types as an extension of ML’s free algebraic datatypes. In particular, we are now able to express multiple implementations of a given abstract type; heterogeneous aggregates of different implementations of the same abstract type; and dynamic dispatching of operations with respect to the implementation type. Following Mitchell and Plotkin, we formalize abstract types in terms of existentially quantified types. We prove that our type system is semantically sound with respect to a standard denotational semantics. We then present an extension of Haskell, a non-strict functional language that uses type classes to capture systematic overloading. This language results from incorporating existentially quantified types into Haskell and gives us first-class abstract types with type classes as their interfaces. We can now express heterogeneous structures over type classes. The language is statically typed and offers comparable flexibility to object-oriented languages. Its semantics is defined through a type-preserving translation to a modified version of our ML extension. We have implemented a prototype of an interpreter for our language, including the type reconstruction algorithm, in Standard ML.
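For readers more familiar with Scala than ML, here is a hedged sketch (ours, not the dissertation's formalism) of first-class abstract types via abstract type members: each value hides its representation type, different implementations coexist in one heterogeneous aggregate, and operations dispatch on the hidden implementation.

```scala
trait Counter {
  type Rep            // existentially hidden representation type
  val init: Rep
  def inc(c: Rep): Rep
  def get(c: Rep): Int
}

val intCounter: Counter = new Counter {
  type Rep = Int
  val init = 0
  def inc(c: Int): Int = c + 1
  def get(c: Int): Int = c
}

val listCounter: Counter = new Counter {
  type Rep = List[Unit]
  val init: List[Unit] = Nil
  def inc(c: List[Unit]): List[Unit] = () :: c
  def get(c: List[Unit]): Int = c.length
}

// Heterogeneous aggregate of different implementations of the same
// abstract type; path-dependent types keep each Rep abstract.
val counters: List[Counter] = List(intCounter, listCounter)
println(counters.map(c => c.get(c.inc(c.init)))) // List(1, 1)
```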
In Declarative Programming, Sasbachwalden 1991, PHOENIX Seminar and Workshop on Declarative Programming, Sasbachwalden, Black Forest, Germany, 18-22 November 1991