four new posts

hauten · hauten · commit 65c0c241c3d0 · 2025-01-31T11:42:27.000-06:00
diff --git a/_posts/2025-01-08-compilergpt-new.md b/_posts/2025-01-08-compilergpt-new.md
@@ -0,0 +1,6 @@
+---
+title: "New Repo: CompilerGPT"
+categories: new-repo
+---
+
+[CompilerGPT](https://github.com/LLNL/CompilerGPT) is a framework that submits compiler optimization reports (e.g., from Clang) and the source code to an LLM. The LLM is prompted to prioritize the findings in the optimization reports and then to change the code accordingly. An automated test harness validates the changes and gives the LLM feedback on any errors introduced into the code base.
diff --git a/_posts/2025-01-10-protlib-new.md b/_posts/2025-01-10-protlib-new.md
@@ -0,0 +1,8 @@
+---
+title: "New Repo: protlib-designer"
+categories: new-repo
+---
+
+[protlib-designer](https://github.com/LLNL/protlib-designer) contains a lightweight Python library for designing diverse protein libraries by seeding linear programming with deep mutational scanning data (or any other data that can be represented as a matrix of scores per single-point mutation). The software takes as input the score matrix, where each row corresponds to a mutation and each column corresponds to a different source of scores, and outputs a subset of mutations that maximizes the diversity of the library while Pareto-optimizing the scores from the different sources. Related paper: [Antibody Library Design by Seeding Linear Programming with Inverse Folding and Protein Language Models](https://www.biorxiv.org/content/10.1101/2024.11.03.621763v1). Abstract:
+
+> We propose a novel approach for antibody library design that combines deep learning and multi-objective linear programming with diversity constraints. Our method leverages recent advances in sequence and structure-based deep learning for protein engineering to predict the effects of mutations on antibody properties. These predictions are then used to seed a cascade of constrained integer linear programming problems, the solutions of which yield a diverse and high-performing antibody library. Operating in a cold-start setting, our approach creates designs without iterative feedback from wet laboratory experiments or computational simulations. We demonstrate the effectiveness of our method by designing antibody libraries for Trastuzumab in complex with the HER2 receptor, showing that it outperforms existing techniques in overall quality and diversity of the generated libraries.
diff --git a/_posts/2025-01-27-pylulesh-new.md b/_posts/2025-01-27-pylulesh-new.md
@@ -0,0 +1,6 @@
+---
+title: "New Repo: pylulesh"
+categories: new-repo
+---
+
+[pylulesh](https://github.com/LLNL/pylulesh) is a Python and NumPy port of [LULESH](https://github.com/LLNL/LULESH) 2.0. The name stands for the Python port of Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics.
diff --git a/_posts/2025-01-31-bobgat.md b/_posts/2025-01-31-bobgat.md
@@ -0,0 +1,6 @@
+---
+title: "ML-Driven Binary Analysis Pipeline Enhances SQA"
+categories: story
+---
+
+Machine learning (ML) techniques, such as graph neural networks (GNNs) and natural language processing (NLP), are opening up new avenues for automating binary analysis. Leveraging these techniques, computational mathematician Geoff Sanders and former LLNL data scientist Justin Allen explored ways to characterize software behaviors based on similarity to previous threats. Allen built an ML-driven binary analysis pipeline that incorporates large-scale training data and hierarchical embeddings, and presented their paper, [BobGAT: Towards Inferring Software Bill of Behavior with Pre-Trained Graph Attention Networks](https://www.osti.gov/servlets/purl/2475272), at the 2024 IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications. The work was part of a Laboratory Directed Research and Development project focusing on software assurance capabilities. Two complementary open-source tools are key to this pipeline. Developed for this research, [CAP (Compile. Analyze. Prepare.)](https://github.com/LLNL/CAP) generates large-scale binary datasets from source code examples. [BinCFG](https://github.com/LLNL/BinCFG) then parses compiler outputs, tokenizes and normalizes the binary data into assembly lines, and converts the data into ML-prepped formats. [Read more about the project at LLNL Computing.](https://computing.llnl.gov/about/newsroom/ml-driven-binary-analysis-pipeline)