Software composition analysis explained: How it identifies the risks of open source software
Definition of software composition analysis
Software composition analysis (SCA) is the automated process of identifying which open source components and dependencies your application uses, and how it uses them. The process assesses the security of these components and any potential risks or license conflicts they may introduce. Correctly integrating SCA tools into your software development workflow is an important step toward strengthening the security and integrity of the software supply chain, because it helps ensure that no borrowed code introduces security risks or legal compliance issues into your products.
Why software composition analysis is necessary
Gone are the days when software applications were built from scratch. The massive adoption of open source software has revolutionized application development. Independent developers and businesses can use existing components and libraries in their code to implement functionality ranging from simple web form validations to complex cryptographic operations.
While re-using open source code has largely eliminated the need to reinvent the wheel, it comes with a few caveats: What if the code you’re borrowing has bugs or security holes? Also, what if the license terms carried by the open source component conflict with your application’s license? Who should review all this?
Examining a dozen components manually can be simple, but modern software applications are built using hundreds of libraries. These libraries may themselves have other dependencies. This nesting can run many layers deep, and before you know it, an application that appears to use only a handful of libraries can pull in hundreds or thousands of transitive dependencies. This is where SCA comes to the rescue.
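To make the "many layers deep" point concrete, here is a minimal sketch of how a tool might walk a dependency graph to find every transitive dependency. The graph and package names are hypothetical; real SCA tools read this information from lockfiles, manifests, or package registries.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
DEPENDENCY_GRAPH = {
    "my-app": ["web-framework", "crypto-lib"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-utils"],
    "http-parser": ["string-utils"],
    "crypto-lib": ["bigint"],
    "string-utils": [],
    "bigint": [],
}


def transitive_dependencies(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every package reachable from `root`."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # shared ("diamond") dependencies are counted once
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


direct = set(DEPENDENCY_GRAPH["my-app"])
everything = transitive_dependencies("my-app", DEPENDENCY_GRAPH)
print(f"{len(direct)} direct, {len(everything)} total")  # 2 direct, 6 total
```

Even in this toy graph, the application declares two dependencies but ships six; in a real project the gap is usually far larger.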
Software composition analysis and SBOMs
Most SCA tools can generate a software bill of materials (SBOM). An SBOM is a detailed inventory of all the dependencies and components that make up your application. An ideal SBOM provides the component name, version number, release date, checksum, and license information, among other metadata, for each component in your application.
Beyond simply parsing package manifests, an SCA tool can build this inventory in two further ways:
- Binary analysis: The SCA tool analyzes your build artifacts and identifies open source components through a binary fingerprint. This process identifies all of the packages included in the final version of your app, which reduces false positives and captures third-party software and libraries added to your app in non-standard ways. Not all SCA tools have binary analysis capabilities.
- Manifest and binary analysis: Some SCA solutions take a hybrid approach, parsing both manifests and binaries to produce highly accurate SBOMs. Ultimately, the sophistication of your SCA solution determines how accurately it can identify all the hidden components in your application.
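The binary analysis idea can be illustrated with the simplest possible fingerprint: a whole-file SHA-256 digest looked up in a table of known open source artifacts. This is a sketch under that assumption only; commercial tools use far larger databases and fuzzier fingerprints that survive recompilation, and the fingerprint table and file names below are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Whole-file SHA-256 digest, read in chunks so large artifacts are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_components(artifact_dir: Path, known: dict[str, tuple[str, str]]):
    """Return (relative path, name, version) for each file whose digest is known."""
    found = []
    for path in sorted(artifact_dir.rglob("*")):
        if path.is_file():
            match = known.get(sha256_of(path))
            if match is not None:
                found.append((path.relative_to(artifact_dir).as_posix(), match[0], match[1]))
    return found


# Demo: a library copied into the build under an arbitrary file name is still
# recognized, because the fingerprint depends on content, not on the name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "vendor").mkdir()
    library_bytes = b"pretend these are the bytes of example-lib 1.2.3"
    (root / "vendor" / "renamed.bin").write_bytes(library_bytes)
    (root / "app.bin").write_bytes(b"first-party code")
    # Hypothetical fingerprint database (normally built from upstream releases).
    known = {hashlib.sha256(library_bytes).hexdigest(): ("example-lib", "1.2.3")}
    results = identify_components(root, known)

print(results)  # [('vendor/renamed.bin', 'example-lib', '1.2.3')]
```

This is why binary analysis catches libraries "added in non-standard ways": a vendored or renamed copy never appears in a manifest, but its bytes still match.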
Typically, SBOMs are provided as text files in XML, JSON, or similar formats that are readable by both humans and machines. For example, an SBOM for the Keycloak application, version 10.0.2, based on the OWASP CycloneDX standard, is an XML document listing the components that make up Keycloak, including their checksums, version numbers, release dates, and license information. According to that SBOM, a single version of Keycloak contains more than 900 components.
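To give a sense of the structure, here is a minimal sketch that emits a single-component CycloneDX document. It uses CycloneDX's JSON encoding (the standard also defines XML) because Python's `json` module keeps the example short. The component, package URL, and all-zero digest are hypothetical placeholders, and the output has not been run through a schema validator.

```python
import json


def cyclonedx_bom(components: list[dict]) -> dict:
    """Build a minimal CycloneDX 1.4 JSON document from simple component records."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": c["name"],
                "version": c["version"],
                "purl": c["purl"],
                "hashes": [{"alg": "SHA-256", "content": c["sha256"]}],
                "licenses": [{"license": {"id": c["license"]}}],
            }
            for c in components
        ],
    }


# Hypothetical component; the digest is a placeholder, not a real checksum.
bom = cyclonedx_bom([
    {
        "name": "example-lib",
        "version": "1.2.3",
        "purl": "pkg:maven/com.example/example-lib@1.2.3",
        "sha256": "0" * 64,
        "license": "Apache-2.0",
    }
])
print(json.dumps(bom, indent=2))
```

A real SBOM such as Keycloak's repeats the component entry hundreds of times and typically adds metadata about the tool and the application itself.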
The Linux Foundation's SPDX format, although also text-based, differs from the CycloneDX standard.
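For comparison with CycloneDX, the sketch below renders the same kind of information in SPDX's line-oriented tag-value syntax. It covers only a handful of SPDX 2.3 fields and has not been validated against the specification; the document name, namespace URL, creator tool name, and package data are hypothetical.

```python
from datetime import datetime, timezone


def spdx_tag_value(doc_name: str, namespace: str, packages: list[dict]) -> str:
    """Render a minimal SPDX 2.3 tag-value document (not spec-validated)."""
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: hypothetical-sca-sketch",
        f"Created: {created}",
    ]
    for i, pkg in enumerate(packages, start=1):
        lines += [
            "",  # blank line between package sections, for readability
            f"PackageName: {pkg['name']}",
            f"SPDXID: SPDXRef-Package-{i}",
            f"PackageVersion: {pkg['version']}",
            "PackageDownloadLocation: NOASSERTION",
            "FilesAnalyzed: false",
            f"PackageChecksum: SHA256: {pkg['sha256']}",
            f"PackageLicenseDeclared: {pkg['license']}",
        ]
    return "\n".join(lines) + "\n"


doc = spdx_tag_value(
    "example-app-1.0",
    "https://example.com/spdx/example-app-1.0",
    [{"name": "example-lib", "version": "1.2.3", "sha256": "0" * 64, "license": "Apache-2.0"}],
)
print(doc)
```

Where CycloneDX nests data in a tree of objects, tag-value SPDX is a flat sequence of `Tag: value` lines, with each package section introduced by `PackageName`.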
How do SCA tools help detect open source vulnerabilities?
Automated SCA tools can help software teams create and deliver high-quality code and give stakeholders a proactive approach to risk management. By identifying vulnerabilities and security risks early in the software development process, SCA tools can enable software developers to select more secure components up front in a transparent manner. This advantage speeds up the development process by minimizing the need for repeated security assessments, as sufficient care is taken early on when including third-party components and libraries in an application.
If a component with known risks or vulnerabilities is absolutely necessary, development teams can make an informed judgment when first introducing it and consider workarounds that allow the component to be used safely.
The goal of SCA goes beyond simply analyzing your application’s sources and binaries to produce an SBOM. The main challenge is accurately mapping each component version to known vulnerabilities. Next comes the compliance aspect: letting stakeholders transparently review and resolve any licensing conflicts posed by components.
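Mapping component versions to vulnerabilities usually comes down to checking each SBOM entry against the affected version ranges in an advisory. Here is a minimal sketch assuming simple numeric versions and an "introduced / fixed" range per advisory; the advisory ID and packages are hypothetical, and real version schemes (pre-releases, epochs, ecosystem-specific ordering) need a proper parser.

```python
from dataclasses import dataclass


def parse_version(v: str) -> tuple[int, ...]:
    """Simplified: numeric dotted versions only, padded to three parts."""
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    package: str
    introduced: str  # first affected version (inclusive)
    fixed: str       # first fixed version (exclusive)

    def affects(self, name: str, version: str) -> bool:
        if name != self.package:
            return False
        v = parse_version(version)
        return parse_version(self.introduced) <= v < parse_version(self.fixed)


# Hypothetical advisory and SBOM entries, for illustration only.
advisories = [Advisory("HYPO-2021-0001", "example-lib", "1.0.0", "1.4.2")]
sbom = [("example-lib", "1.4.1"), ("example-lib", "1.4.2"), ("other-lib", "2.0")]

findings = [
    (adv.advisory_id, name, version)
    for name, version in sbom
    for adv in advisories
    if adv.affects(name, version)
]
print(findings)  # [('HYPO-2021-0001', 'example-lib', '1.4.1')]
```

The hard part in practice is not this comparison but getting trustworthy `package`, `introduced`, and `fixed` values in the first place.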
A few years ago, the process may have been straightforward: it would have been enough to go through the CVE feeds provided by MITRE or NVD and map them to the versions of the components present in your application. However, research, including a study by researchers at the University of Central Florida, George Mason University, and Georgia Tech, has shown that CVE advisories can often be inaccurate and contain inconsistencies. At other times, CVE data may be misinterpreted because of the way Common Platform Enumeration (CPE) data is presented in these advisories.
For example, a CVE advisory issued for a vulnerability in the Tomcat server might apply only to a specific component under the Apache Tomcat namespace, such as org.apache.tomcat:coyote, rather than to the entire Apache Tomcat namespace, but that may not be clear from the CPEs listed in the advisory alone.
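The difference between trusting a CPE at face value and using curated component data can be sketched as follows. The CPE-to-Maven mapping, the curated list, and the `some-other-module` artifact are all hypothetical, version checks are omitted for brevity, and the CPE parser ignores escaped colons.

```python
def parse_cpe(cpe: str) -> dict[str, str]:
    """Split a CPE 2.3 formatted string (ignores escaped colons for brevity)."""
    fields = cpe.split(":")
    # Layout: cpe:2.3:part:vendor:product:version:update:edition:...
    return {"part": fields[2], "vendor": fields[3], "product": fields[4], "version": fields[5]}


# Hypothetical mapping from CPE vendor/product to a Maven groupId.
CPE_TO_MAVEN_GROUP = {("apache", "tomcat"): "org.apache.tomcat"}

# Hypothetical curated knowledge: which artifacts really contain the flawed code.
CURATED_AFFECTED_ARTIFACTS = {"org.apache.tomcat:coyote"}

# "some-other-module" is a made-up artifact standing in for an unaffected one.
app_components = ["org.apache.tomcat:coyote", "org.apache.tomcat:some-other-module"]


def naive_matches(cpe: str, components: list[str]) -> list[str]:
    """Flag every component in the namespace the CPE maps to."""
    c = parse_cpe(cpe)
    group = CPE_TO_MAVEN_GROUP.get((c["vendor"], c["product"]))
    return [comp for comp in components if comp.split(":")[0] == group]


def curated_matches(components: list[str], affected: set[str]) -> list[str]:
    """Flag only components the curated data names explicitly."""
    return [comp for comp in components if comp in affected]


cpe = "cpe:2.3:a:apache:tomcat:9.0.0:*:*:*:*:*:*:*"
naive = naive_matches(cpe, app_components)
curated = curated_matches(app_components, CURATED_AFFECTED_ARTIFACTS)
print(naive)    # both artifacts, so one is a false positive
print(curated)  # only org.apache.tomcat:coyote
```

The naive matcher is not wrong about the namespace; it simply lacks the finer-grained knowledge that the CPE does not encode.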
SCA tools therefore need to be smart enough to accurately map security vulnerabilities to the affected components, rather than blindly trusting CVE advisories and flagging harmless components. To minimize friction for developers while keeping security assessment and compliance teams at ease, SCA solutions must minimize false positives in their results without introducing false negatives (i.e., missed security risks). This may require human intervention, security research, and signature-based file scanning tools.
Additionally, relying solely on CVE feeds for security information is not sufficient. Vulnerability advisories can appear on product vendor websites, GitHub, and many other places, including private databases. Likewise, proof-of-concept exploits for zero-day or known vulnerabilities can appear on Exploit-DB, hacker forums, and other obscure corners of the internet. Not all SCA tools are created equal: a capable one must be able to extract information from a plethora of sources and make sense of thousands of entries.
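One small piece of "making sense" of many sources is recognizing when different feeds describe the same issue. The sketch below groups advisory records that share any ID or alias (for example, a GitHub advisory that lists a CVE alias). The record format and every ID are hypothetical; it is not how any particular product stores data.

```python
def merge_advisories(*feeds: list[dict]) -> list[dict]:
    """Group advisory records from several feeds that share any ID or alias."""
    groups: list[dict] = []
    for feed in feeds:
        for record in feed:
            ids = {record["id"], *record.get("aliases", [])}
            # A record may link several existing groups together; fold them all in.
            overlapping = [g for g in groups if g["ids"] & ids]
            groups = [g for g in groups if not (g["ids"] & ids)]
            merged = {"ids": set(ids), "sources": {record["source"]}}
            for g in overlapping:
                merged["ids"] |= g["ids"]
                merged["sources"] |= g["sources"]
            groups.append(merged)
    return groups


# Hypothetical feeds: the vendor advisory has no CVE at all.
nvd_feed = [{"id": "CVE-2099-1001", "source": "nvd"}]
github_feed = [{"id": "GHSA-aaaa-bbbb-cccc", "aliases": ["CVE-2099-1001"], "source": "github"}]
vendor_feed = [{"id": "VENDOR-2099-7", "source": "vendor-site"}]

groups = merge_advisories(nvd_feed, github_feed, vendor_feed)
for g in groups:
    print(sorted(g["ids"]), sorted(g["sources"]))
```

The vendor-only advisory illustrates the point above: a tool that consumed only the CVE feed would never see it.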
New supply chain threats: malware, hijacked libraries, dependency confusion
When selecting SCA tools for your organization, another challenge is dealing with novel attacks, not just known security risks and vulnerabilities.
As if staying ahead of zero-days weren’t already hard enough, we are now seeing more typosquatting attacks and dependency confusion malware infiltrating open source registries such as npm, PyPI, and RubyGems, and these attacks keep evolving.
As a senior security researcher, I have analyzed hundreds of malware samples and dependency confusion packages infiltrating the open source ecosystem. October 2021 marked the first time we saw working ransomware code included in a cleverly named typosquat: noblox.js-proxies. The legitimate package is called noblox.js-proxied and is a mirror of the official noblox.js package, a Roblox game API wrapper.
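A common first-pass heuristic for typosquats is to flag package names within a small edit distance of a well-known package. Here is a minimal sketch using Levenshtein distance; the list of known packages and the threshold of 2 are arbitrary assumptions, and a real detector would also weigh download counts, publish dates, and maintainer history to keep false positives down.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def possible_typosquats(candidate: str, known_packages: set[str], max_distance: int = 2) -> list[str]:
    """Return well-known packages whose names are suspiciously close to `candidate`."""
    return sorted(
        p for p in known_packages
        if p != candidate and levenshtein(candidate, p) <= max_distance
    )


# Hypothetical allowlist of packages the organization already trusts.
known = {"noblox.js-proxied", "noblox.js", "express"}
print(possible_typosquats("noblox.js-proxies", known))  # ['noblox.js-proxied']
```

The ransomware typosquat above differs from the legitimate name by a single character, which is exactly the pattern this check looks for.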
In the same month, threat actors also hijacked the hugely popular npm libraries ua-parser-js, coa, and rc themselves to install cryptominers and password stealers. The ua-parser-js library is downloaded over 7 million times per week and is used by Facebook, Microsoft, Amazon, and Google, among other tech companies, demonstrating the potential impact of a hijack like this. Likewise, coa records around 9 million weekly downloads and rc around 14 million.
Rather than a typosquatting or dependency confusion attack, this supply chain incident involved threat actors compromising the npm accounts of the projects’ maintainers. JetBrains disclosed a potential impact on Kotlin/JS developers who had run Karma test cases during the compromise window, as ua-parser-js was one of the dependencies of the Karma test framework.
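Once compromised versions are known, checking whether they reached your build, including transitively as in the Karma case, can be as simple as scanning the lockfile. This sketch assumes an npm `package-lock.json` with `lockfileVersion` 2 or 3 (which lists every installed package under a `"packages"` key). The denylist contains the ua-parser-js versions named in the public advisory for this incident (0.7.29, 0.8.0, and 1.0.0); verify against the advisory before relying on it, and extend it for coa, rc, or other incidents. The sample lockfile is hypothetical.

```python
import json

# ua-parser-js versions named in the public advisory for the October 2021
# compromise; verify against the advisory itself before relying on this list.
COMPROMISED_VERSIONS = {"ua-parser-js": {"0.7.29", "0.8.0", "1.0.0"}}


def find_compromised(lockfile_text: str, denylist: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Scan an npm package-lock.json (lockfileVersion 2 or 3) for denylisted versions."""
    lock = json.loads(lockfile_text)
    hits = []
    for path, meta in lock.get("packages", {}).items():
        if "node_modules/" not in path:
            continue  # the "" key describes the root project, not a dependency
        # Nested paths such as node_modules/a/node_modules/b are transitive deps;
        # the package name is whatever follows the last "node_modules/".
        name = path.rsplit("node_modules/", 1)[1]
        if meta.get("version") in denylist.get(name, set()):
            hits.append((path, name, meta["version"]))
    return hits


# Hypothetical lockfile in which a bad ua-parser-js arrives transitively via karma.
sample_lock = json.dumps({
    "name": "demo-app",
    "lockfileVersion": 2,
    "packages": {
        "": {"name": "demo-app"},
        "node_modules/karma": {"version": "6.3.4"},
        "node_modules/karma/node_modules/ua-parser-js": {"version": "0.7.29"},
        "node_modules/ua-parser-js": {"version": "0.7.28"},
    },
})
hits = find_compromised(sample_lock, COMPROMISED_VERSIONS)
print(hits)
```

A denylist only helps after an incident is public; catching the compromise before distribution, as the next paragraph asks, requires analyzing the package contents themselves.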
All of this raises the question: Are your SCA tools capable of detecting malware injections, malicious typosquats, dependency hijacking, and compromised libraries before they are distributed downstream?
Identifying the thousands of components that make up your application is itself a daunting task for an automated tool, let alone a team of human developers. Next comes the task of sifting through security feeds listing thousands of vulnerabilities that may or may not apply to your application. Finally, the ever-changing threat landscape has further complicated the security and integrity issues of the software supply chain. Integrating a complete, fast, and accurate SCA solution into your software development workflow has become essential, but acquiring one that addresses most if not all of the aforementioned new threats remains a challenge.
Copyright © 2021 IDG Communications, Inc.