ScanCode Toolkit

ScanCode is a tool to scan code and detect licenses, copyrights and more.

Why ScanCode?

Discovering the origin and license for a software component is important, but it is often much harder to accomplish than it should be because:

  • A typical software project may reuse tens or hundreds of third-party software components
  • Software authors do not always provide copyright and license information
  • Copyright and license information that is provided may be hard to find and interpret

ScanCode tries to address this issue by offering:

  • A comprehensive code scanner that can detect origin or license information inside codebase files
  • A simple command-line approach that runs on Windows, Linux, and macOS
  • Your choice of JSON or other output formats (SPDX, HTML, CSV) for integration with other tools
  • ScanCode Workbench for visualization
  • Well-tested, easy to hack, and well-documented code
  • Release of the code and reference data under attribution licenses (Apache 2.0 and CC-BY-4.0)
  • A plugin system for easily adding new functionality to scans
  • Python 3 Unicode capabilities for better support of users working in 100+ languages
  • Extensive documentation

What does ScanCode Toolkit do?

ScanCode finds the origin and license information in your codebase, with a focus on:

  • Copyright and other origin clues (emails, URLs, authors, etc.)
  • License notices and license text, with reference information about detected licenses

Using this data you can:

  • Discover the origin and license of the open source and third-party software components that you use,
  • Create a software component inventory for your codebase, and
  • Use this data to comply with open source license obligations such as attribution and redistribution.
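
As a rough illustration of building an inventory from scan data, the sketch below groups file paths by detected license. The JSON layout and the field names ("files", "path", "licenses", "copyrights") are simplified assumptions made for this example, and build_inventory is a hypothetical helper. The real ScanCode output is richer and may use different keys, so adapt the code to the results you actually get.

```python
import json
from collections import defaultdict

# Hypothetical, simplified scan results. The real ScanCode JSON schema is
# richer and may use different field names; adapt the keys accordingly.
SAMPLE_SCAN = """
{
  "files": [
    {"path": "lib/a.c", "licenses": ["mit"], "copyrights": ["Copyright (c) 2019 Alice"]},
    {"path": "lib/b.c", "licenses": ["mit", "apache-2.0"], "copyrights": []},
    {"path": "README", "licenses": [], "copyrights": []}
  ]
}
"""


def build_inventory(scan):
    """Group file paths by license key; files without a license get a placeholder key."""
    by_license = defaultdict(list)
    for entry in scan.get("files", []):
        # An empty or missing license list falls back to a single placeholder.
        for key in entry.get("licenses") or ["(none detected)"]:
            by_license[key].append(entry["path"])
    return dict(by_license)


inventory = build_inventory(json.loads(SAMPLE_SCAN))
for license_key in sorted(inventory):
    print(license_key, "->", ", ".join(inventory[license_key]))
```

Grouping by license rather than by file makes it easier to see which obligations (for example, attribution) apply to which parts of the codebase.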

How does it work?

Given a codebase in a directory, ScanCode will:

  • Collect an inventory of the code files and classify the code using file types
  • Extract files from any archive using a general purpose extractor
  • Extract texts from binary files if needed
  • Use an extensible rules engine to detect open source license text and notices
  • Use a specialized parser to capture copyright statements
  • Identify packaged code and collect metadata from packages
  • Report the results in the formats of your choice (JSON, SPDX, etc.) for integration with other tools
  • Let you browse the results using the ScanCode Workbench companion app to assist your analysis
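
To make two of these steps more concrete, here is a toy sketch of collecting a file inventory with guessed file types and of spotting copyright-like lines. It is not ScanCode's implementation: ScanCode uses its own file type detection and a specialized copyright parser, whereas this example assumes the standard library's mimetypes module and a deliberately naive regular expression. The names collect_inventory, find_copyrights, and COPYRIGHT_RE are hypothetical.

```python
import mimetypes
import os
import re
import tempfile

# Naive pattern: "Copyright", an optional "(c)", then a four-digit year.
# This is an illustrative stand-in, not ScanCode's copyright parser.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)\s*)?\d{4}.*", re.IGNORECASE)


def collect_inventory(root):
    """Walk `root` and return a sorted list of (relative_path, guessed_type)."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root)
            # guess_type relies on the file extension only, not the content.
            guessed_type, _encoding = mimetypes.guess_type(name)
            inventory.append((rel_path, guessed_type or "unknown"))
    return sorted(inventory)


def find_copyrights(text):
    """Return copyright-like statements found in `text`, one per matching line."""
    found = []
    for line in text.splitlines():
        match = COPYRIGHT_RE.search(line)
        if match:
            found.append(match.group(0).strip())
    return found


if __name__ == "__main__":
    # Build a tiny throwaway codebase and run both steps on it.
    with tempfile.TemporaryDirectory() as root:
        notice_path = os.path.join(root, "NOTICE.txt")
        with open(notice_path, "w", encoding="utf-8") as handle:
            handle.write("# Copyright (c) 2020 Example Corp.\n")
        for rel_path, file_type in collect_inventory(root):
            print(rel_path, file_type)
        with open(notice_path, encoding="utf-8") as handle:
            print(find_copyrights(handle.read()))
```

A regular expression like this misses many real-world notice styles and can produce false positives, which is one reason a dedicated parser and a rules engine are used for the actual detection.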

ScanCode should enable you to identify the “easy” cases on your own, but a software development team will probably need to build internal expertise or use outside experts (like nexB) in many cases.

ScanCode is written in Python and also uses other open source packages.