GitHub


Overview:

  • GitHub hosts a large collection of code repositories from which analysts can glean intelligence on custom source code, exploits, and vulnerabilities specific to the payment industry.
  • The data accumulates as users submit files and code for version control, storage, and retrieval.
  • Millions of public GitHub repositories are available for searching and cloning.
  • GitHub is a vital source of threat intelligence on custom software exploits and vulnerabilities affecting the payment industry. It is also useful for broader OSINT collection.

Collection Strategy:

  • The GitHub data was collected using a custom API call and a Python script.
  • Postman was used to build a client API call and save the response in JSON format:
    GitHub collection through Postman
  • This approach was used to test the API call and confirm that the returned data was relevant before building an automated collection method.
  • A custom Python script was then written to automate the API call and convert the JSON response into a CSV file for analysis:
    GitHub Python Script
  • The script allows the API to be called programmatically each day to collect the data set and store it in an AWS S3 bucket.
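
The collection workflow described above can be sketched with Python's standard library. This is a minimal illustration, not the original script: it assumes the public GitHub repository search endpoint, and the query, the selected CSV columns (CSV_FIELDS), and the S3 key layout (daily_s3_key) are hypothetical choices.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from datetime import date

# GitHub's public REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"

# Hypothetical choice of fields to keep from each search result item.
CSV_FIELDS = ["full_name", "html_url", "description", "language",
              "stargazers_count", "created_at", "updated_at"]


def build_search_request(query, page=1, per_page=100, token=None):
    """Build (but do not send) a repository search request.

    The search API returns at most 1,000 results per query, so with
    per_page=100 only pages 1-10 return data.
    """
    params = urllib.parse.urlencode({
        "q": query, "sort": "updated", "order": "desc",
        "per_page": per_page, "page": page,
    })
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # authenticated requests get a higher search rate limit
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{SEARCH_URL}?{params}", headers=headers)


def fetch_json(request):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def items_to_csv(response):
    """Flatten the "items" array of a search response into CSV text.

    Unlisted keys (e.g. the nested "owner" object) are dropped; missing
    keys and None values are written as empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(response.get("items", []))
    return buf.getvalue()


def daily_s3_key(prefix="github", day=None):
    """Hypothetical date-partitioned object key, one per daily run."""
    day = day or date.today()
    return f"{prefix}/{day.isoformat()}/repositories.csv"
```

A daily job might chain these as `items_to_csv(fetch_json(build_search_request("payment skimmer")))` (the query is illustrative) and upload the resulting text with a boto3 S3 client's `put_object(Bucket=..., Key=daily_s3_key(), Body=...)` call. The upload is left out so the sketch runs with the standard library alone.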

Summary Statistics:

  • Records Collected: Over 1,000 records per data set
  • Coverage Dates: February 2013 - April 2022
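
Statistics like these can be computed from a collected CSV file. A minimal sketch, assuming each row carries a GitHub-style ISO 8601 timestamp in a `created_at` column and that coverage is measured on that field; the source does not state which field defines the coverage dates, and `summarize` and `format_coverage` are hypothetical helpers.

```python
import csv
import io
from datetime import datetime


def summarize(csv_text, date_field="created_at"):
    """Return (record count, earliest date, latest date) for a collected CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # GitHub timestamps look like "2013-02-14T09:30:00Z"; fromisoformat only
    # accepts a trailing "Z" from Python 3.11, so replace it with "+00:00".
    stamps = [datetime.fromisoformat(r[date_field].replace("Z", "+00:00"))
              for r in rows if r.get(date_field)]
    if not stamps:
        return len(rows), None, None
    return len(rows), min(stamps).date(), max(stamps).date()


def format_coverage(first, last):
    """Render a coverage range in the "Month YYYY - Month YYYY" style."""
    return f"{first:%B %Y} - {last:%B %Y}"
```

Rows with an empty timestamp still count as records but do not affect the date range.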

Sample Data:

  • Link