GitHub
Overview
- GitHub hosts code repositories from which analysts can glean intelligence on custom source code, exploits, and vulnerabilities specific to the payment industry.
- The data accumulates as users submit files and code for version control, storage, and retrieval.
- There are hundreds of thousands of GitHub repositories available for searching and cloning.
- GitHub is a vital source of threat intelligence on custom software exploits and vulnerabilities related to the payment industry. It is also highly useful for broader OSINT collection.
Collection Strategy:
- The GitHub data was collected using a custom API call and a Python script.
- The application Postman was used to issue a client API call and save the response in JSON format:
- This approach was taken to test the API call and determine if the data received was relevant before creating an automated method of data collection.
- A custom Python script was created to automate the API call and convert the JSON response into a CSV file for analysis:
- The script was created so that the API could be called programmatically each day to collect the data set and store it in an AWS S3 bucket.
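The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the source does not name the endpoint, query, or columns, so the public GitHub repository search endpoint, the FIELDS column list, and the function names fetch_repositories, response_to_csv, and upload_to_s3 are all assumptions. The S3 step assumes the third-party boto3 package and configured AWS credentials.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumption: the public repository search endpoint. The source does not
# state which GitHub API endpoint was called.
SEARCH_URL = "https://api.github.com/search/repositories"

# Hypothetical column selection; the original CSV layout is not documented.
FIELDS = [
    "full_name",
    "html_url",
    "description",
    "language",
    "created_at",
    "updated_at",
    "stargazers_count",
]


def fetch_repositories(query, token=None, per_page=100):
    """Call the search endpoint once and return the parsed JSON response."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "osint-collector-sketch",  # hypothetical client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(f"{SEARCH_URL}?{params}", headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def response_to_csv(payload):
    """Flatten the 'items' list of a search response into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in payload.get("items", []):
        # Missing keys become empty cells; null values are also written empty.
        writer.writerow({field: item.get(field, "") for field in FIELDS})
    return buffer.getvalue()


def upload_to_s3(csv_text, bucket, key):
    """Store the CSV in S3. Requires boto3 and valid AWS credentials."""
    import boto3  # third-party; imported lazily so the rest runs without it

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=csv_text.encode("utf-8")
    )
```

A daily run could chain the three functions, for example upload_to_s3(response_to_csv(fetch_repositories("payment terminal exploit")), "example-bucket", "github/2022-04-01.csv"), triggered by a scheduler such as cron. The query string, bucket name, and key shown here are placeholders only.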
Summary Statistics:
- Records Collected: Over 1,000 records per data set
- Coverage Dates: February 2013 - April 2022