Summaries:


Shodan

  • Collection Strategy:
    • Accessed Shodan exploit database via API
    • Python script created to parse the data and store it as a CSV file
    • Script scheduled to run daily and upload the data set to an AWS S3 bucket
  • Amount Collected:
    • Over 1,500 records collected per data set
  • Coverage Dates:
    • January 2012 to present
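
The Shodan pipeline above (query the API, parse to CSV, post to S3 daily) can be sketched as follows. This is a minimal illustration, not the project's script: the column subset in FIELDS, the key layout in daily_key, and the bucket name are hypothetical, and the fetch and upload helpers assume the third-party `shodan` and `boto3` packages plus valid credentials.

```python
import csv
import io
from datetime import date

# Hypothetical subset of Shodan exploit fields; the project's actual columns are not stated.
FIELDS = ["_id", "source", "cve", "description"]


def fetch_exploits(api_key, query):
    """Query the Shodan Exploits API via the `shodan` package (network call; not run in tests)."""
    import shodan  # third-party: pip install shodan

    return shodan.Shodan(api_key).exploits.search(query)["matches"]


def matches_to_csv(matches, fields=FIELDS):
    """Serialize exploit records to CSV text, joining list values (e.g. CVE IDs) with ';'."""
    buf = io.StringIO()
    # extrasaction="ignore" drops fields not in FIELDS; restval fills missing ones with "".
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    for match in matches:
        row = {k: ";".join(map(str, v)) if isinstance(v, list) else v for k, v in match.items()}
        writer.writerow(row)
    return buf.getvalue()


def daily_key(prefix, day):
    """Hypothetical date-stamped S3 object key, e.g. 'shodan/2022-04-01.csv'."""
    return f"{prefix}/{day.isoformat()}.csv"


def upload_csv(csv_text, bucket, key):
    """Upload CSV text to S3 with boto3 (requires AWS credentials; not run in tests)."""
    import boto3  # third-party: pip install boto3

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
```

A daily run would then call something like `upload_csv(matches_to_csv(fetch_exploits(key, query)), bucket, daily_key("shodan", date.today()))`. How the project actually schedules the job is not stated; a cron entry or an Amazon EventBridge schedule are common options.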

National Vulnerability Database

  • Collection Strategy:
    • Accessed NVD data via API
    • Python script created to parse the data and store it as a CSV file
    • Script scheduled to run daily and upload the data set to an AWS S3 bucket
  • Amount Collected:
    • Approximately 183,300 records collected
  • Coverage Dates:
    • October 1988 to March 2022
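
Pulling roughly 183,300 NVD records requires paging through the API. The sketch below assumes the NVD CVE API 2.0 (`https://services.nvd.nist.gov/rest/json/cves/2.0`), which pages with `startIndex` and `resultsPerPage` (at most 2,000 per request) and reports `totalResults`. A 2022 collection may instead have used the older 1.0 API, which NVD has since retired, so the field names here are an assumption. The three-column flattening in `flatten` is likewise illustrative.

```python
import json
import time
import urllib.parse
import urllib.request

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"
PAGE_SIZE = 2000  # maximum resultsPerPage accepted by the 2.0 API


def page_offsets(total, page_size=PAGE_SIZE):
    """startIndex values needed to cover `total` records."""
    return list(range(0, total, page_size))


def flatten(vuln):
    """Reduce one 2.0-schema vulnerability entry to a flat CSV-ready dict (English description only)."""
    cve = vuln["cve"]
    desc = next((d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"), "")
    return {"id": cve["id"], "published": cve.get("published", ""), "description": desc}


def fetch_page(start, page_size=PAGE_SIZE):
    """Fetch one page of results (network call; not run in tests)."""
    query = urllib.parse.urlencode({"startIndex": start, "resultsPerPage": page_size})
    with urllib.request.urlopen(f"{NVD_URL}?{query}", timeout=60) as resp:
        return json.load(resp)


def fetch_all(pause=6.0):
    """Walk every page, sleeping between requests as NVD recommends for keyless clients."""
    first = fetch_page(0)
    rows = [flatten(v) for v in first["vulnerabilities"]]
    for start in page_offsets(first["totalResults"])[1:]:
        time.sleep(pause)
        rows.extend(flatten(v) for v in fetch_page(start)["vulnerabilities"])
    return rows
```

At 2,000 records per page, about 183,300 records take 92 requests (183,300 / 2,000 = 91.65, rounded up). With a six-second pause between requests, a full pull runs for roughly nine minutes.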

GitHub

  • Collection Strategy:
    • Accessed GitHub data repositories via API
    • Python script used to retrieve and parse the data
    • AWS services used to automate retrieval and storage
  • Amount Collected:
    • Over 1,000 records collected per data set
  • Coverage Dates:
    • February 2013 to April 2022
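
The GitHub REST API paginates list responses and advertises the next page in a `Link` response header. The document does not say which endpoints were queried, so `start_url` below is a placeholder. The sketch also assumes the endpoint returns a JSON array and that an optional personal access token is passed as `token`.

```python
import json
import re
import urllib.request

# Matches entries such as: <https://api.github.com/...?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(header):
    """Map rel names ("next", "last", ...) to URLs from a GitHub Link header."""
    if not header:
        return {}
    return {rel: url for url, rel in _LINK_RE.findall(header)}


def fetch_all_pages(start_url, token=None):
    """Follow rel="next" links until none remain, collecting list items (network; not run in tests)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    items, url = [], start_url
    while url:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=30) as resp:
            items.extend(json.load(resp))
            url = parse_link_header(resp.headers.get("Link")).get("next")
    return items
```

Following the `next` link is more robust than incrementing a page number yourself, because some GitHub endpoints use cursor-based links instead of numbered pages.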

PhishTank

  • Collection Strategy:
    • Accessed PhishTank data through a provided URL
    • Python script used to retrieve the data
    • AWS services used to automate retrieval and storage
  • Amount Collected:
    • Approximately 7,000 records collected
  • Coverage Dates:
    • February 2011 to April 2022
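
Summary figures such as the record count and coverage dates can be derived from the downloaded file itself. The sketch below assumes the PhishTank CSV dump has a header row with a `submission_time` column holding ISO 8601 timestamps with a UTC offset. Treat this column layout as an assumption to check against the actual download.

```python
import csv
import io
from datetime import datetime


def coverage(csv_text):
    """Return (record count, earliest, latest submission time) for a PhishTank-style CSV dump."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [
        # Replace a trailing 'Z' so fromisoformat also works on Python versions before 3.11.
        datetime.fromisoformat(r["submission_time"].replace("Z", "+00:00"))
        for r in rows
        if r.get("submission_time")
    ]
    if not times:
        return len(rows), None, None
    return len(rows), min(times), max(times)
```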

Dashboard:


This table is a live dashboard showing the total number of records currently available from each data source. Records are retrieved from each source programmatically and added to the document storage on AWS; summary statistics are then written to a JSON file.
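
The summary step described above can be sketched as follows: count the rows in each stored CSV, then write one JSON document for the dashboard to read. The JSON keys (`generated_at`, `sources`, `total_records`) are hypothetical, since the project's actual schema is not given.

```python
import csv
import io
import json
from datetime import datetime, timezone


def count_csv_rows(csv_text):
    """Count data rows (header excluded); handles quoted fields that contain newlines."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def build_summary(counts, generated_at=None):
    """Assemble a summary document from a {source_name: record_count} mapping."""
    generated_at = generated_at or datetime.now(timezone.utc)
    return {
        "generated_at": generated_at.isoformat(),
        "sources": [{"name": name, "records": n} for name, n in sorted(counts.items())],
        "total_records": sum(counts.values()),
    }


def write_summary(path, counts):
    """Write the summary as pretty-printed JSON for the dashboard to read."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_summary(counts), f, indent=2)
```

Using `csv.DictReader` rather than counting newlines keeps the totals correct when a description field spans several lines.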

At the time of submission, only two data sources have been completed; the others will be added.