PhishTank


Overview

  • PhishTank is a free, community-driven site (operated by Cisco) where individuals can submit, verify, and share suspected phishing URLs.
  • Because phishing is a top cyberthreat to the financial industry, this data provides significant value and insight for identifying malicious URLs.
  • Monitoring reports and looking for URLs that resemble a company’s own can help the company alert consumers to potential phishing attempts.
  • An influx of phishing reports for malicious URLs mimicking a company’s site can indicate that the business, or its industry as a whole, is a likely target of an attack.
  • PhishTank data is publicly available as a downloadable CSV file that is refreshed on an ongoing basis.
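
As a rough illustration of looking for similar URLs, the sketch below compares the hostname of each reported URL against a company's legitimate domain. It assumes the CSV has a `url` column; the function name `find_lookalikes`, the similarity threshold, and the use of `difflib` are illustrative choices, not part of the original analysis.

```python
import csv
import difflib
import io
from urllib.parse import urlparse


def find_lookalikes(csv_text, legit_domain, threshold=0.8):
    """Return (url, score) pairs whose hostname closely resembles legit_domain.

    Assumes the CSV text has a header row containing a `url` column.
    The threshold of 0.8 is an arbitrary starting point, not a tuned value.
    """
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = (urlparse(row["url"]).hostname or "").lower()
        if host == legit_domain:
            # Skip exact matches on the legitimate domain itself.
            continue
        score = difflib.SequenceMatcher(None, host, legit_domain).ratio()
        if score >= threshold:
            matches.append((row["url"], round(score, 2)))
    return matches
```

Character-level similarity is a simple heuristic; it will miss lookalikes that embed the brand name in a long subdomain or path, so a real monitor would likely combine several checks.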

Collection Strategy:

  • The PhishTank data was collected using a PhishTank-provided download endpoint and a combination of AWS services.
  • PhishTank’s data is updated every hour, so an automated retrieval and storage process has been set up to keep the analyses close to real time.
  • A Lambda function has been set up with a Python script that retrieves PhishTank’s CSV and uploads it to an S3 bucket.
    Figure: Lambda function running the Python script that retrieves the PhishTank CSV
  • A rule has been created in Amazon EventBridge to run the script automatically every 30 minutes. Because PhishTank rate-limits downloads, a failed retrieval is not retried immediately; the next attempt happens on the following scheduled run, 30 minutes later.
    Figure: EventBridge rule that triggers the PhishTank retrieval
  • The S3 bucket receives a new version of the file every hour; previous versions are retained as backups.
    Figure: S3 bucket containing PhishTank data
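
The retrieval step described above could look roughly like the following. This is a minimal sketch, not the script actually deployed: the download URL, the `User-Agent` string, the bucket name, and the object key are all assumptions or placeholders, and the `s3_client` and `fetch` parameters exist only so the handler can be exercised without network access.

```python
import urllib.request

# Assumed public download URL; confirm against PhishTank's developer information.
PHISHTANK_CSV_URL = "http://data.phishtank.com/data/online-valid.csv"
BUCKET_NAME = "example-phishtank-bucket"      # hypothetical bucket name
OBJECT_KEY = "phishtank/online-valid.csv"     # hypothetical object key


def fetch_csv(url=PHISHTANK_CSV_URL, opener=urllib.request.urlopen):
    """Return the CSV as bytes, or None if the request fails (e.g. rate limited)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "phishtank/example-user"}  # placeholder value
    )
    try:
        with opener(request, timeout=60) as response:
            return response.read()
    except OSError:
        # urllib's URLError and HTTPError, and socket timeouts, all derive from OSError.
        return None


def lambda_handler(event, context, s3_client=None, fetch=fetch_csv):
    """Download the CSV and store it in S3; do nothing if the download failed."""
    body = fetch()
    if body is None:
        # No in-function retry: the next scheduled invocation tries again.
        return {"uploaded": False}
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3_client = boto3.client("s3")
    s3_client.put_object(Bucket=BUCKET_NAME, Key=OBJECT_KEY, Body=body)
    return {"uploaded": True, "bytes": len(body)}
```

Writing to the same object key on every run means that retaining earlier copies as backups depends on the bucket keeping prior versions of the object. Returning early on failure, rather than retrying inside the function, leaves the retry to the 30-minute schedule.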

Summary Statistics:

  • Records Collected: ~7,000
  • Coverage Dates: February 2011 to April 2022
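
Figures like these can be recomputed from a downloaded copy of the CSV. The sketch below assumes a `submission_time` column holding ISO 8601 timestamps; the column name, the timestamp format, and the helper name `summarize` are assumptions rather than details confirmed by the source.

```python
import csv
import io
from datetime import datetime


def summarize(csv_text):
    """Return (record_count, earliest, latest) based on the submission_time column."""
    times = [
        datetime.fromisoformat(row["submission_time"])
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    if not times:
        return 0, None, None
    return len(times), min(times), max(times)
```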

Sample Data:

  • Link