PhishTank
Overview
- PhishTank is a free, community-driven site (operated by Cisco) where individuals can submit, verify, and track phishing URLs.
- Phishing is a top cyberthreat to the financial industry, so this data offers significant value and insight for identifying malicious URLs.
- Monitoring reports for URLs that resemble a company’s own can help that company alert consumers to potential phishing attempts.
- An influx of phishing reports for URLs mimicking a company’s site can indicate that the business, or its industry as a whole, is a likely target of attack.
- PhishTank data is publicly available as a continuously updated CSV file that can be downloaded from a link on the site.
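The idea of watching reports for URLs that resemble a company's own site could be sketched roughly as follows. This is only an illustration, not the method used in this project: the function names, the 0.8 threshold, and the `example-bank.test` domain are all hypothetical, and plain string similarity from Python's standard `difflib` stands in for whatever matching logic a real monitor would use.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def lookalike_score(url, brand_domain):
    """Similarity (0.0 to 1.0) between a URL's hostname and a brand domain."""
    host = (urlsplit(url).hostname or "").lower()
    return SequenceMatcher(None, host, brand_domain.lower()).ratio()


def flag_lookalikes(urls, brand_domain, threshold=0.8):
    """Return the URLs whose hostname closely resembles brand_domain.

    The threshold is a hypothetical starting point and would need tuning.
    """
    return [u for u in urls if lookalike_score(u, brand_domain) >= threshold]


# Made-up URLs for illustration only.
reported = ["http://examp1e-bank.test/login", "http://unrelated.test/"]
suspicious = flag_lookalikes(reported, "example-bank.test")
```

A real monitor would probably also need to handle brand names embedded in subdomains or paths, which hostname similarity alone does not catch.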
Collection Strategy:
- The PhishTank data was collected using an endpoint provided by PhishTank and a combination of AWS services.
- PhishTank’s data is updated every hour, so an automated retrieval and storage process has been set up to keep the analyses close to real time.
- A Lambda function has been set up with a Python script that retrieves PhishTank’s CSV and uploads it to an S3 bucket.
- A rule has been created in Amazon EventBridge to run the script every 30 minutes. PhishTank applies rate limiting, so if a retrieval fails, the next scheduled run 30 minutes later serves as the retry.
- The S3 bucket stores the new version every hour (previous versions are kept as backups).
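The retrieval step described above might look roughly like the following Lambda handler. It is a minimal sketch under stated assumptions, not the script actually deployed: the feed URL, the bucket name, the `User-Agent` value, and the hourly key layout are placeholders, and the `fetch` and `put_object` parameters exist only so the handler can be exercised without network or AWS access.

```python
import os
import urllib.request
from datetime import datetime, timezone

# Assumptions: both values are placeholders supplied via environment variables.
FEED_URL = os.environ.get(
    "PHISHTANK_FEED_URL", "http://data.phishtank.com/data/online-valid.csv"
)
BUCKET = os.environ.get("PHISHTANK_BUCKET", "example-phishtank-bucket")


def build_object_key(now):
    """Return an hourly S3 key so each hour's file is kept separately."""
    return now.strftime("phishtank/%Y/%m/%d/online-valid-%H00.csv")


def fetch_csv(url, timeout=60):
    """Download the CSV; return None on any HTTP or network error."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "phishtank/example-user"}  # placeholder
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError (e.g. a rate-limit response), and timeouts
        # are all OSError subclasses.
        return None


def lambda_handler(event, context, fetch=fetch_csv, put_object=None):
    """Fetch the feed and store it in S3; skip quietly if the fetch fails."""
    body = fetch(FEED_URL)
    if body is None:
        # Nothing is stored; the next scheduled run acts as the retry.
        return {"stored": False}
    if put_object is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        put_object = boto3.client("s3").put_object
    key = build_object_key(datetime.now(timezone.utc))
    put_object(Bucket=BUCKET, Key=key, Body=body)
    return {"stored": True, "key": key}
```

With this layout, two successful runs in the same hour write to the same key, so the bucket ends up with one file per hour. Enabling S3 versioning on a single fixed key would be another way to keep earlier copies as backups. The 30-minute schedule corresponds to an EventBridge schedule expression of `rate(30 minutes)`.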
Summary Statistics:
- Records Collected: ~7,000
- Coverage Dates: February 2011 to April 2022
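Figures like these could be recomputed from a downloaded file along the following lines. This is a sketch that assumes the CSV has a `submission_time` column holding ISO 8601 timestamps. The column name is an assumption about the feed's layout, and the two sample rows are made up purely for illustration.

```python
import csv
import io
from datetime import datetime


def summarize(csv_text, time_column="submission_time"):
    """Return (record_count, earliest, latest) for a PhishTank-style CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    times = [datetime.fromisoformat(row[time_column]) for row in reader]
    if not times:
        return 0, None, None
    return len(times), min(times), max(times)


# Made-up rows for illustration; real files contain more columns.
sample = (
    "phish_id,url,submission_time\n"
    "1,http://example.test/a,2011-02-03T10:00:00+00:00\n"
    "2,http://example.test/b,2022-04-05T12:30:00+00:00\n"
)
count, first, last = summarize(sample)
```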