URL Classification


Preliminary Visualizations


Approach

  • From the findings in Slides 69 - 76, it was observed that adding features such as the number of symbols and the number of digits can help classify malicious URLs.
  • To visualize this further, a scatterplot was created with the following encoding:
    • X Axis = Number of Symbols
    • Y Axis = Number of Digits
    • Red Dot (x) = Not Malicious
    • Blue Dot (x) = Is Malicious
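The two plotted features can be computed per URL as in the minimal sketch below. It assumes that a "symbol" is any ASCII punctuation character (Python's `string.punctuation`) and a "digit" is an ASCII character 0-9; the slides do not define either term, and the helper names `count_symbols`, `count_digits`, and `scatter_points` are hypothetical.

```python
import string

# Assumption: a "symbol" is any ASCII punctuation character; the slides do not define it.
SYMBOLS = set(string.punctuation)
DIGITS = set(string.digits)  # ASCII 0-9 only


def count_symbols(url: str) -> int:
    """Count punctuation characters (e.g. ':', '/', '.', '?', '=') in a URL."""
    return sum(1 for ch in url if ch in SYMBOLS)


def count_digits(url: str) -> int:
    """Count ASCII decimal digits in a URL."""
    return sum(1 for ch in url if ch in DIGITS)


def scatter_points(urls, labels):
    """Build (x, y, color) tuples for the scatterplot.

    x = number of symbols, y = number of digits.
    Colors follow the slide legend: red = not malicious, blue = malicious.
    """
    points = []
    for url, is_malicious in zip(urls, labels):
        color = "blue" if is_malicious else "red"
        points.append((count_symbols(url), count_digits(url), color))
    return points


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from the project dataset.
    urls = ["http://example.com", "http://192.168.0.1/login.php?id=12345"]
    labels = [False, True]
    for point in scatter_points(urls, labels):
        print(point)
```

The resulting x, y, and color lists can be passed to Matplotlib's `plt.scatter(x, y, c=colors, marker="x")` to draw a chart with the same encoding.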

Initial Observations

  • From the initial visualization, malicious URLs tend to contain more digits and/or symbols (within this dataset).
  • All non-malicious URLs fall near the bottom-left corner, indicating that they contain few digits and symbols.
  • Plotting the dataset also revealed some potential discrepancies in the data, which will be discussed further in Milestone 4.
Scatterplot of URL Classification Results
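The visual observations above can also be checked numerically. The sketch below computes the mean symbol and digit counts per class and tests whether every non-malicious URL lies inside a bottom-left region. The corner thresholds (`max_symbols`, `max_digits`) and the function names are hypothetical choices for illustration, not values taken from the analysis.

```python
from statistics import mean


def class_means(rows):
    """Mean (symbols, digits) per class.

    rows: iterable of (symbols, digits, is_malicious) tuples.
    Returns {is_malicious: (mean_symbols, mean_digits)} for each class present.
    """
    groups = {True: [], False: []}
    for symbols, digits, is_malicious in rows:
        groups[bool(is_malicious)].append((symbols, digits))
    return {
        label: (mean(s for s, _ in pts), mean(d for _, d in pts))
        for label, pts in groups.items()
        if pts
    }


def benign_in_corner(rows, max_symbols, max_digits):
    """True if every non-malicious row lies within the (hypothetical) bottom-left box."""
    return all(
        symbols <= max_symbols and digits <= max_digits
        for symbols, digits, is_malicious in rows
        if not is_malicious
    )


if __name__ == "__main__":
    # Hypothetical feature rows, not the project dataset.
    rows = [(4, 0, False), (6, 2, False), (10, 13, True), (14, 9, True)]
    print(class_means(rows))
    print(benign_in_corner(rows, max_symbols=6, max_digits=2))
```

A fixed box is only a rough proxy for "near the bottom-left corner"; the thresholds would need to be chosen from the actual data.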