Text Mining

Task: Gather data on the most frequently described vulnerabilities in Payment assets (e.g. by word count).
Value: Can identify the most vulnerable assets based on the most critical threats.

Companies can use this intelligence to flag URLs as risky.
While it is ultimately up to the user to decide whether to click a link, being able to provide a warning is extremely valuable.

Potential Value: Identifying specific features from the URLs (e.g. number of symbols, digits, etc.) to see whether they improve the accuracy of predictions.
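
As a rough illustration of the kind of URL features mentioned above, the sketch below counts digits and symbols in a URL string. The function name `url_features`, the chosen feature set, and the example URL are assumptions made for illustration, not the features actually used in this analysis.

```python
import string


def url_features(url: str) -> dict:
    """Count simple lexical features of a URL (hypothetical feature set)."""
    return {
        "length": len(url),
        "digits": sum(ch.isdigit() for ch in url),
        # "symbols" here means ASCII punctuation such as : / . ? =
        "symbols": sum(ch in string.punctuation for ch in url),
        "dots": url.count("."),
    }


# Made-up example URL, used only to show the shape of the output.
features = url_features("http://example.com/login?id=123")
print(features)
```

Counts like these could then be fed to a classifier as extra columns to test whether they help prediction accuracy.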

Step 1 : Gathered NVD data source CSV file from AWS data repository
Step 2 : Copied text from description field into text file for all 2022 records (Approx. 3,500)
Step 3 : Preprocessed the data by removing stop words and stemming:

Removed irrelevant words (“the”, “a”, “of”, etc.)
Reduced words to their root word (“payment”, “payments” → “pay”; “banks” → “bank”)
Step 4 : Used “databasic.io” to find the most commonly used words to describe the vulnerabilities and create visualizations
Next Steps : Find vulnerabilities most relevant to the identified threats and determine what systems, OS, and platforms they specifically refer to. This is how we will reach our stated goal.
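
Steps 2 through 4 can be sketched in a few lines of Python. This is a minimal illustration, not the workflow actually used (which relied on a text file and databasic.io). The column names `published` and `description`, the sample rows, the short stop-word list, and the crude `stem` helper are all assumptions; a real run would read the NVD CSV and would likely use a proper stemmer such as Porter.

```python
import csv
import io
import re
from collections import Counter

# Made-up rows standing in for the NVD CSV; column names are assumptions.
SAMPLE_CSV = """published,description
2022-03-01,The payment gateway allows injection of payments data.
2022-07-15,A flaw in the banks portal exposes payment tokens.
2021-11-30,An older issue in the payment module.
"""

# Deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "in"}


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ments", "ment", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_counts(csv_text: str, year: str) -> Counter:
    """Count stemmed, non-stop words in descriptions published in `year`."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["published"].startswith(year):
            continue  # Step 2: keep only records from the requested year
        tokens = re.findall(r"[a-z]+", row["description"].lower())
        # Step 3: drop stop words, then reduce each word to its root
        counts.update(stem(t) for t in tokens if t not in STOP_WORDS)
    return counts


counts = word_counts(SAMPLE_CSV, "2022")
# Step 4: most frequent terms across the 2022 descriptions
print(counts.most_common(5))
```

With the sample rows, "payment" and "payments" collapse into the single term "pay", which is the effect the stemming step is meant to achieve before counting.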