Hence I wrote Tweet2Wordlist.py.
The script is simple. It takes a depth argument that sets how many of the latest tweets to fetch from Twitter, plus minimum and maximum word lengths to filter on. Finally, you can filter by geolocation (coordinates), by tweet language, or both.
Keep the following in mind:
- Twitter may throttle/block you if you try to pull too much information.
- The geolocation coordinates are given in the format latitude,longitude,radius. The radius is how far outward, in a circle around your coordinates, you want to search for tweets. This only works for users who have tweeted with location sharing enabled. Format example: 37.781157,-122.398720,1mi will search those coordinates plus 1 mile out. You can use km to indicate kilometers.
- The depth does not indicate a word count! It indicates how many tweets to request from Twitter. Those tweets may contain many or few words that qualify based on your criteria.
- Twitter returns some characters that the script does not support. In that case the script simply ignores them.
- The language filter applies to the Tweeter's language setting, not to what they actually typed into their tweet. Hence you may get words that are not in the language you are filtering on.
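To make the geolocation format concrete, here is a minimal sketch of how a latitude,longitude,radius string could be split and checked before it is sent off. The function name parse_geo and the range checks are my own assumptions for illustration and are not taken from tweet2wordlist.py. The sketch also assumes the mi or km suffix is always present.

```python
import re

# Hypothetical helper for illustration; not part of tweet2wordlist.py.
# Assumes the radius always carries a "mi" or "km" suffix.
_GEO_RE = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)(mi|km)\s*$"
)


def parse_geo(value):
    """Split 'lat,long,radius[mi|km]' into (lat, lon, radius, unit)."""
    match = _GEO_RE.match(value)
    if match is None:
        raise ValueError("expected lat,long,radius followed by mi or km: %r" % value)
    lat, lon, radius = (float(match.group(i)) for i in (1, 2, 3))
    unit = match.group(4)
    # Basic sanity checks on the coordinate ranges.
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %s" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %s" % lon)
    if radius <= 0:
        raise ValueError("radius must be positive")
    return lat, lon, radius, unit


if __name__ == "__main__":
    print(parse_geo("37.781157,-122.398720,1mi"))
```

Validating the string up front means a typo in the coordinates fails immediately instead of costing you a request against Twitter's rate limit.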
TIP - See the cleanup list in the Python code and modify it to suit your needs if you want more control over what gets filtered out of tweets.
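As a rough illustration of how a cleanup list and the minimum/maximum word lengths work together, here is a small sketch. The names SKIP_PREFIXES and extract_words, the choice of prefixes, and the de-duplication step are all assumptions of mine. The real cleanup list in the script may look quite different.

```python
import string

# Hypothetical prefixes to drop; the script's own cleanup list may differ.
SKIP_PREFIXES = ("http://", "https://", "@", "#")


def extract_words(tweets, min_len, max_len):
    """Return unique words whose length is within [min_len, max_len]."""
    words = []
    seen = set()
    for tweet in tweets:
        for token in tweet.split():
            # Skip links, mentions and hashtags entirely.
            if token.startswith(SKIP_PREFIXES):
                continue
            # Trim leading/trailing punctuation such as "!" or ",".
            word = token.strip(string.punctuation)
            if min_len <= len(word) <= max_len and word not in seen:
                seen.add(word)
                words.append(word)
    return words


if __name__ == "__main__":
    sample = [
        "Check out https://example.com, @someone loves Password123!",
        "Summer in Cape Town #travel",
    ]
    for word in extract_words(sample, 5, 12):
        print(word)
```

Adding or removing entries in the prefix tuple is the kind of tweak the tip above has in mind: it changes what is thrown away before the length filter ever sees a word.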
Here is the help output from the script:
Usage: tweet2wordlist.py [options]
This tool is to simplify the dumping of words from tweets into wordlists.
Additionally, I have added features such as geo-location lookup of tweets as
well as the capability to control depth and word output sizes. Please send any
comments to the email listed in the program. If you find this tool useful,
tweet me at @Bitcrack_Cyber. See the blog post on http://i-am-
rurapenthe.blogspot.com for examples, more info etc.
  -h, --help            show this help message and exit
  -m [1-1000], --max=[1-1000]
                        maximum word length to output
  -n [1-1000], --min=[1-1000]
                        minimum word length to output
  -o filename, --output=filename
                        output words to a file
  -d 1-1000, --depth=1-1000
                        how many tweets to get from twitter.keep it reasonable
                        to avoid throttling
  -g lat,long,radius[mi]/[km], --geo=lat,long,radius[mi]/[km]
                        geographic coordinates filter for tweets and radius
  -l [EN][CN][FR] etc, --lang=[EN][CN][FR] etc
                        filter tweets based on language code in ISO 639-1 code
You can download the script here: tweet2wordlist.py
Comments, feedback, suggestions, etc. are welcome. Please follow and/or comment on Twitter using @Bitcrack_Cyber.
Dimitri AKA Rurapenthe