Thursday, September 5, 2013

Cracking Hashes with Other Language Character Sets

We've all been in this position. You're at your wits' end trying to figure out why you can't crack any more passwords in the list you have.

And then someone mentions "By the way, they're not English characters..."

Now, usually that's enough to scare you away pretty quickly, and rightfully so. Different platforms all have different ways of using, displaying and identifying non-English language characters. But all is not lost: there is a way.

Below is an example of how to run a crack with the Arabic alphabet and characters. This is done on an Ubuntu machine, using Hex codes based on the UTF-8 character set.

First, a primer:
We are using oclHashcat-plus, and Hashcat has a parameter called "--hex-charset". This very important and handy switch tells Hashcat that the custom character sets you specify are written in HEX, not as literal characters.

For example, a normal character set would be -1 ?l?u123, which means Hashcat will brute-force using all lower-case letters, all upper-case letters and only the numbers 1, 2 and 3. From this the word "ILOVEu123" could be derived.
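To make the example concrete, here is a minimal Python sketch of what that custom character set expands to. It only models the idea; the helper name `derivable` is made up for illustration and is not part of Hashcat.

```python
import string

# -1 ?l?u123 : all lower-case letters, all upper-case letters, and the digits 1, 2, 3
charset_1 = string.ascii_lowercase + string.ascii_uppercase + "123"

def derivable(word, charset):
    """Return True if every character of `word` comes from `charset`."""
    return all(ch in charset for ch in word)

print(len(charset_1))                      # 55
print(derivable("ILOVEu123", charset_1))   # True
print(derivable("ILOVEu124", charset_1))   # False, '4' is not in the set
```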

Now, in the --hex-charset case, Hashcat treats all the custom character sets as HEX. Therefore -1 ABBBBC means Hashcat will take AB as one character, BB as one character and BC as one character, and brute-force with the bytes those hex pairs represent.
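As a rough illustration of that splitting (a sketch of the idea in Python, not Hashcat's own code; `split_hex_charset` is a hypothetical helper name):

```python
def split_hex_charset(hex_string):
    """Split a hex string into byte values, two hex digits per byte."""
    if len(hex_string) % 2 != 0:
        raise ValueError("a hex charset needs an even number of hex digits")
    return list(bytes.fromhex(hex_string))

charset = split_hex_charset("ABBBBC")
print([f"{b:02x}" for b in charset])   # ['ab', 'bb', 'bc']
```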

If we look at the Arabic character set in UTF-8 in the encoding table we have:

U+0606    ؆    d8 86    ARABIC-INDIC CUBE ROOT
U+0607    ؇    d8 87    ARABIC-INDIC FOURTH ROOT
U+0608    ؈    d8 88    ARABIC RAY
U+0609    ؉    d8 89    ARABIC-INDIC PER MILLE SIGN
U+060A    ؊    d8 8a    ARABIC-INDIC PER TEN THOUSAND SIGN


Notice that column 3 has the HEX representation of the Arabic character. In this case it's D886 in the first line.

If you examine the list closely, you will notice that there's a base HEX code (char set) and then the actual character HEX code. So D8 is the base, and 86 or 8a is the character. In UTF-8 terms, D8 is the lead byte and 86 or 8a is the continuation byte.
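If you don't have the chart handy, the same rows can be reproduced with a few lines of Python. This is just a sketch for checking the bytes yourself; `utf8_row` is a made-up helper name, and the character names come from whatever Unicode database ships with your Python.

```python
import unicodedata

def utf8_row(code_point):
    """Format one chart row: code point, character, UTF-8 bytes in hex, name."""
    ch = chr(code_point)
    hex_bytes = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{code_point:04X}    {ch}    {hex_bytes}    {name}"

# U+0606 through U+060A, the same five rows as above
for cp in range(0x0606, 0x060B):
    print(utf8_row(cp))
```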

Feeding this into Hashcat is a two-fold process. Remember it's 2 HEX codes to make a character. But we don't know when d8 is used and what might be put with it, so to get around this we make two custom character sets in Hashcat. One is our base HEX, the other is the character HEX.

Therefore, we have:
-1 d8d9dadb
-2  808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf

This sets -1 to any of our base HEX codes, and -2 to any of the HEX codes that can follow a base to form an Arabic character.
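One way to arrive at those two sets without typing them by hand is sketched below. It assumes the target is the Unicode Arabic block, U+0600 to U+06FF (my assumption about where the sets come from), and `utf8_byte_sets` is a hypothetical helper name.

```python
def utf8_byte_sets(first, last):
    """Collect the distinct first and second UTF-8 bytes for a range of
    code points that all encode to exactly two bytes."""
    lead, trail = set(), set()
    for cp in range(first, last + 1):
        encoded = chr(cp).encode("utf-8")
        if len(encoded) != 2:
            raise ValueError(f"U+{cp:04X} is not a two-byte UTF-8 character")
        lead.add(encoded[0])
        trail.add(encoded[1])
    return sorted(lead), sorted(trail)

# Assumed range: the Arabic block
lead, trail = utf8_byte_sets(0x0600, 0x06FF)
charset_1 = bytes(lead).hex()    # base HEX codes, for -1
charset_2 = bytes(trail).hex()   # character HEX codes, for -2

print("-1", charset_1)           # -1 d8d9dadb
print("-2", charset_2)
```

The same function should work for any other block whose characters are all two bytes in UTF-8; blocks with three-byte characters would need a third custom set.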

In our attack, we will tell Hashcat to use a mask that repeats ?1?2 once per Arabic character, for example ?1?2?1?2?1?2?1?2?1?2?1?2 for six characters.

Why? Because we need a base HEX (-1) and a character HEX (-2) to make a single Arabic character. For example, the first candidate for ?1?2 is D880, which is:

U+0600    ؀    d8 80    ARABIC NUMBER SIGN
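A quick way to convince yourself that every ?1?2 pair is a real character is to decode all of them. The sketch below does that in Python, using the two character sets from this post; the variable names are mine.

```python
lead = bytes.fromhex("d8d9dadb")     # -1, base HEX codes
trail = bytes(range(0x80, 0xC0))     # -2, character HEX codes 80..bf

# The first pair, d8 80, decodes to a single code point
first = bytes([lead[0], trail[0]]).decode("utf-8")
print(f"U+{ord(first):04X}")         # U+0600

# Decode every possible pair; a bad pair would raise UnicodeDecodeError
code_points = sorted(
    ord(bytes([a, b]).decode("utf-8")) for a in lead for b in trail
)
print(len(code_points))                                        # 256
print(f"U+{code_points[0]:04X} .. U+{code_points[-1]:04X}")    # U+0600 .. U+06FF
```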

Let's put it all together in a Hashcat attack using oclHashcat-plus.

./oclHashcat-plus64.bin -a 3 --hex-charset -1 d8d9dadb -2 808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf ?1?2?1?2?1?2?1?2?1?2?1?2?1?2?1?2?1?2

This will brute-force passwords of exactly 9 Arabic characters, since the mask has nine ?1?2 pairs; shorten the mask to target shorter passwords. (NOTE: I left out everything else oclHashcat-plus would need to run, fill that in yourself as it's not in scope here)
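If you want to vary the length, the mask and arguments are easy to generate. The sketch below is one way to do it in Python; the function names are hypothetical, only the switches already shown in this post are used, and the hash type and hash file are left out just as above. It also prints the number of candidates, which grows by a factor of 256 (4 base codes times 64 character codes) per Arabic character.

```python
CHARSET_1 = "d8d9dadb"                      # -1, base HEX codes
CHARSET_2 = bytes(range(0x80, 0xC0)).hex()  # -2, character HEX codes 80..bf

def arabic_mask(n_chars):
    """One ?1?2 pair per Arabic character."""
    if n_chars < 1:
        raise ValueError("need at least one character")
    return "?1?2" * n_chars

def mask_attack_args(n_chars):
    """Arguments for the mask attack; hash type and hash file are omitted."""
    return ["-a", "3", "--hex-charset",
            "-1", CHARSET_1, "-2", CHARSET_2,
            arabic_mask(n_chars)]

def keyspace(n_chars):
    """Number of candidates the mask generates."""
    per_char = (len(CHARSET_1) // 2) * (len(CHARSET_2) // 2)
    return per_char ** n_chars

print(" ".join(mask_attack_args(9)))
for n in range(1, 10):
    print(n, keyspace(n))
```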

Sample outputs are:
 تفاحة
 الترانزستور

etc etc.
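To see why words like these fall out of the mask, you can encode one and check each byte pair against the two character sets. This is a small Python sketch with a made-up helper name, `fits_mask`; it checks the bytes only, not the length of the mask.

```python
LEAD = set(bytes.fromhex("d8d9dadb"))   # -1, base HEX codes
TRAIL = set(range(0x80, 0xC0))          # -2, character HEX codes 80..bf

def fits_mask(word):
    """True if every character is two UTF-8 bytes drawn from LEAD then TRAIL."""
    for ch in word:
        encoded = ch.encode("utf-8")
        if len(encoded) != 2:
            return False
        if encoded[0] not in LEAD or encoded[1] not in TRAIL:
            return False
    return True

word = "\u062a\u0641\u0627\u062d\u0629"  # تفاحة, the first sample output
print(word.encode("utf-8").hex())        # d8aad981d8a7d8add8a9
print(fits_mask(word))                   # True
print(fits_mask("apple"))                # False, ASCII letters are single bytes
```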

Using this approach I managed to get a very high hit-rate in an all-Arabic hash list. Customize it as you need it and remember you can apply it to any HEX character set, or even add English UTF-8 HEX codes to brute-force a mix of English and Arabic.
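One thing to keep in mind when mixing the two: English letters and digits are a single byte in UTF-8, so they take one mask position, while each Arabic character takes two. The Python sketch below shows the difference and builds a HEX character set for lower-case English letters; `positions_needed` is a hypothetical helper, and how you arrange the mask around it is up to you.

```python
def positions_needed(text):
    """Number of bytes, and so mask positions, that `text` occupies in UTF-8."""
    return len(text.encode("utf-8"))

# HEX codes for a-z (61..7a), usable as a custom set alongside --hex-charset
ascii_lower_hex = bytes(range(0x61, 0x7B)).hex()

print(ascii_lower_hex)
print(positions_needed("a"))            # 1
print(positions_needed("\u062a"))       # 2, the Arabic letter ت
print(positions_needed("a\u062a1"))     # 4
```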

Special thanks to Atom, and to http://www.utf8-chartable.de for their easy-to-read and use UTF-8 tables.


Dimitri aka RuraPenthe
@Bitcrack_Cyber