ObjectHunter
Hunting and gathering in the JEE landscape

I am Frank, a freelance Java developer from southwestern Germany, specializing in backend development.

E-Mail address obfuscation with Java/JavaScript

Posted by fas on 2010-03-22. Tagged as javascript, web, programming, java

Like German tourists, email-harvesting robots are everywhere. These nifty little HTTP crawlers search through websites to find new email addresses for their spamming masters. To hide an email address on a web page, one therefore has to encode it in a way that robots can no longer recognize as a valid address.

There are lots of possibilities to achieve this:

  • CSS transformations to reverse the address
  • Encode/Decode the address using some algorithm
  • Flash embedding

Since I still wanted the user to be able to copy the mail address via the "copy mail address" entry in the context menu, I chose the following method for my obfuscation system:
  • reverse the whole address
  • translate the address bytes to a hex representation
  • replace the '@' and the '.' character with some arbitrary strings.
  • decode the mail address in the browser by a JavaScript function on a mouseover event


The encoding is done by a Java method:

    // Needs: import java.nio.charset.Charset;
    // byteToHex is a helper that is not shown in this post; it has to return
    // two lowercase hex digits per byte, since the comparisons below rely on that.
    public static String scrambleEmail(String email) {
        StringBuilder encoded = new StringBuilder(email.length() * 6);
        // reverse the input
        email = new StringBuilder(email).reverse().toString();
        byte[] bytes = email.getBytes(Charset.forName("UTF-8"));
        for (int j = 0; j < bytes.length; j++) {
            String hex = byteToHex(bytes[j]);
            if (hex.equals("40")) hex = "7b;53;43;52;41;4d;42;4c;45;7d"; // '@' becomes "{SCRAMBLE}"
            if (hex.equals("2e")) hex = "5b;53;43;52;41;4d;42;4c;45;5d"; // '.' becomes "[SCRAMBLE]"
            encoded.append(hex).append(';');
        }
        return encoded.toString();
    }
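The method above calls a byteToHex helper that the post does not show. The following is a minimal, self-contained sketch of how the pieces could fit together; the class name EmailScrambler and the body of byteToHex are assumptions for illustration, not the original implementation.

```java
import java.nio.charset.StandardCharsets;

final class EmailScrambler {

    // Hypothetical helper (not shown in the post): two lowercase hex digits per byte.
    static String byteToHex(byte b) {
        return String.format("%02x", b & 0xff);
    }

    // Same steps as the method above: reverse, hex-encode, swap '@' and '.'.
    static String scrambleEmail(String email) {
        String reversed = new StringBuilder(email).reverse().toString();
        StringBuilder encoded = new StringBuilder(reversed.length() * 6);
        for (byte b : reversed.getBytes(StandardCharsets.UTF_8)) {
            String hex = byteToHex(b);
            if (hex.equals("40")) hex = "7b;53;43;52;41;4d;42;4c;45;7d"; // "{SCRAMBLE}" for '@'
            if (hex.equals("2e")) hex = "5b;53;43;52;41;4d;42;4c;45;5d"; // "[SCRAMBLE]" for '.'
            encoded.append(hex).append(';');
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        // "a@b.de" is reversed to "ed.b@a" before encoding
        System.out.println(scrambleEmail("a@b.de"));
        // prints 65;64;5b;53;43;52;41;4d;42;4c;45;5d;62;7b;53;43;52;41;4d;42;4c;45;7d;61;
    }
}
```

The mask with 0xff keeps bytes above 0x7f from being formatted as negative numbers, which only matters for non-ASCII input.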


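The user agent decodes the mail address with this JavaScript function:

    // Ids of links that are already decoded. This array and contains() are
    // not shown in the original post; minimal versions are assumed here.
    var descrambled = [];

    function contains(list, item) {
        for (var i = 0; i < list.length; i++) {
            if (list[i] === item) return true;
        }
        return false;
    }

    function descramble(elementId) {
        if (contains(descrambled, elementId)) return; // decode each link only once
        var el = document.getElementById(elementId);
        // el.href is resolved to an absolute URL; keep what follows the last '/'
        var scramble = el.href.toString();
        var del = scramble.lastIndexOf('/') + 1;
        scramble = scramble.substring(del, scramble.length - 1); // drop the trailing ';'
        var result = '';
        var codes = scramble.split(';');
        for (var j = 0; j < codes.length; j++) {
            if (codes[j].length > 1) {
                result += String.fromCharCode(parseInt(codes[j], 16));
            }
        }
        // restore '.' and '@', then undo the reversal
        result = result.replace(/\[SCRAMBLE\]/g, '.').replace(/\{SCRAMBLE\}/g, '@').split('').reverse().join('');
        el.href = 'mailto:' + result;
        el.title = 'write ' + result;
        descrambled.push(elementId);
    }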
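To check on the server side that the encoder and the browser decoder agree, the decoding steps can be mirrored in Java, for instance in a unit test. This is only a sketch: the class ScrambleCheck and its descramble method are hypothetical names and not part of the original system.

```java
final class ScrambleCheck {

    // Mirrors the JavaScript steps: hex codes to characters,
    // placeholders back to '.' and '@', then undo the reversal.
    static String descramble(String scrambled) {
        StringBuilder decoded = new StringBuilder();
        for (String code : scrambled.split(";")) {
            if (code.length() > 1) {
                decoded.append((char) Integer.parseInt(code, 16));
            }
        }
        String withSymbols = decoded.toString()
                .replace("[SCRAMBLE]", ".")
                .replace("{SCRAMBLE}", "@");
        return new StringBuilder(withSymbols).reverse().toString();
    }

    public static void main(String[] args) {
        String scrambled =
                "65;64;5b;53;43;52;41;4d;42;4c;45;5d;62;7b;53;43;52;41;4d;42;4c;45;7d;61;";
        System.out.println(descramble(scrambled)); // prints a@b.de
    }
}
```

Like the JavaScript version, this treats every hex code as one character, so it is only reliable for ASCII addresses.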


I then added the function to the onmouseover event of an a tag, so the mail address gets descrambled as soon as the user moves the mouse over it.
<a id="wm_id1" onmouseover="descramble('wm_id1');" href="65;64;5b;53;43;52;41;4d;42;4c;45;5d;65;63;61;72;67;6e;6f;63;7b;53;43;52;41;4d;42;4c;45;7d;72;65;74;6e;75;68;74;63;65;6a;62;6f;">Write E-Mail</a>
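On the Java side, a link of this shape could be rendered from an already scrambled address roughly as follows. The class MailLinkRenderer and the method renderMailLink are hypothetical names for illustration; the sketch assumes that the id and the label are trusted values that need no HTML escaping.

```java
final class MailLinkRenderer {

    // Builds the anchor tag around an already scrambled address.
    // Assumption: elementId and label contain no characters that need escaping.
    // Inline event handlers need no "javascript:" prefix, so none is emitted.
    static String renderMailLink(String elementId, String scrambled, String label) {
        return "<a id=\"" + elementId + "\""
                + " onmouseover=\"descramble('" + elementId + "');\""
                + " href=\"" + scrambled + "\">"
                + label + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(renderMailLink("wm_id1", "65;64;", "Write E-Mail"));
        // prints <a id="wm_id1" onmouseover="descramble('wm_id1');" href="65;64;">Write E-Mail</a>
    }
}
```

Each link on a page needs its own id, because the decoder looks the element up by id and remembers which ids it has already handled.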


So as soon as the user moves the mouse over the element holding the encoded address, the JavaScript function gets called and replaces the link's href and title with the decoded email address.
Although lots of the newer spam robots actually have a JavaScript engine, most just fire the onload event of the HTML body and ignore the onmouseover events on the rest of the elements.


