A few days ago I ran into an issue with a domain that contains special characters, such as www.schönesdresden.de. Since May 2010, top-level domains in international scripts have also been allowed (e.g. http://JP納豆.例.jp).
My task was to download the HTML using the Apache HttpClient API, so I had to use the HttpGet(URI uri) constructor, and a java.net.URI can only be built from a string that follows the RFC 2396 rules (in short, domain names containing non-ASCII characters are not considered valid there). After a couple of minutes of searching on Google, I came across the following solution:
http://weblogs.java.net/blog/2007/03/29/international-domain-names
Basically, we first have to convert the international domain name into ASCII Compatible Encoding (ACE) before passing it to HttpGet(URI uri). For this purpose, Java SE 6 provides an interesting new class: java.net.IDN
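To make the conversion step concrete, here is a minimal sketch of how the host could be converted before building the URI. The class name IdnUriExample and the helper toAsciiUri are hypothetical names chosen for illustration. The HttpGet call is shown only as a comment, because it needs Apache HttpClient on the classpath.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

class IdnUriExample {

    // Hypothetical helper: converts a Unicode host to its ACE form
    // and builds an ASCII-only URI from scheme, host and path.
    static URI toAsciiUri(String scheme, String unicodeHost, String path) {
        String aceHost = IDN.toASCII(unicodeHost);
        try {
            return new URI(scheme, aceHost, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for host " + unicodeHost, e);
        }
    }

    public static void main(String[] args) {
        // \u00f6 is the letter "ö", written as an escape so the source file stays ASCII
        URI uri = toAsciiUri("http", "www.sch\u00f6nesdresden.de", "/");
        System.out.println(uri);
        // With Apache HttpClient available, the request would then be created as:
        // HttpGet request = new HttpGet(uri);
    }
}
```

Only the host is passed through IDN.toASCII here. A path or query string with non-ASCII characters would need percent-encoding instead, which this sketch does not cover.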
It’s small, simple…very focused on a single task. That task has two parts:
- To convert domain names containing practically any Unicode characters to an ASCII Compatible Encoding (ACE).
- To convert ACE names back into their full Unicode (UTF-16) form.

The toASCII method converts non-ASCII Unicode characters to an ACE form using an algorithm called Punycode.
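The two parts of that task can be tried out with a small round trip through IDN.toASCII and IDN.toUnicode. This is only a sketch. The class name IdnRoundTrip and its helper methods are hypothetical, and the sample host bücher.de is my own choice because its ACE form is easy to recognise.

```java
import java.net.IDN;

class IdnRoundTrip {

    // Unicode domain name -> ASCII Compatible Encoding (Punycode labels with an "xn--" prefix)
    static String toAce(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // ACE domain name -> Unicode form
    static String fromAce(String aceHost) {
        return IDN.toUnicode(aceHost);
    }

    public static void main(String[] args) {
        // \u00fc is the letter "ü", written as an escape so the source file stays ASCII
        String unicodeHost = "b\u00fccher.de";
        String ace = toAce(unicodeHost);
        System.out.println(ace);                               // xn--bcher-kva.de
        System.out.println(fromAce(ace).equals(unicodeHost));  // true

        // Names that are already plain ASCII pass through unchanged
        System.out.println(toAce("www.example.com"));          // www.example.com
    }
}
```

Each dot-separated label is encoded on its own, so only the labels that contain non-ASCII characters get the xn-- prefix.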