lindner (lindner) wrote,
lindner
lindner

On Vox: The Mysteries of Java Character Set Performance

"Two Characters Sets?  Seems like plenty!"


So I've been pushing Java to it's limits lately and finding some real nasty concurrency issues inside the JRE code itself.  Here's one particulary ugly one -- we had 700 threads stuck here:


       java.lang.Thread.State: BLOCKED (on object monitor)                                                                    
         at sun.nio.cs.FastCharsetProvider.charsetForName(FastCharsetProvider.java:118)
         - waiting to lock <0x00002aab4cdf91b8> (a sun.nio.cs.StandardCharsets)
         at java.nio.charset.Charset.lookup2(Charset.java:450) 
         at java.nio.charset.Charset.lookup(Charset.java:438)
         at java.nio.charset.Charset.isSupported(Charset.java:480) 
         at java.lang.StringCoding.lookupCharset(StringCoding.java:85) 
         at java.lang.StringCoding.decode(StringCoding.java:165)                                                                      
         at java.lang.String.<init>(String.java:516) 

Digging deeper we find the lookupCharset is called all over the place.  The app in question is functions as a web proxy, so it's constantly reading and writing data from web pages in a variety of character sets.  The method charsetForName() uses a synchronized data structure to lookup defined character sets.  (Yay serialized access....)

But wait, lookup and lookup2 provide us with a cache so we can avoid the big bad synchronized method..  Sigh, here's the implementation:

     private static Charset lookup(String charsetName) {
         if (charsetName == null)
             throw new IllegalArgumentException("Null charset name");
 
         Object[] a;
         if ((a = cache1) != null && charsetName.equals(a[0]))
             return (Charset)a[1];
         // We expect most programs to use one Charset repeatedly.
         // We convey a hint to this effect to the VM by putting the
         // level 1 cache miss code in a separate method.
         return lookup2(charsetName);
     }
 
     private static Charset lookup2(String charsetName) {
         Object[] a;
         if ((a = cache2) != null && charsetName.equals(a[0])) {
             cache2 = cache1;
             cache1 = a;
             return (Charset)a[1];
         }
 
         Charset cs;
         if ((cs = standardProvider.charsetForName(charsetName)) != null ||
             (cs = lookupExtendedCharset(charsetName))           != null ||
             (cs = lookupViaProviders(charsetName))              != null)
         {
             cache(charsetName, cs);
             return cs;
         }
 
         /* Only need to check the name if we didn't find a charset for it */
         checkName(charsetName);
         return null;
     }

Yes, a whopping 2-entry cache!!

Also, the keys used are not canonical, so if my app asks for "UTF-8", "utf-8", and "ISO-8859-1" with regularity this 2 entry cache is worthless, every call ends up blocking in the evil thread-synchronized data structure.

Someone send them a copy of the ConcurrentHashMap doc.  please.

</rant> ....

Originally posted on paul.vox.com

Subscribe

  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 0 comments