
I have searched Stack Overflow and Google exhaustively, but so far I have been unable to find anyone else having a similar problem.

In a sample Java Swing test program, I create a plain JTextField so that I can try pasting characters into it from a webpage (http://isthisthingon.org/unicode/). When I test with '㓿' (code point 13567, U+34FF), the character pastes correctly. This character is the last listed character in the CJK Ideograph Extension A block. However, when I move to the next related block, CJK Ideograph Extension B, copying and pasting the character '𠀀' (code point 131072, U+20000) fails. It does not render a box or any other glyph; it behaves as if the system clipboard were empty.
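For reference, a test program along these lines might look like the sketch below. It is not my original code: the class name `PasteTest`, the `describe` helper, and the print-on-Enter behavior are illustrative assumptions, added so that whatever arrives in the field after a paste can be inspected as code points.

```java
import java.awt.GraphicsEnvironment;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JFrame;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

class PasteTest {
    // Hypothetical helper: formats each code point of the text as U+XXXX,
    // so a supplementary character shows up as one entry, not two.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; cannot show the text field.");
            return;
        }
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                final JTextField field = new JTextField(20);
                // Press Enter in the field to print what it currently holds.
                field.addActionListener(new ActionListener() {
                    public void actionPerformed(ActionEvent e) {
                        System.out.println(describe(field.getText()));
                    }
                });
                JFrame frame = new JFrame("Paste test");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.add(field);
                frame.pack();
                frame.setVisible(true);
            }
        });
    }
}
```

To use it, paste a character into the field and press Enter; an empty line on the console means nothing reached the field.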

I realize that the CJK Ideograph Extension B characters are considered "supplementary" and need two 16-bit code units (a surrogate pair) instead of one when Java encodes them internally as UTF-16. Further testing shows that I am able to display the supplementary characters if I hard-code the text into a display area.
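The difference between the two characters can be seen without any GUI at all. The following is a small sketch using only `java.lang` methods; the class name `SurrogateDemo` and the helper `fromCodePoint` are made up for illustration.

```java
class SurrogateDemo {
    // Hypothetical helper: builds a String holding a single code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String extA = fromCodePoint(13567);   // U+34FF, inside the BMP
        String extB = fromCodePoint(131072);  // U+20000, supplementary

        // length() counts UTF-16 code units, not characters.
        System.out.println(extA.length());                         // 1
        System.out.println(extB.length());                         // 2
        System.out.println(extB.codePointCount(0, extB.length())); // 1

        System.out.println(Character.isSupplementaryCodePoint(13567));  // false
        System.out.println(Character.isSupplementaryCodePoint(131072)); // true

        // The two code units of U+20000 are a high and a low surrogate.
        System.out.printf("%04X %04X%n",
                (int) extB.charAt(0), (int) extB.charAt(1)); // D840 DC00
    }
}
```

So any code on the clipboard path that handles text one `char` at a time would see two surrogates here, not one ideograph.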

This was tested using Windows 7 and Java 6.

I understand that support for supplementary Unicode characters was added in Java 5. However, I am wondering why (or if) the cut and paste functionality in Swing still does not allow me to paste these characters. Is there something additional I need to do to tell Java to handle these characters when using the JTextField or JTextArea classes? Is there a way yet for Java's Swing libraries to paste these characters into a text field?

Thank you for your time!

  • No sooner did I post this than I may have found my answer. This has been a long-standing bug in the JDK - bugs.sun.com/bugdatabase/view_bug.do?bug_id=6877495. Commented Aug 11, 2011 at 15:53
  • Unicode has had more characters than fit in a 16-bit integer for most of its lifetime! I can’t believe that Java is still screwed up with this. But yesterday I found yet another UCS-2 bug in the Java String class, one that’s been there forever. This is ridiculous. The whole UTF-16 thing is a horrible curse, and Java will never be free of the countless bugs it causes. They are simply everywhere and it is maddening. People just can’t get things right. Commented Aug 12, 2011 at 1:18
  • Thanks Alexey! Just created an answer. :) Commented Aug 12, 2011 at 15:40
  • @tchrist - what was the bug that you found in the String class? If it was submitted as an official bug could you post the link too? I've been doing a lot of work with i18n stuff here at work and the more I know about Java's quirks with respect to the supplementary character set, the better! Commented Aug 12, 2011 at 15:43
  • @Locriansax: No, I didn’t bug report it, I mailed it to the i18n-dev openjdk list that I’m on. You can find that mail right here. The problem is that the code processes things by partial code points, not full ones, so it gets wrong answers. It snuck by till Unicode 3.1 showed up in March 2001, because that introduced the Deseret script, which is a case-changing script up in the astral planes. It’s been broken >10 years. I hold all char-based Java code so super highly suspect that it’s guilty till proven innocent. Safe assumption. Commented Aug 12, 2011 at 18:49
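To make the "partial code points" point concrete, here is a minimal sketch contrasting a char-based loop with a code-point-based one. It is not the bug reported to the mailing list, only an illustration of the same class of mistake; the class and method names are invented, and U+10400 (a Deseret capital letter) is chosen as the sample supplementary character.

```java
class CodePointLoop {
    // Char-based: inspects UTF-16 code units one at a time, so each half
    // of a surrogate pair is tested on its own and is not a letter.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Code-point-based: reads the full code point, then steps over
    // one or two chars depending on its size.
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "a" followed by one Deseret capital letter, U+10400.
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(countLettersByChar(s));      // 1
        System.out.println(countLettersByCodePoint(s)); // 2
    }
}
```

The char-based version silently undercounts, which is the kind of wrong answer that stays hidden as long as the input never leaves the BMP.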

1 Answer


No sooner did I post this than I may have found my answer. This has been a long-standing bug in the JDK (bug ID 6877495).
