
Unicode in Java

Unicode is a standard that provides a unique number for every character, no matter the platform, program, or language. Java uses Unicode to represent characters, ensuring that text is consistently encoded and readable in any environment.

1. What is Unicode?

Unicode is a universal character encoding standard that includes characters from almost all writing systems in the world. It allows Java programs to handle text in different languages without running into encoding issues.

2. Unicode in Java

In Java, characters are stored using the char data type, which holds a 16-bit UTF-16 code unit. A single char can represent any character from U+0000 to U+FFFF, which covers most letters, symbols, and scripts in common use. Characters above U+FFFF, such as many emoji, are stored as a pair of char values called a surrogate pair.
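
Because a char holds only 16 bits, the number of char values in a string can differ from the number of Unicode characters (code points) it contains. The following is a minimal sketch of that difference; the class and method names are illustrative, not part of any library.

```java
public class CodePointSketch {
    // Number of UTF-16 code units (char values) in the text.
    static int codeUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) lies above U+FFFF, so it is built
        // from its code point rather than from a single char literal.
        String smiley = new String(Character.toChars(0x1F600));
        String text = "A" + smiley;

        System.out.println(codeUnits(text));   // 3: 'A' plus two surrogates
        System.out.println(codePoints(text));  // 2: 'A' plus one emoji
        System.out.println(Character.isSurrogate(text.charAt(1))); // true
    }
}
```

When a program may receive such characters, iterating with codePoints() or codePointAt() is usually safer than indexing char by char.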

3. Unicode Escape Sequences

Java provides a way to represent Unicode characters using escape sequences. A Unicode escape sequence is written as \u followed by exactly four hexadecimal digits representing the character's Unicode value, for example \u0041 for 'A'.

4. Example: Using Unicode in Java

Code Example


public class UnicodeExample {
    public static void main(String[] args) {
        // Declare char variables using Unicode escape sequences
        char unicodeChar = '\u0041';   // Unicode for 'A'
        char unicodeSymbol = '\u2605'; // Unicode for '★' (star)

        // Output the characters
        System.out.println("Unicode Character for A: " + unicodeChar);
        System.out.println("Unicode Symbol for Star: " + unicodeSymbol);
    }
}

Output

Unicode Character for A: A
Unicode Symbol for Star: ★

5. Displaying Unicode Characters in Strings

You can also use Unicode escape sequences directly in strings to display special characters:

Code Example


public class UnicodeInStringExample {
    public static void main(String[] args) {
        // Using Unicode escape sequences inside a string literal
        String message = "Hello, \u004A\u0061\u0076\u0061! \u2605"; // "Hello, Java! ★"
        System.out.println(message);
    }
}

Output

Hello, Java! ★

6. UTF-8 and Unicode

Java's char and String types represent text as UTF-16, where each char is one 16-bit code unit. UTF-8 is another popular encoding that stores each character in one to four bytes. UTF-16 is what Java uses for text in memory, while UTF-8 is commonly used for files and network operations, so text is usually converted between the two when it enters or leaves a program.
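
To see how the two encodings differ in size, the sketch below encodes a few one-character strings both ways and reports the byte counts. The class and method names are illustrative assumptions; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {
    // Bytes needed to store the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as UTF-16.
    // UTF_16BE is used because plain UTF_16 prepends a 2-byte
    // byte-order mark when encoding, which would inflate the count.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), 'e' with acute accent (U+00E9), star (U+2605)
        String[] samples = { "A", "\u00E9", "\u2605" };
        for (String s : samples) {
            System.out.println("UTF-8: " + utf8Length(s)
                    + " bytes, UTF-16: " + utf16Length(s) + " bytes");
        }
    }
}
```

For these samples UTF-8 needs 1, 2, and 3 bytes respectively, while UTF-16 needs 2 bytes each, which is one reason UTF-8 is often preferred for mostly-ASCII text.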

7. Practical Use of Unicode in Java

Unicode in Java is useful for building internationalized applications that support multiple languages. By using Unicode, Java programs can handle text in diverse languages without worrying about platform-specific encodings.

Practice Exercises

  1. Write a program that prints the Unicode characters for the letters A to Z.
  2. Create a program that accepts a user's name and prints each of its characters as a Unicode escape sequence (for example, \u0041 for A).
  3. Implement a program that prints a string containing Unicode escape sequences for various symbols (e.g., heart, star, and smiley face).

💡 Pro Tip

When dealing with Unicode, save your source and text files in UTF-8 format, and name the encoding explicitly when reading or writing them, to avoid encoding issues across different platforms.
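
One way to follow this tip in code is to pass the charset explicitly instead of relying on the platform default. The sketch below writes a string to a temporary file as UTF-8 and reads it back; the class name, method name, and temporary file prefix are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileSketch {
    // Writes the text to a temporary file as UTF-8, reads it back
    // as UTF-8, and returns what was read.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("unicode-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                byte[] bytes = Files.readAllBytes(file);
                return new String(bytes, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file); // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Hello, Java! \u2605";
        // The text survives the round trip because both sides use UTF-8.
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```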