🌐 Swift Unicode and Scalars (Deep but Simple)
To understand Swift strings properly, you need to know Unicode, Unicode scalars, and how Swift builds Characters from them.
1. What is Unicode?
Unicode is a universal standard that assigns a unique number (a code point) to each character it encodes, across the world's writing systems.
Examples:
- A → U+0041
- é → U+00E9
- 🙂 → U+1F642
Swift strings are built on Unicode, not ASCII.
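To check the code points listed above, here is a minimal sketch that prints each scalar in U+XXXX notation. The helper name `codePointLabel` is hypothetical, introduced only for illustration.

```swift
// Hypothetical helper: formats a scalar's code point in the usual U+XXXX notation.
func codePointLabel(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    // Pad to at least four hex digits, as Unicode notation conventionally does.
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

// The string is written with escapes so each scalar is unambiguous.
for scalar in "A\u{E9}\u{1F642}".unicodeScalars {
    print(scalar, codePointLabel(scalar))
}
// A U+0041
// é U+00E9
// 🙂 U+1F642
```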
2. Unicode Scalar (Foundation Concept)
A Unicode scalar is a single Unicode code point, excluding the surrogate range U+D800 to U+DFFF.
In Swift:
Output:
👉 65 is the Unicode value of A
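A minimal sketch of reading that value, assuming a scalar created from the literal "A" (the variable name is illustrative):

```swift
// A Unicode.Scalar can be written as a one-scalar string literal.
let letterA: Unicode.Scalar = "A"

// `value` is the scalar's numeric code point as a UInt32.
print(letterA.value) // 65
```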
3. Unicode Scalars in Swift
Swift represents Unicode scalars using the type Unicode.Scalar (also spelled UnicodeScalar).
You can access a string's scalars through its .unicodeScalars view.
Output:
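A short sketch of walking a string's scalar view. The sample string `word` is an assumption chosen for illustration.

```swift
let word = "Hi!"

// Each element of the view is a Unicode.Scalar.
for scalar in word.unicodeScalars {
    print(scalar, scalar.value)
}
// H 72
// i 105
// ! 33

// Collect the numeric values for later use.
let scalarValues = word.unicodeScalars.map { $0.value }
```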
4. Unicode Scalar Literals
You can create characters using Unicode scalar values.
Output:
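As a sketch of this, the `\u{...}` escape builds a Character from a hexadecimal scalar value, and `Unicode.Scalar` can also be constructed from a number at run time. The specific code points below are illustrative choices.

```swift
// \u{...} takes one to eight hexadecimal digits.
let heart: Character = "\u{2665}"   // BLACK HEART SUIT
let smile: Character = "\u{1F642}"  // SLIGHTLY SMILING FACE

print(heart) // ♥
print(smile) // 🙂

// Building a scalar from a UInt32 is failable, because not every
// number is a valid scalar (surrogates, for example, are rejected).
var built: Character? = nil
if let scalar = Unicode.Scalar(UInt32(0x41)) {
    built = Character(scalar)
}
let rejected = Unicode.Scalar(UInt32(0xD800)) // nil: a surrogate code point
```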
5. Character vs Unicode Scalar (Very Important ⚠️)
🔸 Unicode Scalar
- Single code point
- Low-level representation
🔸 Character
- User-perceived character
- Can be made of multiple Unicode scalars
Example: é
This single character may be:
- One scalar (U+00E9)
- OR multiple scalars (e followed by U+0301, the combining acute accent)
Swift handles this automatically: both forms are a single Character, and they compare as equal.
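A minimal sketch comparing the two spellings of é, assuming the precomposed form U+00E9 and the decomposed form e + U+0301:

```swift
let precomposed: Character = "\u{E9}"   // é as one scalar
let decomposed: Character = "e\u{301}"  // e + combining acute accent

print(precomposed.unicodeScalars.count) // 1
print(decomposed.unicodeScalars.count)  // 2

// Character comparison uses canonical equivalence, not raw scalar values.
print(precomposed == decomposed)        // true
```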
6. Emoji = Multiple Unicode Scalars 🤯
Many emojis are composed of multiple Unicode scalars.
Output:
Even though it uses many scalars, Swift treats it as one Character.
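One way to see this is with a family emoji, sketched here as four person emoji joined by zero-width joiners (U+200D). The choice of emoji is an assumption for illustration.

```swift
// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

print(family.count)                // 1
print(family.unicodeScalars.count) // 7
```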
7. .count vs .unicodeScalars.count
Output:
👉 This proves:
- count → user-visible characters
- unicodeScalars.count → actual Unicode building blocks
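The difference can be sketched with two illustrative strings, a word containing a decomposed é and a flag emoji built from two regional indicator scalars:

```swift
let cafe = "cafe\u{301}"            // "café" with a combining accent
print(cafe.count)                   // 4
print(cafe.unicodeScalars.count)    // 5

let flag = "\u{1F1EE}\u{1F1F3}"     // regional indicators I + N
print(flag.count)                   // 1
print(flag.unicodeScalars.count)    // 2
```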
8. Iterating Characters vs Scalars
Iterate Characters
Iterate Unicode Scalars
Different levels, different use cases.
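A small sketch of both loops over the same string, assuming a sample value that contains a decomposed é so the two levels give different results:

```swift
let greeting = "e\u{301}!"  // é (two scalars) followed by "!"

// Iterating the String yields Characters.
for character in greeting {
    print(character)
}
// é
// !

// Iterating the scalar view yields Unicode.Scalar values.
for scalar in greeting.unicodeScalars {
    print(scalar.value)
}
// 101
// 769
// 33

let characters = Array(greeting)
let values = greeting.unicodeScalars.map { $0.value }
```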
9. Why Swift Uses Unicode Scalars
✔ Correct emoji handling
✔ Accurate string length
✔ Safe internationalization
✔ No broken characters
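Languages whose strings count UTF-8 or UTF-16 code units can report surprising lengths or split a character in two. Swift's String counts Characters, so it avoids this.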
10. When Should You Use Unicode Scalars?
Use .unicodeScalars when:
- Working with low-level text processing
- Validating characters
- Implementing parsers
- Handling encodings
For normal apps → use String and Character
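As an example of the validation case, here is a sketch that accepts only ASCII letters, digits, and underscores by inspecting scalar values. The function name `isSimpleIdentifier` and the rule itself are hypothetical, chosen for illustration.

```swift
// Hypothetical validator: true when the text is non-empty and every scalar
// is an ASCII letter, an ASCII digit, or an underscore.
func isSimpleIdentifier(_ text: String) -> Bool {
    guard !text.isEmpty else { return false }
    return text.unicodeScalars.allSatisfy { scalar in
        switch scalar.value {
        case 0x30...0x39, 0x41...0x5A, 0x61...0x7A, 0x5F:
            return true
        default:
            return false
        }
    }
}

print(isSimpleIdentifier("user_42"))    // true
print(isSimpleIdentifier("caf\u{E9}"))  // false
```

Checking scalars rather than Characters means a combining mark attached to an ASCII letter is rejected too, since the mark is its own scalar.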
🧠 Mental Model (Easy)
📌 Summary Table
| Concept | Meaning |
|---|---|
| Unicode | Global character standard |
| UnicodeScalar | Single Unicode code point |
| Character | User-visible character |
| String | Collection of Characters |
| .count | Number of Characters |
| .unicodeScalars.count | Number of Unicode Scalars |
