# grapheme-breaker

A JavaScript implementation of the Unicode grapheme cluster breaking algorithm ([UAX #29](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries))

> It is important to recognize that what the user thinks of as a “character”—a basic unit of a writing system for a
> language—may not be just a single Unicode code point. Instead, that basic unit may be made up of multiple Unicode
> code points. To avoid ambiguity with the computer use of the term character, this is called a user-perceived character.
> For example, “G” + acute-accent is a user-perceived character: users think of it as a single character, yet is actually
> represented by two Unicode code points. These user-perceived characters are approximated by what is called a grapheme cluster,
> which can be determined programmatically.

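
The “G” + acute-accent case from the quote can be checked directly. The sketch below is only an illustration of the code point versus grapheme cluster distinction: it assumes a runtime that ships the built-in `Intl.Segmenter` (for example Node.js 16 or later) and uses it as a stand-in, so it does not exercise this library's own API. The `countGraphemes` helper is a hypothetical name introduced for the example.

```javascript
// "G" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
var text = 'G\u0301';

// String length counts UTF-16 code units.
var codeUnits = text.length;

// Iterating a string yields one entry per Unicode code point.
var codePoints = Array.from(text).length;

// Hypothetical helper: counts grapheme clusters with the built-in
// Intl.Segmenter (an assumption about the runtime, not part of this library).
function countGraphemes(str) {
  var segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  return Array.from(segmenter.segment(str)).length;
}

var graphemes = countGraphemes(text);

console.log(codeUnits, codePoints, graphemes); // 2 2 1
```

Two code points, one grapheme cluster: this is the gap a grapheme breaker closes when you need to count, truncate, or move a cursor over text the way a reader perceives it.
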
## Installation

You can install via npm:

    npm install grapheme-breaker

## Example

```javascript
var GraphemeBreaker = require('grapheme-breaker');

// break a string into an array of grapheme clusters
GraphemeBreaker.break('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞') // => ['Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍', 'A̴̵̜̰͔ͫ͗͢', 'L̠ͨͧͩ͘', 'G̴̻͈͍͔̹̑͗̎̅͛́', 'Ǫ̵̹̻̝̳͂̌̌͘', '!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞']

// or just count the number of grapheme clusters in a string
GraphemeBreaker.countBreaks('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞') // => 6

// use nextBreak and previousBreak to get break points starting
// from anywhere in the string (indices are UTF-16 code unit offsets)
GraphemeBreaker.nextBreak('😜🇺🇸👍', 3) // => 6
GraphemeBreaker.previousBreak('😜🇺🇸👍', 3) // => 2
```

## Development Notes

You don't need to know any of this to use the library, but it may be useful if you want to contribute or fix bugs.

* The `src/classes.trie` file is automatically generated from `GraphemeBreakProperty.txt` in the Unicode database by `src/generate_data.js`. You should rarely need to run this script, but you may if, for instance, you want to change the Unicode version.

* You can run the tests using `npm test`. They are written using `mocha` and generated from `GraphemeBreakTest.txt` from the Unicode database, which is included in the repository for performance reasons while running them.

## License

MIT