Text Processing with JavaScript
Regular Expressions, Tools, and Techniques for Optimal Performance
by Faraz K. Kelhini
You might think of regular expressions as the holy grail of text
processing, but are you sure you aren’t just shoehorning them in where
standard built-in solutions already exist and would work better?
JavaScript itself provides programmers with excellent methods for text
manipulation, and knowing how and when to use them will help you write
more efficient code. From extracting data from APIs to
calculating word counts and everything in between, discover how to pick
the right tool for the job and make the absolute most of it every single
time.
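As a rough illustration of that trade-off (a minimal sketch with invented sample strings, not an excerpt from the book), the snippet below compares built-in string methods with equivalent regex tests. It also shows one way a regex can mislead when the search text contains a metacharacter.

```javascript
// Sample strings are invented for illustration; they are not from the book.
const headline = "Breaking: JavaScript turns 28";
const release = "Version 125 is out";

// Built-in methods treat the search text literally.
const mentionsJs = headline.includes("JavaScript");  // true
const isBreaking = headline.startsWith("Breaking:"); // true
const literalHit = release.includes("1.5");          // false

// In a regex, an unescaped dot matches any character,
// so /1.5/ also matches the "125" in the release string.
const looseHit = /1.5/.test(release);   // true (a false positive)
const strictHit = /1\.5/.test(release); // false

console.log({ mentionsJs, isBreaking, literalHit, looseHit, strictHit });
```

For plain substring checks like these, the built-in methods need no escaping and state the intent directly. A regex becomes the better fit once the search involves an actual pattern.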
Whether you’re a beginner or an advanced programmer, this up-to-date
guide will save you time when dealing with text. In Text Processing
with JavaScript, you’ll find a collection of tiny programs, each
demonstrating one string manipulation approach. The focus is squarely
on the practical side of text processing: what each technique is
designed to accomplish and how to use it in your program.
Discover how to extract data from APIs and web pages, apply spelling
corrections, convert and format currencies, and remove HTML tags from
text. Learn to intersect tables, copy text to the clipboard, extract
lists from text, and highlight sentences that contain a specific word.
Find duplicate words and fix them automatically, modify a copy of an
existing regex literal, match the beginning or end of a string, and
remove all comments from JavaScript and HTML files with ease. Match
non-ASCII words, calculate the word count of an article in any language,
and more.
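Two of the tasks listed above can be sketched in a few lines. This is only an assumption about how such recipes might look, not the book's code, and the helper names removeDuplicateWords and countWords are hypothetical. The duplicate-word pattern relies on \w, so it only handles ASCII words, while the word count uses Intl.Segmenter for locale-aware segmentation.

```javascript
// Hypothetical helper names; the book's own recipes may differ.

// Collapse a run of the same word ("is is", "the the the") into one occurrence.
// \w is ASCII-only, so this sketch does not handle non-ASCII words.
function removeDuplicateWords(text) {
  return text.replace(/\b(\w+)(?:\s+\1\b)+/gi, "$1");
}

// Count word-like segments using locale-aware segmentation.
// Spaces and punctuation are reported with isWordLike === false.
function countWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) count += 1;
  }
  return count;
}

console.log(removeDuplicateWords("This is is a test")); // "This is a test"
console.log(countWords("Hello, world! How are you?"));  // 5
```

Passing a different locale to countWords lets the runtime apply that language's segmentation rules, which matters for languages that do not separate words with spaces.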
Become a JavaScript expert and master chef of text processing with this
collection of hands-on and production-ready recipes.
What You Need
To use this book, you should already know basic JavaScript syntax and
HTML. Use of HTML will be infrequent and fairly basic, and I’ll explain
each JavaScript example in detail. So even if your JavaScript or HTML is
rusty, you’ll be able to understand how the code is working.
Resources
Releases:
- P1.0 2023/12/19
- B6.0 2023/12/04
- B5.0 2023/09/22
- B4.0 2023/08/08
Contents
- Preface
- Who Is This Book For?
- What You Should Know
- What’s in This Book?
- Online Resources
- Part I: Text Processing with Built-in JavaScript Methods
- Determining If a Value Is a String with the typeof Operator
- Checking a String for Specific Words with includes()
- Matching the Beginning or End of a String with startsWith() and
endsWith()
- Extracting Lists from Text with slice()
- Converting Color Names to Hexadecimal Values with the Canvas
Element
- Adding Transparency to Hex Colors
- Removing HTML Tags from Text with DOMParser()
- Converting HTML Markup to HTML Entities with replaceAll()
- Intersecting HTML Tables with filter()
- Generating HTML Tables from an Array of Arrays
- Generating HTML Tables from an Array of Objects
- Displaying Tabular Data in Console with console.table()
- Formatting Dates with Intl.DateTimeFormat()
- Formatting Currencies with Intl.NumberFormat()
- Adding Thousand Separators to Numbers with Intl.NumberFormat()
- Creating Language-Sensitive Lists with Intl.ListFormat()
- Determining Letter Case with charAt()
- Counting Unicode Characters with Intl.Segmenter()
- Counting Words in a String with Intl.Segmenter()
- Counting the Number of a Specific Word with split()
- Equalizing Incompatible Characters with normalize()
- Copying Text to Clipboard with the Clipboard API
- Part II: Text Processing with Regular Expressions
- Creating Your First Regular Expression
- Asserting the Start or End of a String with ^ and $
- Looking for Whole Words Only with the Word Boundary (\b)
- Matching One of Several Alternatives with the Vertical Bar (|)
- Matching One of Several Characters with the Character Class
- Matching a Range of Characters with Character Classes
- Repeating Part of a Regex with Quantifiers
- Treating Multiple Characters as a Single Unit with the Capturing
Group
- Extracting a Matched Value with the Capturing Group
- Excluding Groups from Result with the Non-capturing Group
- Reading Groups with Ease Using Named Capturing Groups
- Using Special Replacement Patterns
- Taking Away the Special Meaning of Replacement Patterns
- Using a Function as the Replacement Pattern
- Escaping Metacharacters with the Backslash
- Creating Lazy Quantifiers with the Question Mark
- Global and Case-Insensitive Matching with the g and i Flags
- Generating Indices for Matches with the d Flag
- Forcing ^ and $ to Match at the Start and End of a Line with
the m Flag
- Forcing . to Match Newline Characters with the s Flag
- Enabling Unicode Features with the u Flag
- Searching from a Specific Index with the y Flag
- Modifying an Existing Regex Literal
- Referencing a Matched String with the Backreference
- Testing a Pattern with the Positive Lookahead
- Testing a Pattern with the Negative Lookahead
- Testing a Pattern with the Positive Lookbehind
- Testing a Pattern with the Negative Lookbehind
- Matching Non-ASCII Numerals with the Unicode Property Escape
- Matching Non-ASCII Words with the Unicode Property Escape
- Matching Unicode Word Boundaries with the Unicode Property
Escape
- Part III: Mastering Text Processing in JavaScript
- Validating Email Addresses
- Validating Password Strength
- Validating Social Security Numbers
- Validating ZIP Codes
- Validating Canadian Postal Codes
- Removing Duplicate Lines
- Removing Duplicate Lines Separated by Other Lines
- Removing Duplicate Spaces
- Removing Duplicate Whitespaces
- Replacing Duplicate Whitespaces with the Same Type
- Extracting Text Enclosed in Double Quotes
- Extracting Text Enclosed in Single Quotes
- Escaping a String for Use in a Regex
- Stripping Invalid Characters from Filenames
- Matching Floating-Point Numbers
- Matching Formatted Numbers with Thousand Separators
- Matching Nearby Words
- Highlighting Sentences Containing a Specific Word
- Highlighting Text in Real Time
- Converting Plain Text into HTML-Ready Markup
- What Is Unicode?
- Implementing Regex in JavaScript
- test()
- exec()
- match()
- matchAll()
- search()
- replace()
- replaceAll()
- split()
- Conclusion
- Testing Regex with Specialized Tools
- RegexPal
- RegExr
- Regex101
- RegexBuddy
- Regex Vis
- Regular Expression Cheat Sheet
- Character Classes
- Quantifiers
- Boundary Assertions
- Lookaround Assertions
- Groups and Backreferences
- Flags
- Unicode Property Escapes
Author
Faraz K. Kelhini is the author of Modern Asynchronous JavaScript.
With more than a decade of software development experience, Faraz has
in-depth knowledge of the JavaScript language and its related APIs.
Faraz is passionate about moving the web forward and promoting
patterns and ideas that make coding more productive.