Regular expressions are a powerful tool for finding and replacing characters in text. A regular expression is just a string written according to certain rules. In the notation used here (as in JavaScript), it is enclosed in two slashes, /pattern/flags: the search pattern goes between the slashes, and after the second slash comes a set of flags that affect the result.
Regular expressions come in handy in many programming tasks, and almost every modern language has built-in support for them: Python, JavaScript, Go, Kotlin, C#, and so on.
To practice using regular expressions, use Regex101.com.
- 🔍 Basic usage
- 🚩 Flags
- 🗂️ Main syntax
- ✍️ Practice
- 📚 Additional materials
Let's take some random text as an example. Imagine that we need to find every occurrence of the word Park in it. This is the simplest use of regular expressions: we just write the word we are looking for between the slashes:
/Park/g
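To try the same search outside a browser, here is a minimal sketch using Python's standard re module (Python is just one possible choice). re.findall returns every non-overlapping match, which corresponds to the g flag. The sample sentence is made up for illustration, because the article's own example text is not shown.

```python
import re

# Made-up sample text; the article's original example text is not shown.
text = "Central Park is bigger than Hyde Park, but Park Avenue is not a park."

# A literal pattern matches exactly these characters, case-sensitively,
# so the lowercase "park" at the end is not found.
matches = re.findall(r"Park", text)
print(matches)  # ['Park', 'Park', 'Park']
```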
Flags affect the search result. The five most commonly used ones are:
- i – ignore letter case (there is no difference between A and a).
- g – global search: find all matches in the text; without it, only the first match is returned.
- m – multiline mode: ^ and $ match at the start and end of every line, not just of the whole text (this flag affects only ^ and $).
- s – "single-line" (dotAll) mode: the metacharacter . (dot) matches any single character, including the newline character.
- u – Unicode mode: the pattern is interpreted as Unicode, so it may contain Unicode-specific constructs.
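Python's re module has no slash-and-flags literal, so the flags map onto other mechanisms. The sketch below shows one reasonable mapping, assuming Python 3:
- i, m and s become the re.IGNORECASE, re.MULTILINE and re.DOTALL arguments.
- g corresponds to calling re.findall instead of re.search.
- u needs no flag, because str patterns are already Unicode-aware.

```python
import re

text = "Apple\napple\nAPPLE pie"

# i -> re.IGNORECASE: letter case is ignored
all_apples = re.findall(r"apple", text, re.IGNORECASE)          # ['Apple', 'apple', 'APPLE']

# g -> no flag in Python: re.search stops at the first match, re.findall returns every match
first_apple = re.search(r"apple", text, re.IGNORECASE).group()  # 'Apple'

# m -> re.MULTILINE: ^ matches at the start of every line, not only the start of the text
first_words = re.findall(r"^\w+", text)                         # ['Apple']
line_starts = re.findall(r"^\w+", text, re.MULTILINE)           # ['Apple', 'apple', 'APPLE']

# s -> re.DOTALL: the dot also matches the newline character
no_dotall = re.findall(r"e.a", text)                            # []
with_dotall = re.findall(r"e.a", text, re.DOTALL)               # ['e\na']
```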
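A dot . matches any single character (except a newline, unless the s flag is set). The number of dots sets how many characters are matched, so it can control the length of the words you find.
/t..k/g
Text: take look team took hike track teak time
Matches: took, teak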
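Here is a quick check of the dot example in Python, with re.findall standing in for the g flag. The second part shows the newline exception and how re.DOTALL, Python's counterpart of the s flag, removes it.

```python
import re

# re.findall returns every non-overlapping match, like the g flag.
words = re.findall(r"t..k", "take look team took hike track teak time")
print(words)  # ['took', 'teak']

# By default the dot does not match a newline...
no_newline = re.findall(r"t..k", "t\nok")
# ...but with re.DOTALL (the s flag) it does.
with_newline = re.findall(r"t..k", "t\nok", re.DOTALL)
```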
Square brackets [ ] let you specify a list of allowed characters: exactly one of them must appear at that position.
/t[aoi]k/g
Text: tek tok tdk tak tik tuk tyk took taoik
Matches: tok, tak, tik
A caret ^ right after the opening square bracket excludes the listed characters: the position can hold any character except those.
/ba[^td]/g
Text: ban bag bat bas bad
Matches: ban, bag, bas
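The two bracket examples above can be reproduced with Python's re.findall. The last pattern is an extra case, not from the original. It shows that a negated set also accepts spaces and punctuation, which is a common source of surprising matches.

```python
import re

# [aoi]: exactly one of the listed characters
in_set = re.findall(r"t[aoi]k", "tek tok tdk tak tik tuk tyk took taoik")
# [^td]: exactly one character that is NOT t or d
not_in_set = re.findall(r"ba[^td]", "ban bag bat bas bad")
# A negated set also accepts spaces, digits and punctuation
space_case = re.findall(r"ba[^td]", "ba bat")

print(in_set)      # ['tok', 'tak', 'tik']
print(not_in_set)  # ['ban', 'bag', 'bas']
print(space_case)  # ['ba ']
```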
A hyphen inside square brackets specifies a range from the first to the last character (inclusive), in alphabetical order.
/[a-d]../g
Text: ost hst ast fst cst bst
Matches: ast, cst, bst
It works the same way with digits:
/201[5-9]/g
Text: 2010 2012 2015 2017 2019 2022
Matches: 2015, 2017, 2019
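Here are both range examples in Python, as a sketch. [a-d] is just shorthand for [abcd], and [5-9] for [56789].

```python
import re

# [a-d] is the same as [abcd]; each dot then matches any one character
letters = re.findall(r"[a-d]..", "ost hst ast fst cst bst")
# [5-9] is one digit between 5 and 9
years = re.findall(r"201[5-9]", "2010 2012 2015 2017 2019 2022")

print(letters)  # ['ast', 'cst', 'bst']
print(years)    # ['2015', '2017', '2019']
```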
An asterisk * after a character means that the character may be absent or repeated any number of times (zero or more).
/wo*w/g
Text: wow waw wiw woooow wawe ww woow
Matches: wow, woooow, ww, woow
A plus + after a character means that the character must be present one or more times.
/go+gle/g
Text: gogle gugle g00gle goooogle
Matches: gogle, goooogle
A question mark ? after a character means that the character is optional: it may be absent or occur once.
/bou?nd/g
Text: bond bound bouuund boynd
Matches: bond, bound
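The three quantifiers above can be checked with Python's re.findall against the same sample words.

```python
import re

star = re.findall(r"wo*w", "wow waw wiw woooow wawe ww woow")   # zero or more "o"
plus = re.findall(r"go+gle", "gogle gugle g00gle goooogle")       # one or more "o"
optional = re.findall(r"bou?nd", "bond bound bouuund boynd")     # zero or one "u"

print(star)      # ['wow', 'woooow', 'ww', 'woow']
print(plus)      # ['gogle', 'goooogle']
print(optional)  # ['bond', 'bound']
```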
To specify an exact number of repetitions, put the number in curly braces after the character.
/bo{3}m/g
Text: boom bom booom bm boooom
Matches: booom
To specify a range of repetitions, put the minimum and maximum in curly braces after the character, separated by a comma.
/lo{2,4}k/g
Text: lok look lk loook looook loooooook
Matches: look, loook, looook
The upper bound can be omitted. For example, a{3,} means that the character a must occur at least three times.
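Here are the curly-brace quantifiers in Python, including the open-ended a{3,} form. The last sample string is made up for illustration. Quantifiers are greedy, so a{3,} takes as many a characters as it can.

```python
import re

exact = re.findall(r"bo{3}m", "boom bom booom bm boooom")               # exactly 3
ranged = re.findall(r"lo{2,4}k", "lok look lk loook looook loooooook")  # 2 to 4
at_least = re.findall(r"a{3,}", "a aa aaa aaaaaa")                      # 3 or more

print(exact)     # ['booom']
print(ranged)    # ['look', 'loook', 'looook']
print(at_least)  # ['aaa', 'aaaaaa']
```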
Parentheses ( ) group a sequence of characters so that you can refer back to it later with the expression \number. Here number is the position of the group in the pattern: groups are numbered from 1, counting opening parentheses from the left.
/(la)-\1{2}-\1{3}/g // Group "la", then refer back to it with \1
Text: la-laaa-la-lala-lalala-lalala-la-la-la
Matches: la-lala-lalala
/(la)-\1-(laa)-\2/g
Text: laa-la-laa-la-la-laa-laa-lalal
Matches: la-la-laa-laa
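Here is a sketch of the two backreference examples in Python. It uses re.search rather than re.findall. When a pattern contains capturing groups, re.findall returns the captured groups instead of the whole match, as the last line shows.

```python
import re

# \1 repeats the text captured by group 1, so \1{2} means "lala"
m1 = re.search(r"(la)-\1{2}-\1{3}", "la-laaa-la-lala-lalala-lalala-la-la-la")
print(m1.group())  # la-lala-lalala

text = "laa-la-laa-la-la-laa-laa-lalal"
m2 = re.search(r"(la)-\1-(laa)-\2", text)
print(m2.group(), m2.group(1), m2.group(2))  # la-la-laa-laa la laa

# With capturing groups, findall returns the groups, not the whole match
groups_only = re.findall(r"(la)-\1-(laa)-\2", text)
print(groups_only)  # [('la', 'laa')]
```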
The (?:...) construction creates a non-capturing group: the characters are grouped, but the group is not saved and gets no number.
/(?:abc)-(test),\1,\1/g // The group "abc" is not saved, so \1 refers to "test"
Text: abc,test-abc-test,test,test-abc-test
Matches: abc-test,test,test
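Here is the non-capturing example in Python. It is followed by a made-up pair of patterns showing where (?:...) matters in practice. Because re.findall returns group contents when a pattern has capturing groups, (ab)+ and (?:ab)+ give different results.

```python
import re

m = re.search(r"(?:abc)-(test),\1,\1", "abc,test-abc-test,test,test-abc-test")
print(m.group())   # abc-test,test,test
print(m.groups())  # ('test',): only the capturing group is saved

# findall returns group contents when a pattern has capturing groups,
# so a capturing group changes the result while (?:...) does not
capturing = re.findall(r"(ab)+", "ababx ab")
non_capturing = re.findall(r"(?:ab)+", "ababx ab")
print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['abab', 'ab']
```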
You can also give groups names. For this, use the construction (?P<Name>...), where Name is the name of the group and ... is any sequence of characters. To refer to a named group, use (?P=Name). This is the Python/PCRE syntax; JavaScript writes (?<Name>...) and \k<Name> instead.
/(?P<seven>7{3})-(?P=seven){2}-(?P=seven)/g
Text: 7777-77-7777777-777-777777-777-777-7-7-7-7777-7
Matches: 777-777777-777
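Python's re module uses exactly this (?P<Name>...) and (?P=Name) syntax. The sketch repeats the example above and adds a hypothetical date-parsing pattern, where groupdict() returns the named groups as a dictionary.

```python
import re

text = "7777-77-7777777-777-777777-777-777-7-7-7-7777-7"
m = re.search(r"(?P<seven>7{3})-(?P=seven){2}-(?P=seven)", text)
print(m.group())         # 777-777777-777
print(m.group("seven"))  # 777

# Hypothetical example: named groups make the parts of a date self-describing
date = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2022-05-17")
print(date.groupdict())  # {'year': '2022', 'month': '05', 'day': '17'}
```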
If you have trouble understanding grouping, I suggest watching this video.
The vertical bar | lets you specify alternatives to search for. This is somewhat similar to square brackets [abc], but the vertical bar can handle whole words and expressions, not just individual characters.
/yes|no/g
Text: yes,maybe,no,idk,ok
Matches: yes, no
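Here is the alternation example in Python. The second pattern is a made-up one. It shows that | splits the whole pattern, so to limit the choice to part of a word you wrap the alternatives in a group.

```python
import re

answers = re.findall(r"yes|no", "yes,maybe,no,idk,ok")
print(answers)  # ['yes', 'no']

# | applies to everything on each side of it, so group the alternatives
# when only part of the pattern should vary (made-up example)
spellings = re.findall(r"gr(?:a|e)y", "gray grey griy")
print(spellings)  # ['gray', 'grey']
```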
To match one of the special characters { } [ ] ( ) / \ + * . $ ^ | ? literally, put a backslash \ in front of it.
/\.|\?/g // Search for dots "." or question marks "?"
Text: What now? What next? Times up. Wake up.
Matches: ?, ?, ., .
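Here is a sketch of escaping in Python. When the literal text comes from user input or another variable, re.escape adds the backslashes for you. Without them, + and ? keep their special meaning and the search fails. The needle and haystack strings are made up for illustration.

```python
import re

marks = re.findall(r"\.|\?", "What now? What next? Times up. Wake up.")
print(marks)  # ['?', '?', '.', '.']

needle = "1+1=2?"  # made-up literal text that contains special characters
haystack = "Is 1+1=2? Yes."

# Unescaped, "1+" means "one or more 1s" and "2?" means "optional 2"
unescaped = re.search(needle, haystack)
# re.escape puts a backslash in front of the special characters
escaped = re.search(re.escape(needle), haystack)

print(unescaped)        # None
print(escaped.group())  # 1+1=2?
```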
The caret ^ (outside square brackets) anchors the match to the beginning of the text, or, with the m flag, to the beginning of every line.
/^[0-9]+/gm // Search for numbers at the beginning of a line
Text:
1. Apples x10
2. Cookies x5
3. Eggs x7
Matches: 1, 2, 3
The dollar sign $ anchors the match to the end of the text, or, with the m flag, to the end of every line.
/com$|net$/gm
Text:
google.com
command
sourceforge.net
netflix
Matches: com (in google.com), net (in sourceforge.net)
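Here are both anchor examples in Python, with and without re.MULTILINE (the counterpart of the m flag). Without it, ^ and $ only look at the start and end of the whole text.

```python
import re

shopping = "1. Apples x10\n2. Cookies x5\n3. Eggs x7"
numbers = re.findall(r"^[0-9]+", shopping, re.MULTILINE)   # ^ at every line start
only_first = re.findall(r"^[0-9]+", shopping)              # ^ at the text start only

domains = "google.com\ncommand\nsourceforge.net\nnetflix"
endings = re.findall(r"com$|net$", domains, re.MULTILINE)  # $ at every line end
last_line_only = re.findall(r"com$|net$", domains)         # $ at the text end only

print(numbers, only_first)      # ['1', '2', '3'] ['1']
print(endings, last_line_only)  # ['com', 'net'] []
```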
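There are built-in shorthand notations that make it easier to match a whole class of characters. For plain ASCII text, the two entries in each pair below are equivalent.
\w matches a word character (a letter, a digit or the underscore):
/[a-zA-Z0-9_]/g
/\w/g
Text: some random words for example #@$% *%(^)_+#1234
Matches: each letter, the underscore and each of the digits 1234, one character at a time; spaces and the other symbols are skipped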
\W matches any character that is not a word character:
/[^a-zA-Z0-9_]/g
/\W/g
Text: developer_2022@gmail.com
Matches: @, .
\d matches a digit:
/[0-9]/g
/\d/g
Text: developer_2022@gmail.com
Matches: 2, 0, 2, 2
\D matches any character that is not a digit:
/[^0-9]/g
/\D/g
Text: developer_2022@gmail.com
Matches: every character except the digits 2022
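\s matches a whitespace character. Besides the space, this includes tabs and various line-break characters:
/[\r\n\t\f\v ]/g
/\s/g
\S matches any character that is not whitespace:
/[^\r\n\t\f\v ]/g
/\S/g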
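Here is a sketch of the shorthand classes in Python. One difference is worth knowing: in Python 3, \w, \d and \s are Unicode-aware for str patterns, so they match more than the ASCII sets above unless re.ASCII is passed. The whitespace and "café" strings are made up for illustration.

```python
import re

email = "developer_2022@gmail.com"

word_runs = re.findall(r"\w+", email)           # ['developer_2022', 'gmail', 'com']
non_word = re.findall(r"\W", email)             # ['@', '.']
digits = re.findall(r"\d", email)               # ['2', '0', '2', '2']
non_digits = "".join(re.findall(r"\D", email))  # 'developer_@gmail.com'

# \s covers tabs and line breaks as well as spaces
parts = re.split(r"\s+", "tab\there  new\nline")  # ['tab', 'here', 'new', 'line']

# Unicode vs ASCII: "caf\u00e9" is "café" written with a precomposed é
unicode_word = re.findall(r"\w+", "caf\u00e9")          # ['café']
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)  # ['caf']
```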
To find a phrase that must (or must not) come before or after another phrase, use position checks called lookarounds. They test the surrounding text without including it in the match.
To find an expression X followed by an expression Y, use the construction X(?=Y).
/\d+(?=€)/g
Text: 200$ 750€ 100$ 330€ 550$
Matches: 750, 330
To find an expression X that is NOT followed by an expression Y, use the construction X(?!Y).
/\d{4,}(?!€)/g
Text: This car cost about 7000€ in 2015
Matches: 2015
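Here are the two lookahead examples in Python. The last pattern is an extra case, not in the original. It shows why the second pattern uses \d{4,} rather than \d+: with \d+ the engine backtracks and matches 700, which is followed by 0 rather than €.

```python
import re

euros = re.findall(r"\d+(?=€)", "200$ 750€ 100$ 330€ 550$")
print(euros)  # ['750', '330']  (the € itself is not part of the match)

sentence = "This car cost about 7000€ in 2015"
not_euros = re.findall(r"\d{4,}(?!€)", sentence)
print(not_euros)  # ['2015']

# With \d+ the engine backtracks from "7000" to "700", which is followed by "0"
pitfall = re.findall(r"\d+(?!€)", sentence)
print(pitfall)  # ['700', '2015']
```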
To find an expression X preceded by an expression Y, use the construction (?<=Y)X.
/(?<=:)\d+/g
Text: { "id":4, "value":123, name:"test" }
Matches: 4, 123
To find an expression X that is NOT preceded by an expression Y, use the construction (?<!Y)X.
/(?<!\$)\d+/g
Text: $5 $6 $7 2019 2009 1999
Matches: 2019, 2009, 1999
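Here are the lookbehind examples in Python, plus a made-up pitfall. The pattern (?<!\$)\d+ still matches the 0 in $50, because that digit is preceded by 5, not by $. Forbidding a digit in the lookbehind as well fixes this. Python's re module only accepts fixed-width lookbehind patterns, so something like (?<=\d+) is rejected there.

```python
import re

after_colon = re.findall(r"(?<=:)\d+", '{ "id":4, "value":123, name:"test" }')
print(after_colon)  # ['4', '123']

not_dollars = re.findall(r"(?<!\$)\d+", "$5 $6 $7 2019 2009 1999")
print(not_dollars)  # ['2019', '2009', '1999']

# Pitfall: the "0" in "$50" is preceded by "5", not "$", so it still matches
pitfall = re.findall(r"(?<!\$)\d+", "$50")
print(pitfall)  # ['0']

# Also forbidding a digit before the match skips the whole amount
fixed = re.findall(r"(?<![$\d])\d+", "$50 2019")
print(fixed)  # ['2019']
```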
Take some time to reinforce what you've learned: write a simple library in your favorite programming language that validates strings. For example, it could check whether a phone number or email address is valid, or whether a password meets given requirements for length, special characters, uppercase letters and digits. This will be doubly useful, because you can reuse the library in your own applications later.
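As one possible starting point for the password part of the exercise, here is a sketch in Python that combines several lookaheads, one per rule. The rules themselves and the names PASSWORD_RE and is_valid_password are hypothetical, chosen for illustration. The rules are: at least 8 characters, an uppercase letter, a digit and a non-alphanumeric character. Adjust them to your own requirements.

```python
import re

# Hypothetical rules: each lookahead checks one requirement without consuming
# characters, then .{8,} checks the length. fullmatch anchors both ends.
PASSWORD_RE = re.compile(
    r"(?=.*[A-Z])"         # at least one uppercase letter
    r"(?=.*\d)"            # at least one digit
    r"(?=.*[^A-Za-z0-9])"  # at least one special (non-alphanumeric) character
    r".{8,}"               # at least 8 characters, no line breaks
)


def is_valid_password(password: str) -> bool:
    """Return True if the password satisfies all of the rules above."""
    return PASSWORD_RE.fullmatch(password) is not None


for candidate in ["Secr3t!pass", "alllowercase1!", "NoDigits!!", "Ab1!"]:
    print(candidate, is_valid_password(candidate))
```

Keeping one lookahead per rule makes it easy to add or drop a requirement without rewriting the rest of the pattern.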
Additionally, searching Google for "regex practice" will turn up many interesting exercises on regular expressions.