Regex by example


Regular expressions are a powerful tool for finding and replacing text. A regular expression is just a string composed according to certain rules. In the notation used throughout this guide it is written as /pattern/flags: the search pattern goes between the two slashes, and a set of flags that affect the result follows the closing slash.

Regular expressions are useful in many programming tasks, and almost every language now has built-in tools for working with them, for example Python, JavaScript, Go, Kotlin, and C#.

To practice using regular expressions, use Regex101.com.

🔍 Basic usage

Let's take a random text as an example. Imagine that we need to find every occurrence of the word Park in this text. This is the simplest way to use regular expressions: we just write the word we want between the slashes:

/Park/g

🚩 Flags

Flags affect the search result. The exact set depends on the regex engine; the five most common are:

  • i – ignores letter case (there is no difference between A and a).
  • g – searches for all matches in the text; without it, only the first match is returned.
  • m – enables multiline mode (only affects the behavior of ^ and $).
  • s – treats the text as a single line, so the metacharacter . (dot) matches any single character, including the newline character.
  • u – enables Unicode interpretation, so the expression may contain special patterns specific to Unicode.
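Python has no /pattern/flags literal, so the same flags are passed as constants from the standard re module. The sketch below is only an illustration with made-up sample strings. There is no constant for g: re.findall returns every match, while re.search stops at the first one. The u behavior is already the default for str patterns in Python 3.

```python
import re

text = "Park park PARK"

# g: findall returns every match, search returns only the first one.
all_matches = re.findall(r"Park", text)
first_match = re.search(r"park", text, re.IGNORECASE).group()

# i: ignore letter case.
any_case = re.findall(r"park", text, re.IGNORECASE)

# m: ^ and $ also match at the start and end of every line.
line_starts = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

# s: the dot also matches the newline character.
without_s = re.search(r"a.b", "a\nb")            # no match
with_s = re.search(r"a.b", "a\nb", re.DOTALL)    # matches "a\nb"

print(all_matches, first_match, any_case, line_starts)
```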

🗂️ Main syntax

Any symbol .

A dot matches any single character (except a newline, unless the s flag is set). The number of dots determines how many characters are matched.

/t..k/g

take look team took hike track teak time
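To check the example above yourself, here is a minimal sketch using Python's re module (my choice of language for illustration; the pattern and text are the ones from this section).

```python
import re

text = "take look team took hike track teak time"

# "t", then any two characters, then "k".
matches = re.findall(r"t..k", text)
print(matches)  # ['took', 'teak']

# By default the dot does not match a newline.
no_newline = re.findall(r"t.k", "t\nk")
print(no_newline)  # []
```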

List []

Allows you to specify a list of characters, any one of which may appear at that position.

/t[aoi]k/g

tek tok tdk tak tik tuk tyk took taoik

Excluding list [^]

A caret placed right after the opening square bracket excludes the listed characters from the search: any character that is not in the list matches.

/ba[^td]/g

ban bag bat bas bad

Range [-]

Specifies a range from the first to the last character (inclusive). The range follows character codes, which for letters of the same case means alphabetical order.

/[a-d]../g

ost hst ast fst cst bst

It works the same way with numbers:

/201[5-9]/g

2010 2012 2015 2017 2019 2022
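The three bracket forms (list, excluding list, range) can be tried side by side. This is a small sketch in Python's re module, reusing the patterns and sample texts from the sections above; the variable names are just for illustration.

```python
import re

# List: "t", one of a/o/i, then "k".
one_of = re.findall(r"t[aoi]k", "tek tok tdk tak tik tuk tyk took taoik")

# Excluding list: "ba" followed by anything except "t" or "d".
none_of = re.findall(r"ba[^td]", "ban bag bat bas bad")

# Range of letters: one of a, b, c, d followed by any two characters.
letter_range = re.findall(r"[a-d]..", "ost hst ast fst cst bst")

# Range of digits: 2015 through 2019.
digit_range = re.findall(r"201[5-9]", "2010 2012 2015 2017 2019 2022")

print(one_of)        # ['tok', 'tak', 'tik']
print(none_of)       # ['ban', 'bag', 'bas']
print(letter_range)  # ['ast', 'cst', 'bst']
print(digit_range)   # ['2015', '2017', '2019']
```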

Repeats *

An asterisk after a character indicates that the character may occur zero or more times.

/wo*w/g

wow waw wiw woooow wawe ww woow

One or more repeats +

A plus after a character indicates that the character must be present one or more times.

/go+gle/g

google ggle gogle gugle g00gle goooogle

Optional symbol ?

A question mark after a character indicates that the character is optional (may be absent or occur only once).

/bou?nd/g

bond bound bouuund boynd
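The difference between *, + and ? is easiest to see when they are run next to each other. A minimal sketch in Python's re module, using the patterns and texts from the three sections above:

```python
import re

# *: zero or more "o" between the two "w".
star = re.findall(r"wo*w", "wow waw wiw woooow wawe ww woow")

# +: at least one "o" is required.
plus = re.findall(r"go+gle", "google ggle gogle gugle g00gle goooogle")

# ?: the "u" may be absent or occur exactly once.
optional = re.findall(r"bou?nd", "bond bound bouuund boynd")

print(star)      # ['wow', 'woooow', 'ww', 'woow']
print(plus)      # ['google', 'gogle', 'goooogle']
print(optional)  # ['bond', 'bound']
```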

Number of repetitions {}

To specify the exact number of repetitions, you must write curly brackets with the desired number after the symbol.

/bo{3}m/g

boom bom booom bm boooom

Repetition range {,}

To specify a range of repetitions, write the minimum and maximum count, separated by a comma, in curly brackets after the symbol.

/lo{2,4}k/g

lok look lk loook looook loooooook

The upper bound can be omitted. For example, the entry a{3,} says that the character a must occur at least three times.
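Here is a short sketch of the three curly-bracket forms in Python's re module. The first two patterns and texts come from the sections above; the text for a{3,} is made up for illustration.

```python
import re

# {3}: exactly three "o".
exact = re.findall(r"bo{3}m", "boom bom booom bm boooom")

# {2,4}: from two to four "o".
ranged = re.findall(r"lo{2,4}k", "lok look lk loook looook loooooook")

# {3,}: three or more "a" (hypothetical sample text).
at_least = re.findall(r"a{3,}", "a aa aaa aaaa")

print(exact)     # ['booom']
print(ranged)    # ['look', 'loook', 'looook']
print(at_least)  # ['aaa', 'aaaa']
```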

Grouping ()

Parentheses allow you to group any sequence of characters so that you can refer to it later using the expression \number, where number is the position of the group counted from the left. The reference matches the same text that the group matched.

/(la)-\1{2}-\1{3}/g // Group the expression "la" and then, refer to it with "\1"

la-laaa-la-lala-lalala-lalala-la-la-la

/(la)-\1-(laa)-\2/g

laa-la-laa-la-la-laa-laa-lalal

The (?:) construction creates a non-capturing group: it groups the expression without saving it.

/(?:abc)-(test),\1,\1/g // In this case the group "abc" will not be saved, so the first index points to "test".

abc,test-abc-test,test,test-abc-test
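The group examples above can be reproduced with Python's re module. One detail to keep in mind: when a pattern contains groups, re.findall returns the group contents rather than the whole match, so this sketch uses a small helper (full_matches, a name I made up) built on re.finditer.

```python
import re

def full_matches(pattern, text):
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# \1 repeats whatever the first group matched.
repeated = full_matches(r"(la)-\1{2}-\1{3}",
                        "la-laaa-la-lala-lalala-lalala-la-la-la")

# Two groups, referenced as \1 and \2.
two_groups = full_matches(r"(la)-\1-(laa)-\2",
                          "laa-la-laa-la-la-laa-laa-lalal")

# (?:abc) is not saved, so \1 refers to "test".
non_capturing = full_matches(r"(?:abc)-(test),\1,\1",
                             "abc,test-abc-test,test,test-abc-test")
saved_groups = re.search(r"(?:abc)-(test)", "abc-test").groups()

print(repeated)       # ['la-lala-lalala']
print(two_groups)     # ['la-la-laa-laa']
print(non_capturing)  # ['abc-test,test,test']
print(saved_groups)   # ('test',)
```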

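You can give names to groups. For this purpose the construction (?P<Name>...) is used, where Name is the name of the group and ... is any sequence of characters. To refer to a named group, use the construction (?P=Name). This is the Python and PCRE syntax; some other engines, such as JavaScript, write (?<Name>...) and \k<Name> instead.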

/(?P<seven>7{3})-(?P=seven){2}-(?P=seven)/g

7777-77-7777777-777-777777-777-777-7-7-7-7777-7
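Named groups are also convenient when reading results in code, because the captured text can be fetched by name. A minimal sketch in Python's re module, which uses this same (?P<Name>...) syntax:

```python
import re

pattern = re.compile(r"(?P<seven>7{3})-(?P=seven){2}-(?P=seven)")
match = pattern.search("7777-77-7777777-777-777777-777-777-7-7-7-7777-7")

whole = match.group(0)        # the full match
seven = match.group("seven")  # the text captured by the named group

print(whole)  # 777-777777-777
print(seven)  # 777
```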

If you have trouble understanding grouping, I suggest watching this video.

Logical OR |

The vertical bar allows you to specify alternatives to search for. This is somewhat similar to using square brackets [abc], but only the vertical bar can handle whole words and expressions, not just individual characters.

/yes|no/g

yes,maybe,no,idk,ok
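A quick sketch of alternation in Python's re module. The first line repeats the example above; the gray/grey lines are my own illustration of how a group limits the alternatives to part of the pattern.

```python
import re

answers = re.findall(r"yes|no", "yes,maybe,no,idk,ok")

# Without a group, | splits the whole pattern into "gra" or "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")

# Inside a (non-capturing) group, only "a" or "e" alternate.
grouped = re.findall(r"gr(?:a|e)y", "gray grey griy")

print(answers)    # ['yes', 'no']
print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']
```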

Escaping \

To match one of the special characters { } [ ] ( ) / \ + * . $ ^ | ? literally, put a backslash \ in front of it.

/\.|\?/g // Search for dots "." or question marks "?"

What now? What next? Times up. Wake up.
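In code you rarely need to escape by hand. As a sketch, Python's re module provides re.escape, which adds the backslashes for you; the "1+1" strings below are made up for illustration.

```python
import re

# Manual escaping, as in the example above.
punctuation = re.findall(r"\.|\?", "What now? What next? Times up. Wake up.")

text = "1+1=2, 11=2"

# Unescaped, "+" is a quantifier: one or more "1" followed by "1".
unescaped = re.findall(r"1+1", text)

# re.escape makes every special character in the string literal.
literal = re.findall(re.escape("1+1"), text)

print(punctuation)  # ['?', '?', '.', '.']
print(unescaped)    # ['11']
print(literal)      # ['1+1']
```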

Search at the beginning ^

The caret symbol in a regular expression indicates that the search is performed only at the beginning of the string, or at the beginning of every line when the m flag is set.

/^[0-9]*/gm // Search for numbers that are at the beginning of a line

1. Apples x10
2. Cookies x5
3. Eggs x7

Search at the end $

The dollar symbol in a regular expression indicates that the search is performed only at the end of the string, or at the end of every line when the m flag is set.

/com$|net$/gm

google.com
command
sourceforge.net
netflix
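Both anchors depend on the m flag when the text has several lines. A minimal sketch in Python's re module, where m is spelled re.MULTILINE; the patterns and texts are the ones from the two sections above.

```python
import re

shopping = "1. Apples x10\n2. Cookies x5\n3. Eggs x7"

# Without m, ^ matches only at the very start of the text.
first_only = re.findall(r"^[0-9]*", shopping)
# With m, ^ matches at the start of every line.
every_line = re.findall(r"^[0-9]*", shopping, re.MULTILINE)

hosts = "google.com\ncommand\nsourceforge.net\nnetflix"

# $ with m: "com" or "net" only at the end of a line.
endings = re.findall(r"com$|net$", hosts, re.MULTILINE)

print(first_only)  # ['1']
print(every_line)  # ['1', '2', '3']
print(endings)     # ['com', 'net']
```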

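Character classes

There are built-in shorthand notations that make it easier to match an entire class of characters.

Any word character \w

The two entries below are equivalent for ASCII text. In Unicode-aware engines \w may also match letters and digits from other alphabets.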

/[a-zA-Z0-9_]/g
/\w/g

some random words for example #@$% *%(^)_+# 1234

Any non-word character \W

/[^a-zA-Z0-9_]/g
/\W/g

developer_2022@gmail.com

Any digit \d

/[0-9]/g
/\d/g

developer_2022@gmail.com

Any non-digit character \D

/[^0-9]/g
/\D/g

developer_2022@gmail.com

Whitespace character \s

Whitespace includes the space itself as well as tabs and the various line break characters.

/[\r\n\t\f\v ]/g
/\s/g

Any non-whitespace character \S

/[^\r\n\t\f\v ]/g
/\S/g
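The shorthand classes can be compared on the same input. This is a sketch in Python's re module; the "café" and whitespace strings are made up for illustration. In Python 3, \w is Unicode-aware by default for str patterns, and the re.ASCII flag restricts it to [a-zA-Z0-9_].

```python
import re

email = "developer_2022@gmail.com"

words = re.findall(r"\w+", email)            # runs of word characters
non_words = re.findall(r"\W", email)         # everything else
digits = re.findall(r"\d", email)
non_digits = "".join(re.findall(r"\D", email))

# \s and \S on a string containing a space, a tab and a newline.
pieces = re.split(r"\s+", "a b\tc\nd")
tokens = re.findall(r"\S+", "a b\tc\nd")

# Unicode-aware by default; re.ASCII limits \w to [a-zA-Z0-9_].
unicode_word = re.findall(r"\w+", "caf\u00e9")
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(words)       # ['developer_2022', 'gmail', 'com']
print(non_words)   # ['@', '.']
print(digits)      # ['2', '0', '2', '2']
print(non_digits)  # developer_@gmail.com
print(pieces)      # ['a', 'b', 'c', 'd']
```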

Lookarounds

Lookarounds (position checks) are used to find a phrase only when it comes before or after another phrase. The phrase being checked for is not included in the match.

Lookaheads (?=) (?!)

To find an expression X followed by an expression Y, use the construction X(?=Y).

/\d+(?=€)/g

200$ 750€ 100$ 330€ 550$

To find an expression X after which there is NOT an expression Y, use the construction X(?!Y).

/\d{4,}(?!€)/g

This car cost about 7000€ in 2015
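Both lookaheads can be tried in Python's re module. The sketch below reuses the patterns from this section; the last line is my own addition, showing why the negative example uses \d{4,} rather than \d+: with \d+ the engine can backtrack and match a shorter number that is no longer followed by €.

```python
import re

# Positive lookahead: numbers followed by "€" (the sign is not part of the match).
euros = re.findall(r"\d+(?=€)", "200$ 750€ 100$ 330€ 550$")

# Negative lookahead: numbers of 4+ digits NOT followed by "€".
not_euros = re.findall(r"\d{4,}(?!€)", "This car cost about 7000€ in 2015")

# Pitfall: \d+ backtracks to "700", which is followed by "0", not "€".
backtracked = re.findall(r"\d+(?!€)", "7000€")

print(euros)        # ['750', '330']
print(not_euros)    # ['2015']
print(backtracked)  # ['700']
```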

Lookbehinds (?<=) (?<!)

To find an expression X preceded by an expression Y, use the construction (?<=Y)X.

/(?<=:)\d+/g

{ "id":4, "value":123, name:"test" }

To find an expression X that is NOT preceded by an expression Y, use the construction (?<!Y)X.

/(?<!\$)\d+/g

$5 $6 $7 2019 2009 1999
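Here is the same pair of lookbehinds as a sketch in Python's re module. One engine-specific caveat, which is an observation about Python's standard re module and not about regular expressions in general: it only accepts lookbehinds of a fixed width, so a pattern such as (?<=a+)b is rejected.

```python
import re

# Positive lookbehind: numbers preceded by ":".
after_colon = re.findall(r"(?<=:)\d+", '{ "id":4, "value":123, name:"test" }')

# Negative lookbehind: numbers NOT preceded by "$".
no_dollar = re.findall(r"(?<!\$)\d+", "$5 $6 $7 2019 2009 1999")

# Python's re module requires fixed-width lookbehind patterns.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(after_colon)  # ['4', '123']
print(no_dollar)    # ['2019', '2009', '1999']
```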

✍️ Practice

Take some time to reinforce what you've learned. Write a simple library in your favorite programming language that validates given strings, for example one that checks whether a phone number or an email address is valid. Write a password validator that enforces requirements for length, special characters, uppercase letters, or digits. This will be doubly useful because you can reuse the library in your own applications.
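As a possible starting point, here is a minimal sketch of a password validator in Python. The rules (at least 8 characters, one uppercase letter, one digit, one special character) and the names PASSWORD_RE and is_valid_password are assumptions made up for this illustration; adjust them to your own requirements. Each rule is a separate lookahead, so the order of the characters does not matter.

```python
import re

# Hypothetical rules, chosen only for illustration:
#   (?=.*[A-Z])     at least one uppercase letter
#   (?=.*\d)        at least one digit
#   (?=.*[^\w\s])   at least one special character (not a word character or whitespace)
#   .{8,}           at least 8 characters in total
PASSWORD_RE = re.compile(r"(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s]).{8,}")

def is_valid_password(password: str) -> bool:
    """Return True if the whole string satisfies every rule above."""
    return PASSWORD_RE.fullmatch(password) is not None

print(is_valid_password("Secret#2022"))   # True
print(is_valid_password("secret#2022"))   # False: no uppercase letter
print(is_valid_password("Sh0rt#"))        # False: too short
print(is_valid_password("NoSpecial123"))  # False: no special character
```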

Additionally, searching Google for "regex practice" turns up many interesting exercises on regular expressions.

📚 Additional materials

  1. 📄 Awesome Regex – GitHub
  2. 📺 Practice Regular Expressions with Regex Golf! – YouTube
  3. 📘 Regular Expressions Cookbook – J. Goyvaerts and S. Levithan, 2012