Mastering Email Validation with a Precision Regex Pattern
New to coding and finding regular expression (regex) patterns for email form validation a bit overwhelming?
This blog post walks you through email validation step by step using a single regex pattern. It breaks down what an email address looks like and how to check whether one is valid. You'll see that regex isn't so scary after all!
Structure of an Email Address
1. Username:
The username, also called the local part, is the user-specific part of an email address that comes before the @ symbol.
It can consist of letters, digits, and special characters like periods (.), underscores (_), and hyphens (-).
2. Domain:
The domain is the part after the @ symbol. It identifies where messages for the address are delivered, that is, the mail servers that accept mail for that domain.
It usually consists of a domain name and a top-level domain (TLD), as in example.com.
3. Top-Level Domains (TLDs):
TLDs are the highest level in the Domain Name System (DNS).
The two most common categories are:
gTLD (generic top-level domains): Used for general purposes and not tied to any specific country or region. Common gTLDs include ".com", ".org", ".net", ".edu", and ".gov".
ccTLD (country code top-level domains): Specific to individual countries or regions. Examples include ".in" for India, ".eu" for the European Union, ".de" for Germany, etc.
Understanding these components is essential for validating and working with email addresses effectively.
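To make the three parts concrete, here is a small Python sketch that splits an address at the last @ and the last period. The helper names `split_email` and `tld_category` are hypothetical, chosen only for this illustration. The sketch assumes the input is already well-formed, and the ccTLD check relies on the fact that ASCII country-code TLDs are exactly two letters; internationalized TLDs are not handled.

```python
def split_email(address: str) -> tuple[str, str, str]:
    """Hypothetical helper: split an address into (username, domain name, TLD).

    Assumes the address contains an '@' and the domain contains a '.';
    no validation is performed here.
    """
    # The username is everything before the last '@'.
    username, _, domain = address.rpartition("@")
    # The TLD is the last period-separated label of the domain.
    domain_name, _, tld = domain.rpartition(".")
    return username, domain_name, tld


def tld_category(tld: str) -> str:
    """Hypothetical helper: rough split, ASCII ccTLDs are exactly two letters."""
    return "ccTLD" if len(tld) == 2 else "gTLD"


for addr in ["jane.doe@example.com", "raj_k@example.co.in"]:
    user, name, tld = split_email(addr)
    print(user, name, tld, tld_category(tld))
```

For `raj_k@example.co.in`, the TLD is `in` (a ccTLD) and everything between the @ and that last period, `example.co`, is treated as the domain name.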
Formatting Rules for an Email Address
Username:
Can contain lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), periods (.), underscores (_), and hyphens (-) as valid characters
Should be between 1 and 63 characters long
Should not start or end with a period (.), and should not contain two periods, underscores, or hyphens in a row (.., __, or --)
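Before reaching for regex, it can help to see the username rules written out as ordinary code. This is a minimal Python sketch; `is_valid_username` is a hypothetical helper name, and it follows the rules exactly as the regex later in this post enforces them: only a leading or trailing period is rejected (a leading underscore or hyphen is allowed), and doubled separators such as `..` are rejected anywhere.

```python
import string

# Characters this post allows in a username.
USERNAME_CHARS = set(string.ascii_letters + string.digits + "._-")


def is_valid_username(username: str) -> bool:
    """Hypothetical helper: the username rules as the post's regex enforces them."""
    # 1 to 63 characters.
    if not 1 <= len(username) <= 63:
        return False
    # Only letters, digits, periods, underscores and hyphens.
    if not set(username) <= USERNAME_CHARS:
        return False
    # The pattern forbids only a leading or trailing period.
    if username.startswith(".") or username.endswith("."):
        return False
    # No doubled separators anywhere.
    return not any(pair in username for pair in ("..", "__", "--"))


print(is_valid_username("jane.doe"), is_valid_username("jane..doe"))
```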
Domain:
Can contain lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and hyphens (-) as valid characters
Should be between 1 and 63 characters long, not counting the TLD
Should not start or end with a hyphen (-), and should not contain two hyphens in a row (--)
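The domain-name rules can be sketched the same way. In this Python example, `is_valid_domain_name` is a hypothetical helper name; it checks only the domain name before the TLD, as the regex later in this post does, including that pattern's extra rule that two hyphens may not appear in a row.

```python
import string

# Characters this post allows in the domain name (no periods, no underscores).
DOMAIN_CHARS = set(string.ascii_letters + string.digits + "-")


def is_valid_domain_name(name: str) -> bool:
    """Hypothetical helper: the domain-name rules as the post's regex enforces them."""
    # 1 to 63 characters, not counting the TLD.
    if not 1 <= len(name) <= 63:
        return False
    # Only letters, digits and hyphens.
    if not set(name) <= DOMAIN_CHARS:
        return False
    # No leading or trailing hyphen.
    if name.startswith("-") or name.endswith("-"):
        return False
    # The pattern also rejects two hyphens in a row.
    return "--" not in name


print(is_valid_domain_name("my-site"), is_valid_domain_name("-example"))
```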
Regex Pattern for Email Address
The following regex pattern validates an email address by enforcing the rules above for the local part (the username) and the domain part.
^((?!\.)(?!.*(\.\.|__|--))[a-zA-Z0-9._-]{1,63}(?<!\.))@((?!-)(?!.*(--))[a-zA-Z0-9-]{1,63}(?<!-))\.([a-zA-Z]{2,8})(\.[a-zA-Z]{2,8})?$
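Here is one way to use the pattern in Python, as a minimal sketch. The pattern is copied unchanged, split across three raw-string pieces only for readability; `EMAIL_PATTERN` and `is_valid_email` are hypothetical names. Python's `re` module supports the fixed-width lookbehinds this pattern uses, and `re.fullmatch` is used so that a trailing newline, which a bare `$` would otherwise accept, is rejected.

```python
import re

# The post's pattern, unchanged; adjacent raw strings are concatenated.
EMAIL_PATTERN = re.compile(
    r"^((?!\.)(?!.*(\.\.|__|--))[a-zA-Z0-9._-]{1,63}(?<!\.))"
    r"@((?!-)(?!.*(--))[a-zA-Z0-9-]{1,63}(?<!-))"
    r"\.([a-zA-Z]{2,8})(\.[a-zA-Z]{2,8})?$"
)


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the whole address matches the pattern."""
    # fullmatch rejects "...com\n", which re.match would accept because of "$".
    return EMAIL_PATTERN.fullmatch(address) is not None


for addr in ["jane.doe@example.com", "jane..doe@example.com", "jane@-example.com"]:
    print(addr, is_valid_email(addr))
```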
Here's an explanation of each part of the pattern:
^ : Asserts the start of the string.
((?!\.)(?!.*(\.\.|__|--))[a-zA-Z0-9._-]{1,63}(?<!\.)) : Matches the username (local part) of the email address, before the @ symbol. It is made up of the following parts:
(?!\.) : Negative lookahead ensuring the username doesn't start with a period.
(?!.*(\.\.|__|--)) : Negative lookahead ensuring there are no doubled separators (.., __, or --). Because .* runs to the end of the string, this check covers the whole address, not just the username.
[a-zA-Z0-9._-]{1,63} : Matches 1 to 63 letters, digits, periods, underscores, or hyphens.
(?<!\.) : Negative lookbehind ensuring the username doesn't end with a period.
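To see each lookaround at work, the username portion of the pattern can be tested on its own. This Python sketch copies that sub-pattern verbatim and uses `re.fullmatch`, so no `^` or `$` anchors are needed; `LOCAL_PART` and `local_part_ok` are hypothetical names.

```python
import re

# Username sub-pattern copied verbatim from the full regex.
LOCAL_PART = re.compile(r"(?!\.)(?!.*(\.\.|__|--))[a-zA-Z0-9._-]{1,63}(?<!\.)")


def local_part_ok(username: str) -> bool:
    """Hypothetical helper: does the whole string match the username sub-pattern?"""
    return LOCAL_PART.fullmatch(username) is not None


for name in ["jane.doe", ".jane", "jane.", "jane__doe", "jane.-doe", "-jane-"]:
    print(name, local_part_ok(name))
```

Note that `jane.-doe` and `-jane-` pass: the lookahead rejects only doubled identical separators, and the lookbehind rejects only a trailing period.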
@ : Matches the "@" symbol.
((?!-)(?!.*(--))[a-zA-Z0-9-]{1,63}(?<!-)) : Matches the domain name after the @ symbol. It is made up of the following parts:
(?!-) : Negative lookahead ensuring the domain name doesn't start with a hyphen.
(?!.*(--)) : Negative lookahead ensuring there are no two consecutive hyphens.
[a-zA-Z0-9-]{1,63} : Matches 1 to 63 letters, digits, or hyphens. Periods are not allowed here, so this group matches a single label such as "example".
(?<!-) : Negative lookbehind ensuring the domain name doesn't end with a hyphen.
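The domain portion can be exercised the same way. This Python sketch copies the domain sub-pattern verbatim; `DOMAIN_NAME` and `domain_name_ok` are hypothetical names, and the sample strings are illustrative.

```python
import re

# Domain-name sub-pattern copied verbatim from the full regex.
DOMAIN_NAME = re.compile(r"(?!-)(?!.*(--))[a-zA-Z0-9-]{1,63}(?<!-)")


def domain_name_ok(name: str) -> bool:
    """Hypothetical helper: does the whole string match the domain sub-pattern?"""
    return DOMAIN_NAME.fullmatch(name) is not None


for name in ["example", "my-site", "-example", "my--site", "mail.example", "xn--bcher-kva"]:
    print(name, domain_name_ok(name))
```

Two results are worth noticing: `mail.example` fails because this group matches a single label, and `xn--bcher-kva` fails because internationalized domain names encoded in ASCII (Punycode) start with `xn--`, which the no-double-hyphen rule rejects.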
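\. : Matches a literal period.
([a-zA-Z]{2,8}) : Matches the first label after the domain name, 2 to 8 letters. This is the TLD in addresses like name@example.com or name@example.in, or the second-level label "co" in name@example.co.in.
(\.[a-zA-Z]{2,8})? : Optionally matches a second suffix label, which allows two-part endings such as .co.in or .co.uk. It is made up of the following parts:
(\.[a-zA-Z]{2,8}) : Matches a period followed by 2 to 8 letters.
? : Makes this second label optional.
$ : Asserts the end of the string.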
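The suffix part can be checked in isolation too. This Python sketch copies that portion of the pattern verbatim (it starts at the period after the domain name) and prints the captured groups for a few illustrative suffixes; `SUFFIX` is a hypothetical name.

```python
import re

# Suffix sub-pattern copied verbatim from the full regex.
SUFFIX = re.compile(r"\.([a-zA-Z]{2,8})(\.[a-zA-Z]{2,8})?$")

for suffix in [".com", ".co.in", ".technology", ".c"]:
    match = SUFFIX.fullmatch(suffix)
    # groups() returns (first label, optional second label including its period).
    print(suffix, match.groups() if match else None)
```

`.co.in` captures `co` and `.in`, while `.technology` is rejected because it is longer than 8 letters and `.c` because it is shorter than 2.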
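However, email validation is more complex than any single pattern. The full address syntax defined in RFC 5322 allows characters this pattern rejects, such as the plus sign used in plus addressing, and real domains can have more labels or longer TLDs than it permits. For production systems, consider a dedicated validation library or service, and keep in mind that the only way to confirm an address actually exists is to send a message to it.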
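The following Python sketch shows a few such edge cases: syntactically valid addresses that this post's pattern rejects. The addresses are illustrative examples, not real mailboxes, and `EMAIL_PATTERN` is a hypothetical name for the unchanged pattern.

```python
import re

# The post's pattern, unchanged; adjacent raw strings are concatenated.
EMAIL_PATTERN = re.compile(
    r"^((?!\.)(?!.*(\.\.|__|--))[a-zA-Z0-9._-]{1,63}(?<!\.))"
    r"@((?!-)(?!.*(--))[a-zA-Z0-9-]{1,63}(?<!-))"
    r"\.([a-zA-Z]{2,8})(\.[a-zA-Z]{2,8})?$"
)

# Valid addresses that the pattern nevertheless rejects.
rejected_but_valid = [
    "jane+newsletter@example.com",  # '+' is not in the username character class
    "jane@mail.example.co.uk",      # more domain labels than the pattern allows
    "jane@example.technology",      # TLD longer than 8 letters
    "jane@xn--bcher-kva.de",        # Punycode label contains '--'
]

for addr in rejected_but_valid:
    print(addr, EMAIL_PATTERN.fullmatch(addr) is not None)
```

If you need to accept addresses like these, a dedicated library is usually a better choice than making the pattern ever more complex.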