Commit 91f7ada (1 parent: d94da14). Showing 2 changed files with 156 additions and 59 deletions.
@@ -0,0 +1,126 @@
Title: UrlCrazy Readme
Version: 0.2
Description: UrlCrazy is for the study of domain name typos / URL hijacking.
Release Date: March 2009
Author: horton.nz{at-nospam}gmail, Andrew Horton (urbanadventurer)
Primary-site: code.google.com/p/urlcrazy
Platforms: Linux, anything with Ruby
Copying-policy: BSD


DESCRIPTION
UrlCrazy is for the study of domain name typos / URL hijacking.

It generates typo permutations of a domain name, then tests them to learn whether they are in use, estimates their popularity, and more.


TYPES OF TYPOS SUPPORTED
Character Omission.
These typos are created by leaving out one letter of the domain name at a time. For example, www.goole.com and www.gogle.com.

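As a rough illustration of character omission, the Ruby sketch below drops one character at a time from a domain label. The method name character_omission is made up for this example and is not taken from UrlCrazy's source.

```ruby
# Return every variant of `label` with exactly one character left out.
# `label` is assumed to be the name part only (e.g. "google"), without
# the "www." prefix or the top-level domain.
def character_omission(label)
  variants = (0...label.length).map do |i|
    label[0...i] + label[(i + 1)..-1]
  end
  # Dropping either letter of a doubled pair gives the same typo,
  # so duplicates are removed.
  variants.uniq
end

character_omission("google").each { |typo| puts "www.#{typo}.com" }
```

The output includes www.goole.com and www.gogle.com, the two examples given above.
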
Adjacent Character Swap.
These typos are created by swapping the order of adjacent letters in the domain name. For example, www.googel.com and www.ogogle.com.

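A minimal Ruby sketch of the adjacent swap rule follows. The method name adjacent_swap is hypothetical; the sketch simply swaps each neighbouring pair of characters in turn and discards results identical to the input.

```ruby
# Return every variant of `label` in which one pair of neighbouring
# characters has been swapped.
def adjacent_swap(label)
  variants = (0...(label.length - 1)).map do |i|
    chars = label.chars
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    chars.join
  end
  # Swapping two identical letters (the "oo" in google) reproduces the
  # original name, which is not a typo, so it is removed.
  variants.uniq.reject { |variant| variant == label }
end

adjacent_swap("google").each { |typo| puts "www.#{typo}.com" }
```
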
Adjacent Character Replacement.
These typos are created by replacing each letter of the domain name with the letters to its immediate left and right on the keyboard. For example, www.googke.com and www.goohle.com.

Adjacent Character Insertion.
These typos are created by inserting, next to each letter, the letters to its immediate left and right on the keyboard. For example, www.googhle.com and www.goopgle.com.

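The Adjacent Character Replacement and Adjacent Character Insertion rules both depend on keyboard neighbours, so they can be sketched together. This Ruby example assumes a QWERTY layout and looks only at letters on the same row. It also assumes the neighbour is inserted after the letter, which reproduces the README's examples. All method names are invented for the illustration.

```ruby
# Letters immediately left and right of `char` on its QWERTY row.
# Characters that are not on a letter row (digits, dashes) have none.
def keyboard_neighbours(char)
  rows = %w[qwertyuiop asdfghjkl zxcvbnm]
  rows.each do |row|
    i = row.index(char)
    next if i.nil?
    left = i > 0 ? row[i - 1] : nil
    right = row[i + 1] # nil when `char` is the last key of the row
    return [left, right].compact
  end
  []
end

# Replace each letter in turn with each of its keyboard neighbours.
def adjacent_replacement(label)
  label.chars.each_with_index.flat_map do |char, i|
    keyboard_neighbours(char).map do |n|
      label[0...i] + n + label[(i + 1)..-1]
    end
  end.uniq
end

# Insert each keyboard neighbour directly after the letter it belongs to.
def adjacent_insertion(label)
  label.chars.each_with_index.flat_map do |char, i|
    keyboard_neighbours(char).map do |n|
      label[0..i] + n + label[(i + 1)..-1]
    end
  end.uniq
end

puts adjacent_replacement("google").include?("googke")
puts adjacent_insertion("google").include?("goopgle")
```
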
Missing Dot.
These typos are created by omitting a dot from the domain name. For example, wwwgoogle.com and www.googlecom.

Strip Dashes.
These typos are created by omitting a dash from the domain name. For example, www.domain-name.com becomes www.domainname.com.

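Missing Dot and Strip Dashes are the same operation applied to different characters, so one helper can illustrate both. The sketch below removes one occurrence at a time. Whether UrlCrazy removes dashes one at a time or all at once is not stated here, so treat that choice, and the name omit_each, as assumptions.

```ruby
# Return every variant of `hostname` with one occurrence of `char` removed.
def omit_each(hostname, char)
  (0...hostname.length)
    .select { |i| hostname[i] == char }
    .map { |i| hostname[0...i] + hostname[(i + 1)..-1] }
end

puts omit_each("www.google.com", ".")       # missing dot
puts omit_each("www.domain-name.com", "-")  # strip dashes
```
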
Singular or Pluralise.
These typos are created by making a singular domain name plural, or a plural one singular. For example, www.google.com becomes www.googles.com and www.trademe.co.nz becomes www.trademes.co.nz.



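The credits mention that Inflector from Ruby on Rails is used for this permutation. The sketch below is a deliberately naive stand-in that only toggles a trailing "s". It does not handle irregular plurals, and toggle_plural is a made-up name.

```ruby
# Naive singular/plural toggle: strip a trailing "s" if present,
# otherwise append one. Real inflection rules are far richer.
def toggle_plural(label)
  label.end_with?("s") ? label[0...-1] : label + "s"
end

puts "www.#{toggle_plural('google')}.com"
puts "www.#{toggle_plural('trademe')}.co.nz"
```
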
DOMAIN TESTS
Is the domain valid?
--------------------
UrlCrazy has a database of valid top-level and second-level domains, compiled from Wikipedia and domain registrars. A domain is considered valid if it matches a known top-level domain and, where the registry restricts them, an allowed second-level domain. For example, www.trademe.co.bz is valid because Belize (.bz) allows any second-level domain registration, but www.trademe.xo.nz is not, because xo.nz isn't an allowed second-level domain in New Zealand.

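The validity check described above might look roughly like this in Ruby. The rule table is a tiny hand-written sample that mirrors the README's example. It is not UrlCrazy's actual database, and the names sample_rules and valid_domain? are hypothetical.

```ruby
# Sample rules (an assumption for illustration only): a top-level domain
# maps either to :any, meaning any second-level label may be registered,
# or to the list of second-level labels the registry allows.
def sample_rules
  {
    "com" => :any,
    "bz"  => :any,
    "nz"  => %w[co net org ac govt school geek gen]
  }
end

# True when `hostname` ends in a known top-level domain and, where the
# registry is restrictive, an allowed second-level domain.
def valid_domain?(hostname, rules)
  labels = hostname.downcase.split(".")
  return false if labels.length < 2 || labels.any?(&:empty?)

  allowed = rules[labels[-1]]
  return false if allowed.nil?   # unknown top-level domain
  return true if allowed == :any # unrestricted registry

  # Restricted registry: a name must sit under an allowed second level.
  labels.length >= 3 && allowed.include?(labels[-2])
end

puts valid_domain?("www.trademe.co.bz", sample_rules)
puts valid_domain?("www.trademe.xo.nz", sample_rules)
```

A Missing Dot typo such as www.googlecom fails this check because "googlecom" is not a known top-level domain.
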
Popularity Estimate
-------------------
We can estimate the relative popularity of a typo by measuring how often that typo has been made on webpages. Querying cuil.com for the number of search results for a typo gives us an indication of how popular that typo is.

The drawback of this approach is that you need to manually identify and omit legitimate domains such as googles.com.

For example, consider the following typos for google.com.
25424 gogle.com
24031 googel.com
22490 gooogle.com
19172 googles.com
19148 goole.com
18855 googl.com
17842 ggoogle.com
16490 googe.com
16367 googgle.com
15029 google.cm
14773 gogole.com
13227 googlle.com
11646 googlee.com
11345 googlr.com
7417 foogle.com
6132 hoogle.com
5313 googlw.com
5208 giogle.com
5151 googke.com
4838 goigle.com
4662 ogogle.com
4630 gopgle.com
4415 goofle.com
4118 wwwgoogle.com
3894 goohle.com
3399 gooigle.com
2675 gfoogle.com
1942 googlecom.com
1534 gopogle.com
1356 googfle.com
1089 googhle.com
892 googlew.com
747 googlke.com
618 goiogle.com
614 goopgle.com
413 ghoogle.com
341 goolge.com
232 googler.com
228 gpogle.com

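Once result counts are available, ranking the typos and dropping known legitimate domains is straightforward. The Ruby sketch below does not query any search engine. It takes the counts as input, here a few values copied from the list above, and rank_typos is a hypothetical name.

```ruby
# Sort typo domains by result count, highest first, after removing
# domains that have been manually identified as legitimate.
def rank_typos(counts, legitimate = [])
  counts
    .reject { |domain, _hits| legitimate.include?(domain) }
    .sort_by { |_domain, hits| -hits }
end

counts = {
  "goolge.com"  => 341,
  "googles.com" => 19172,
  "gogle.com"   => 25424,
  "goole.com"   => 19148
}

rank_typos(counts, ["googles.com"]).each do |domain, hits|
  puts "#{hits} #{domain}"
end
```
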
IP Address
----------
If the typo domain name is in use, UrlCrazy displays the IP address it resolves to. The same IP address repeating across multiple typos, or IP addresses in a close range, suggests common ownership. For example, gogle.com, gogole.com and googel.com all resolve to 64.233.161.104, which is owned by Google.



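A sketch of the IP address test, using Ruby's standard Resolv library for the lookup. The grouping step only detects exact repeats of an address, not addresses in a close range, and both method names are invented for this example.

```ruby
require "resolv"

# Look up the address a hostname resolves to, or nil if the lookup
# finds no address.
def resolve_ip(hostname)
  Resolv.getaddress(hostname)
rescue Resolv::ResolvError
  nil
end

# Given a hash of domain => IP (nil for unresolved domains), return
# only the addresses shared by more than one domain.
def shared_ips(domain_to_ip)
  domain_to_ip
    .reject { |_domain, ip| ip.nil? }
    .group_by { |_domain, ip| ip }
    .map { |ip, pairs| [ip, pairs.map(&:first)] }
    .select { |_ip, domains| domains.length > 1 }
    .to_h
end

# With network access, the input could be built like this:
#   typos = %w[gogle.com gogole.com googel.com]
#   lookups = typos.map { |d| [d, resolve_ip(d)] }.to_h
lookups = {
  "gogle.com"  => "64.233.161.104",
  "gogole.com" => "64.233.161.104",
  "googel.com" => "64.233.161.104"
}
shared_ips(lookups).each do |ip, domains|
  puts "#{ip}: #{domains.join(', ')}"
end
```
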
COUNTRY CODE DATABASE
http://en.wikipedia.org/wiki/Top-level_domain
http://en.wikipedia.org/wiki/Country_code_top-level_domain
Second-level domains:
http://www.iana.org/domains/root/db/


SEE ALSO
http://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser/Typos
http://en.wikipedia.org/wiki/Wikipedia:Typo
http://en.wikipedia.org/wiki/Typosquatting

Strider is a tool with similar aims, produced by Microsoft: http://research.microsoft.com/csm/strider/


INSTALLATION
UrlCrazy requires Ruby. If you are using Ubuntu or Debian, try:
$ sudo apt-get install ruby

Don't install UrlCrazy; instead, run it from its own folder.



CREDITS
Authored by Andrew Horton (urbanadventurer) horton.nz {at-nospam} gmail

Thanks to Ruby on Rails for Inflector, which allows plural and singular permutations.