Identify URLs in Report Text

Juan-Pablo Velez edited this page Nov 16, 2013 · 4 revisions

Which word(s) in a report's text are URLs (links)?

Input and Output

Here's how it works:

Input: Report text

Output: List of words that are identified as URLs

Use Cases

Here's why it's useful:

  • Identifying text within a report that refers to a URL/link, so it can be linked automatically or copied into report fields.

Caveats

  • Doesn't work well with typos (e.g. "www,google,com" is obviously a mistyped URL, but it is not currently caught).

  • URLs may sometimes be private.

Technical implementation

We used regular expressions to do this. Please check the relevant code for more details.
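
As a rough illustration of the regular-expression approach, here is a minimal sketch. It is not the project's actual code: the pattern `URL_PATTERN` and the function name `find_urls` are assumptions made for this example, and the real expression may be broader or stricter.

```python
import re

# Hypothetical pattern (an assumption, not the project's regex): a word counts
# as a URL if it starts with http://, https://, or www.
URL_PATTERN = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def find_urls(report_text):
    """Return the whitespace-separated words in report_text that look like URLs."""
    urls = []
    for word in report_text.split():
        # Drop punctuation that often trails a URL at the end of a sentence.
        candidate = word.rstrip(".,;:!?)")
        if URL_PATTERN.match(candidate):
            urls.append(candidate)
    return urls


if __name__ == "__main__":
    text = "Flooding reported, see http://example.com/report and www.example.org."
    print(find_urls(text))
    # A mistyped URL such as "www,google,com" is not matched by this pattern.
    print(find_urls("visit www,google,com now"))
```

A sketch like this takes report text as input and returns a list of words, matching the input and output described above. It also shows the typo caveat: a pattern that expects literal dots will miss a URL typed with commas.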

Future Work

None!