Commit

Add IrBlogs to the README.md (#34)
* add farsi blogs

* modify crawler README file

* modify #2 crawler README file

* add strip function to hamshahri_spider

* add irblogs to readme.md

* add irblogs to readme.md
Sahand504 authored and sehsanm committed Dec 27, 2018
1 parent 21bfa49 commit c1fc7ee
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions data/corpus/README.md
@@ -11,3 +11,8 @@ This link contains the extracted text from FaWiki XML file.
First, we extracted the text data. Next, we normalized it and simply segmented its sentences using regular expressions.
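
The normalization and regex-based sentence segmentation mentioned above could look roughly like the following. This is only a minimal sketch, not the script used to build the corpus: the character mapping, the sentence-ending pattern, and the function names `normalize` and `split_sentences` are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed normalization table (hypothetical): map Arabic letter variants
# to their Persian forms (Arabic Yeh -> Persian Yeh, Arabic Kaf -> Persian Keheh).
_CHAR_MAP = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})

# Assumed sentence boundary: whitespace that follows ".", "!", "?",
# or the Arabic question mark (U+061F).
_SENTENCE_END = re.compile(r"(?<=[.!?\u061f])\s+")


def normalize(text: str) -> str:
    """Unify character variants and collapse runs of whitespace."""
    text = text.translate(_CHAR_MAP)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Naively segment text into sentences after normalizing it."""
    return [s for s in _SENTENCE_END.split(normalize(text)) if s]
```

A splitter this simple will also break on abbreviations and decimal numbers, so a real pipeline would likely need extra rules.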

You can download the corpus [here](https://sbuacir-my.sharepoint.com/personal/se_mahmoudi_sbu_ac_ir/_layouts/15/download.aspx?SourceUrl=%2Fpersonal%2Fse_mahmoudi_sbu_ac_ir%2FDocuments%2Fsbunlp%2FwikiDump_dotSplitData_Nikvand.zip).

## IrBlogs
irBlogs is a standard collection of Persian weblogs, suitable for studying Persian social networks and for evaluating graph mining and blog retrieval algorithms.

You can find the collection [here](http://dbrg.ut.ac.ir/irblogs/)
