This repo contains code examples and tutorials for mining IMF documents.

johnsonice/Fund_Textmining

IMF_Textmining

This repo contains code examples and tutorials for mining IMF documents.

The purpose of this repository is to share some of the text mining work done at the Fund. We aim to provide a set of well-written code examples (or tutorials) that people with little text mining experience can easily grasp and apply to their own problems.

Ideally, we want to cover as many programming languages as possible. Contributors with R and MATLAB experience are especially needed.

Current Topics

  • Intro to text analysis - introductions to some basic text analysis concepts (tokenizing, stemming, removing stop words, etc.)
  • Download and process COM's XML data - basic cleanup of COM's XML database
  • Basic keyword search - using IMF Staff Reports
  • Word embedding - word2vec, doc2vec
  • Topic modeling - such as LDA
  • Sentiment analysis - both dictionary-based and machine-learning-based
  • Document similarity measure [coming]
  • Data visualization - word cloud, embedding projection, LDAvis, knowledge graph, etc.
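
To give a feel for the simpler topics above (tokenizing, stop word removal, keyword search, and dictionary-based sentiment), here is a minimal sketch using only the Python standard library. It is not taken from the tutorials in this repo: the word lists, function names, and sample sentence are all hypothetical and far too small for real use, and the sentiment score is one common convention rather than the method the tutorials necessarily follow.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (assumptions for illustration only);
# a real analysis would use a fuller stop word list and sentiment lexicon.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "but"}
POSITIVE = {"growth", "stable", "improved", "strong"}
NEGATIVE = {"risk", "weak", "deficit", "crisis"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def keyword_counts(tokens, keywords):
    """Count how often each keyword occurs among the tokens."""
    counts = Counter(tokens)
    return {k: counts[k] for k in keywords}


def sentiment_score(tokens):
    """Return (positive - negative) / (positive + negative), or 0.0 if no lexicon word matched."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Made-up sample sentence, not an excerpt from any IMF document.
sample = "Growth is strong, but the fiscal deficit and external risk are rising."
tokens = remove_stop_words(tokenize(sample))
print(tokens)
print(keyword_counts(tokens, ["deficit", "inflation"]))
print(sentiment_score(tokens))
```

Stemming is left out here because a reasonable stemmer needs either a library or a longer rule set; the same pipeline shape applies once one is added between tokenizing and counting.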
