ahorner2/defiCorpus

defiCorpus

A DeFi-native corpus built using nltk & gensim.

The initial focus is Discourse-based Web3 governance forums; material from several other media will be added in the future.
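As a rough illustration of how forum threads could be turned into corpus documents, here is a minimal sketch using only the Python standard library. It assumes a Discourse topic payload exposes its posts under `post_stream.posts`, with the rendered HTML of each post in a `cooked` field. The helper names (`strip_html`, `topic_to_documents`) and the sample payload are hypothetical, and the regex tokenizer is a stand-in for whichever nltk tokenizer the project actually uses.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def topic_to_documents(topic):
    """Turn one topic payload into a list of token lists, one per post."""
    posts = topic.get("post_stream", {}).get("posts", [])
    docs = []
    for post in posts:
        text = strip_html(post.get("cooked", ""))
        # Simple lowercase alphanumeric tokenizer (assumption, not the repo's method).
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        if tokens:
            docs.append(tokens)
    return docs


# Made-up payload for illustration only; not real forum data.
sample_topic = {
    "post_stream": {
        "posts": [
            {"cooked": "<p>Proposal: raise the <strong>collateral factor</strong>.</p>"},
            {"cooked": "<p>Agreed, but what about liquidity risk?</p>"},
        ]
    }
}

documents = topic_to_documents(sample_topic)
```

Token lists in this shape are what gensim's `corpora.Dictionary` accepts as input, so the output could be fed into a gensim pipeline from here.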

The first pass came from a collection of DeFi protocols:

  • Aave
  • Balancer
  • Compound
  • Frax
  • GMX
  • Instadapp
  • Lido
  • Maker
  • Rocketpool
  • Sushiswap
  • Uniswap
  • Vesta
  • Yearn

Crypto news articles will be collected from Decrypt and The Block; the remaining sources are TBD.

Free to use and free to improve :)

Cheers!

TODO: social media, news, whitepaper collection

Note: Separating into several datasets to be merged after full review. Current train/test data needs a decent chunk of processing work.

UPDATE: Whitepaper collection underway. PDF repo via @fgallaire, found here: https://github.com/Cryptorating/whitepapers

Going to initially split into technical & social datasets. Have around 10k Discord messages & 300 articles ready to add to the corpus.
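One way the technical/social split could be organized is by routing each record on its source type. The sketch below is only an assumption about how that might look: the `source` field, the `SOURCE_TO_DATASET` mapping, and the choice of which sources count as technical or social are all placeholders, since the README does not define them.

```python
from collections import defaultdict

# Placeholder mapping from source type to dataset (assumption; adjust as needed).
SOURCE_TO_DATASET = {
    "whitepaper": "technical",
    "forum": "social",
    "discord": "social",
    "article": "social",
}


def split_by_dataset(records, mapping=SOURCE_TO_DATASET):
    """Group records into datasets by their 'source' field.

    Records whose source is not in the mapping land in 'unassigned'
    so they can be reviewed by hand rather than silently dropped.
    """
    buckets = defaultdict(list)
    for record in records:
        dataset = mapping.get(record["source"], "unassigned")
        buckets[dataset].append(record)
    return dict(buckets)


# Made-up records for illustration only.
sample_records = [
    {"source": "discord", "text": "gm, any update on the vote?"},
    {"source": "whitepaper", "text": "We define the invariant as follows."},
    {"source": "podcast", "text": "Transcript placeholder."},
]

split = split_by_dataset(sample_records)
```

Keeping an explicit `unassigned` bucket fits the plan of merging the datasets only after a full review.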

About

A DeFi-native corpus for Web3 NLP. Built using nltk/gensim.
