defiCorpus

A DeFi-native corpus built using NLTK & gensim.

The initial focus is Discourse-based Web3 governance forums; material from several other mediums will be added in the future.
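As a rough illustration of the kind of cleanup forum posts need before they reach a corpus, here is a minimal sketch that strips the HTML from a post body and splits it into lowercase tokens. It uses only the Python standard library as a stand-in for NLTK tokenization, so it is not this repo's actual pipeline. The function name `post_to_tokens`, the token pattern, and the sample post are all made up for the example; the input is assumed to be the rendered HTML body of a forum post.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def post_to_tokens(post_html):
    """Turn one forum post's HTML body into lowercase word tokens.

    Hypothetical helper: a stdlib stand-in for an NLTK tokenizer.
    """
    parser = _TextExtractor()
    parser.feed(post_html)
    parser.close()
    text = " ".join(parser.parts)
    # Keep alphanumeric runs, allowing internal hyphens or apostrophes.
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


# Made-up sample post, for illustration only.
sample = "<p>Proposal: raise the <strong>USDC</strong> supply cap on Aave v3.</p>"
tokens = post_to_tokens(sample)
print(tokens)
```

Token lists in this shape (one list of strings per document) are what gensim's dictionary and model classes generally expect as input, so swapping in a real NLTK tokenizer later should not change the surrounding code much.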

The first pass came from a collection of DeFi protocols:

  • Aave
  • Balancer
  • Compound
  • Frax
  • GMX
  • Instadapp
  • Lido
  • Maker
  • Rocketpool
  • Sushiswap
  • Uniswap
  • Vesta
  • Yearn

Crypto news articles are to be collected from Decrypt and The Block; the rest are TBD.

Free to use and free to improve :)

Cheers!

TODO: social media, news, whitepaper collection

Note: The data is being separated into several datasets, to be merged after full review. The current train/test data still needs a decent chunk of processing work.
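For anyone who wants to regenerate a train/test split from one of the datasets in the meantime, a minimal sketch of a reproducible split might look like the following. The function name `train_test_split`, the 80/20 ratio, and the seed are assumptions chosen for illustration, not settings taken from this repo.

```python
import random


def train_test_split(docs, test_fraction=0.2, seed=42):
    """Shuffle a copy of docs with a fixed seed and split it in two.

    Hypothetical helper: the ratio and seed are illustrative defaults.
    Returns (train, test).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(docs)
    # A private Random instance keeps the global RNG state untouched.
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one document when there is anything to split.
    n_test = max(1, round(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder document IDs, for illustration only.
docs = [f"post-{i}" for i in range(10)]
train, test = train_test_split(docs)
print(len(train), len(test))
```

Fixing the seed means the same documents land in the test set on every run, which makes it easier to compare results before and after the datasets are merged.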

UPDATE: Whitepaper collection underway. PDF repo via @fgallaire, found here: https://github.com/Cryptorating/whitepapers

Going to initially split into technical & social datasets. Have around 10k Discord messages & 300 articles ready to add to the corpus.