From cae2254439d0bc9c0c72d11f81ba60ae652b5982 Mon Sep 17 00:00:00 2001
From: Maria Grandury
Date: Mon, 28 Aug 2023 17:17:52 +0200
Subject: [PATCH] docs: add delimiters to readme

---
 README.md | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/README.md b/README.md
index e063ee0..a071014 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ En este repo encontrarás:
 - Hackathon Somos NLP [2022](https://github.com/somosnlp/recursos-nlp-es/tree/main/hackathon_2022) y [2023](https://github.com/somosnlp/recursos-nlp-es/tree/main/hackathon_2023): Diapositivas y notebooks de las charlas y talleres impartidas durante el mayor hackathon open-source de PLN en español
 - [Grupo de estudio](https://github.com/somosnlp/recursos-nlp-es/tree/main/grupo_de_estudio): Diapositivas y material del grupo de estudio, únete en el canal #grupo-de-estudio de Discord

-Tenemos también una sección de la página web dedicada a recursos de PLN: https://somosnlp.org/recursos
+Tenemos también una sección de la página web dedicada a recursos de PLN: <https://somosnlp.org/recursos>

 Si no encuentras lo que estás buscando te animamos a unirte a Discord y preguntar a la comunidad. Aquí tienes una [invitación](https://discord.com/invite/my8w7JUxZR).

@@ -14,23 +14,27 @@ Si no encuentras lo que estás buscando te animamos a unirte a Discord y pregunt

 - [Versión web](https://somosnlp.org/recursos/open-source/datasets)

+
+
 | nombre | tareas | idioma | página_web | github | paper | hf_dataset_name | hf_contributor_handle | dominio | pais |
 |:--------------------------------------------------|:-----------------------------------------------------|:-----------|:-----------------------------------------------------|:---------------------------------------------------------|:----------------------------------------------------|:-----------------------------------------------------------------------|:------------------------|:-----------|:-------|
-| BasCrawl | modelado del lenguaje | eu | https://doi.org/10.5281/zenodo.7313092 | nan | nan | nan | nan | general | España |
-| Biomedical Spanish CBOW Word Embeddings in Floret | modelado del lenguaje,CBOW (Continuous Bag Of Words) | es | https://doi.org/10.5281/zenodo.7314041 | https://arxiv.org/abs/2109.07765 | nan | nan | nan | clinico | España |
-| CSIC Spanish Corpus | modelado del lenguaje | es | https://doi.org/10.5281/zenodo.7313126 | nan | nan | nan | nan | academico | España |
-| Catalonia Independence Corpus | clasificación de sentimientos | ca, es | nan | https://github.com/ixa-ehu/catalonia-independence-corpus | https://www.aclweb.org/anthology/2020.lrec-1.171/ | catalonia_independence | lewtun | rrss | España |
-| HEAD-QA | preguntas de opción múltiple | es | https://aghie.github.io/head-qa/ | https://github.com/aghie/head-qa | https://www.aclweb.org/anthology/P19-1092/ | head_qa | mariagrandury | clinico | España |
-| InfoLibros Corpus | modelado del lenguaje | es | https://doi.org/10.5281/zenodo.7313105 | nan | nan | nan | nan | literatura | Varios |
-| Large Spanish Corpus | modelado del lenguaje,pre-entrenamiento | es | nan | https://github.com/josecannete/spanish-corpora | nan | large_spanish_corpus | lewtun | general | Varios |
-| Mucho Cine | clasificación de sentimientos | es | http://www.lsi.us.es/~fermin/index.php/Datasets | nan | nan | muchocine | mapmeld | general | ? |
-| Spanish Billion Words | modelado del lenguaje,pre-entrenamiento | es | https://crscardellino.github.io/SBWCE/ | nan | nan | spanish_billion_words | mariagrandury | general | Varios |
-| Spanish Biomedical Crawled Corpus | modelado del lenguaje | es | https://doi.org/10.5281/zenodo.5513237 | nan | https://arxiv.org/abs/2109.07765 | nan | nan | clinico | España |
-| Spanish CBOW Word Embeddings in FastText | modelado del lenguaje,FastText | es | https://doi.org/10.5281/zenodo.5044988 | nan | nan | http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405 | nan | genera | España |
-| Spanish CBOW Word Embeddings in Floret | modelado del lenguaje,CBOW (Continuous Bag Of Words) | es | https://doi.org/10.5281/zenodo.7314098 | nan | nan | nan | nan | general | España |
-| Spanish Legal Domain Corpora | modelado del lenguaje | es | https://doi.org/10.5281/zenodo.5495529 | https://github.com/PlanTL-GOB-ES/lm-legal-es | https://arxiv.org/abs/2110.12201 | nan | nan | legal | España |
-| Spanish Legal Domain Word & Sub-Word Embeddings | modelado del lenguaje | es | https://doi.org/10.5281/zenodo.5036147 | https://github.com/PlanTL-GOB-ES/lm-legal-es | https://arxiv.org/abs/2110.12201 | nan | nan | legal | España |
-| Spanish Skip-Gram Word Embeddings in FastText | modelado del lenguaje,FastText | es | https://doi.org/10.5281/zenodo.5046525 | nan | nan | http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405 | nan | general | España |
-| TDX Thesis Spanish Corpus | modelado del lenguaje | ca, es | https://doi.org/10.5281/zenodo.7313149 | nan | nan | nan | nan | academico | España |
-| WikiCorpus | modelado del lenguaje,POS (Part of Speech) | ca, en, es | https://www.cs.upc.edu/~nlp/wikicorpus/ | nan | https://www.cs.upc.edu/~nlp/papers/reese10.pdf | wikicorpus | albertvillanova | general | Varios |
-| eHealth-KD | NER (Named Entity Recognition) | es | https://knowledge-learning.github.io/ehealthkd-2020/ | https://github.com/knowledge-learning/ehealthkd-2020 | http://ceur-ws.org/Vol-2664/eHealth-KD_overview.pdf | ehealth_kd | mariagrandury | clinico | España |
+| BasCrawl | modelado del lenguaje | eu | <https://doi.org/10.5281/zenodo.7313092> | nan | nan | nan | nan | general | España |
+| Biomedical Spanish CBOW Word Embeddings in Floret | modelado del lenguaje,CBOW (Continuous Bag Of Words) | es | <https://doi.org/10.5281/zenodo.7314041> | <https://arxiv.org/abs/2109.07765> | nan | nan | nan | clinico | España |
+| CSIC Spanish Corpus | modelado del lenguaje | es | <https://doi.org/10.5281/zenodo.7313126> | nan | nan | nan | nan | academico | España |
+| Catalonia Independence Corpus | clasificación de sentimientos | ca, es | nan | <https://github.com/ixa-ehu/catalonia-independence-corpus> | <https://www.aclweb.org/anthology/2020.lrec-1.171/> | catalonia_independence | lewtun | rrss | España |
+| HEAD-QA | preguntas de opción múltiple | es | <https://aghie.github.io/head-qa/> | <https://github.com/aghie/head-qa> | <https://www.aclweb.org/anthology/P19-1092/> | head_qa | mariagrandury | clinico | España |
+| InfoLibros Corpus | modelado del lenguaje | es | <https://doi.org/10.5281/zenodo.7313105> | nan | nan | nan | nan | literatura | Varios |
+| Large Spanish Corpus | modelado del lenguaje,pre-entrenamiento | es | nan | <https://github.com/josecannete/spanish-corpora> | nan | large_spanish_corpus | lewtun | general | Varios |
+| Mucho Cine | clasificación de sentimientos | es | <http://www.lsi.us.es/~fermin/index.php/Datasets> | nan | nan | muchocine | mapmeld | general | ? |
+| Spanish Billion Words | modelado del lenguaje,pre-entrenamiento | es | <https://crscardellino.github.io/SBWCE/> | nan | nan | spanish_billion_words | mariagrandury | general | Varios |
+| Spanish Biomedical Crawled Corpus | modelado del lenguaje | es | <https://doi.org/10.5281/zenodo.5513237> | nan | <https://arxiv.org/abs/2109.07765> | nan | nan | clinico | España |
+| Spanish CBOW Word Embeddings in FastText | modelado del lenguaje,FastText | es | <https://doi.org/10.5281/zenodo.5044988> | nan | nan | <http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405> | nan | genera | España |
+| Spanish CBOW Word Embeddings in Floret | modelado del lenguaje,CBOW (Continuous Bag Of Words) | es | <https://doi.org/10.5281/zenodo.7314098> | nan | nan | nan | nan | general | España |
+| Spanish Legal Domain Corpora | modelado del lenguaje | es | <https://doi.org/10.5281/zenodo.5495529> | <https://github.com/PlanTL-GOB-ES/lm-legal-es> | <https://arxiv.org/abs/2110.12201> | nan | nan | legal | España |
+| Spanish Legal Domain Word & Sub-Word Embeddings | modelado del lenguaje | es | <https://doi.org/10.5281/zenodo.5036147> | <https://github.com/PlanTL-GOB-ES/lm-legal-es> | <https://arxiv.org/abs/2110.12201> | nan | nan | legal | España |
+| Spanish Skip-Gram Word Embeddings in FastText | modelado del lenguaje,FastText | es | <https://doi.org/10.5281/zenodo.5046525> | nan | nan | <http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405> | nan | general | España |
+| TDX Thesis Spanish Corpus | modelado del lenguaje | ca, es | <https://doi.org/10.5281/zenodo.7313149> | nan | nan | nan | nan | academico | España |
+| WikiCorpus | modelado del lenguaje,POS (Part of Speech) | ca, en, es | <https://www.cs.upc.edu/~nlp/wikicorpus/> | nan | <https://www.cs.upc.edu/~nlp/papers/reese10.pdf> | wikicorpus | albertvillanova | general | Varios |
+| eHealth-KD | NER (Named Entity Recognition) | es | <https://knowledge-learning.github.io/ehealthkd-2020/> | <https://github.com/knowledge-learning/ehealthkd-2020> | <http://ceur-ws.org/Vol-2664/eHealth-KD_overview.pdf> | ehealth_kd | mariagrandury | clinico | España |
+
+