# Course Aids

- Python tutorial: see doc here
- Numerical computing with Python (NumPy): see doc here
- Parallel/Multithreading in Python
- A tutorial on Pandas
- A tutorial on PyTorch
- A tutorial on TensorFlow

## NLP Index of Terms

This index of Natural Language Processing (NLP) concepts serves as a foundational glossary for students entering the field. Here are some key concepts, followed by a few illustrative code sketches:

  1. Tokenization: The process of splitting text into individual words or phrases (tokens); see the preprocessing sketch after this list.
  2. Corpus: A large collection of text data used for NLP tasks.
  3. Stemming: Reducing words to their root form by removing suffixes.
  4. Lemmatization: Converting words to their dictionary form by considering the context.
  5. Part-of-Speech Tagging: Identifying the grammatical parts of speech of each word in a sentence.
  6. Named Entity Recognition (NER): Identifying and classifying named entities (people, places, organizations) in text.
  7. Word Embeddings: Representation of words in a continuous vector space where similar words have similar representations.
  8. Word2Vec: A method for producing word embeddings by training a neural network model; see the Word2Vec sketch after this list.
  9. GloVe (Global Vectors): An unsupervised learning algorithm for generating word embeddings by aggregating global word-word co-occurrence statistics from a corpus.
  10. Syntax Tree: A tree representation of the syntactic structure of sentences.
  11. Dependency Parsing: Analyzing the grammatical structure of a sentence to establish relationships between "head" words and words that modify those heads.
  12. Bag of Words (BoW): A representation of text that describes the occurrence of words within a document.
  13. TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic intended to reflect how important a word is to a document in a collection or corpus; see the vectorizer sketch after this list.
  14. N-grams: Contiguous sequences of n items from a given sample of text or speech; see the sketch after this list.
  15. Stop Words: Commonly used words (such as "the", "a", "an", "in") which are typically ignored in NLP tasks.
  16. Recurrent Neural Networks (RNNs): A class of neural networks for processing sequential data.
  17. Long Short-Term Memory (LSTM): A special kind of RNN capable of learning long-term dependencies.
  18. Gated Recurrent Unit (GRU): A simplified variant of the LSTM that uses fewer gates to control the flow of information.
  19. Attention Mechanism: A component of neural networks that lets the model weight different parts of the input when producing each part of the output; a minimal self-attention sketch follows this list.
  20. Transformer Architecture: A model architecture that uses self-attention mechanisms and has become the basis for many state-of-the-art NLP models.
  21. BERT (Bidirectional Encoder Representations from Transformers): A method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
  22. GPT (Generative Pre-trained Transformer): An autoregressive language model that uses deep learning to produce human-like text.
  23. Language Modeling: The task of predicting the next word in a sentence given the previous words.
  24. Machine Translation: The task of automatically converting text from one language to another.
  25. Text Classification: The task of assigning predefined categories to text.
  26. Sentiment Analysis: The process of determining the emotional tone of a piece of text to understand the attitudes, opinions, and emotions expressed; see the pipeline sketch after this list.
  27. Topic Modeling: The task of identifying topics that best describe a set of documents.
  28. Dialog Systems and Chatbots: Computer systems designed to converse with human users via natural language.
  29. Speech Recognition: The process of converting spoken words into text.
  30. Natural Language Generation (NLG): The task of generating natural language text from structured data or another machine representation.
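
To make the preprocessing terms concrete, here is a minimal sketch of tokenization (1), stemming (3), lemmatization (4), part-of-speech tagging (5), and stop-word removal (15) using NLTK. The example sentence is arbitrary, and the `nltk.download` calls fetch resources on first run (newer NLTK releases may also require the `punkt_tab` resource).

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time resource downloads (cached locally after the first run).
for resource in ("punkt", "averaged_perceptron_tagger", "wordnet", "stopwords"):
    nltk.download(resource)

text = "The striped bats were hanging on their feet."

# 1. Tokenization: split the sentence into word tokens.
tokens = nltk.word_tokenize(text)

# 3. Stemming: chop suffixes off each token (crude, rule-based).
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# 4. Lemmatization: map tokens to dictionary forms ("bats" -> "bat").
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t.lower()) for t in tokens]

# 5. Part-of-speech tagging: (token, tag) pairs such as ("bats", "NNS").
pos_tags = nltk.pos_tag(tokens)

# 15. Stop-word removal: drop common function words.
stop = set(stopwords.words("english"))
content_words = [t for t in tokens if t.lower() not in stop]

print(tokens, stems, lemmas, pos_tags, content_words, sep="\n")
```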
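
A sketch of Bag of Words (12) and TF-IDF (13) using scikit-learn's vectorizers; the three toy documents are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# 12. Bag of Words: each document becomes a vector of raw word counts.
bow = CountVectorizer()
counts = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(counts.toarray())

# 13. TF-IDF: reweight the counts so that words appearing in every
# document contribute less than words distinctive to one document.
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)
print(weights.toarray().round(2))
```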
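
N-grams (14) need no library at all; a plain-Python sketch:

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "natural language processing is fun".split()
print(ngrams(tokens, 2))  # bigrams: ('natural', 'language'), ...
print(ngrams(tokens, 3))  # trigrams
```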
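
A sketch of Word2Vec (8) producing word embeddings (7) with gensim. The toy corpus and hyperparameters here are illustrative only; useful embeddings require a much larger corpus.

```python
from gensim.models import Word2Vec

# A toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "dog"],
]

# 8. Word2Vec: train small 50-dimensional embeddings (the parameter
# values are arbitrary illustrations, not recommendations).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

# 7. Word embeddings: each word is now a dense vector, and words used
# in similar contexts should land near each other in the vector space.
print(model.wv["cat"][:5])
print(model.wv.most_similar("cat", topn=2))
```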
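
The attention mechanism (19) at the heart of the Transformer (20) reduces to a few tensor operations. Below is a minimal, single-head, unmasked sketch written from scratch in PyTorch; the function name and toy shapes are our own, not a library API.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Scaled dot-product attention (single head, no masking)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Softmax turns the scores into weights over the input positions.
    weights = F.softmax(scores, dim=-1)
    # Each output is a weighted average of the value vectors.
    return weights @ v, weights

seq_len, d_model = 4, 8
x = torch.randn(1, seq_len, d_model)   # a toy "sequence" of 4 vectors
out, attn = attention(x, x, x)         # q = k = v gives self-attention
print(out.shape, attn.shape)           # (1, 4, 8) and (1, 4, 4)
```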
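
Sentiment analysis (26) is a common text classification (25) task. With the Hugging Face `transformers` library installed, the high-level `pipeline` API runs a pretrained classifier; note that it downloads a default English model on first use.

```python
from transformers import pipeline

# Loads a default pretrained sentiment model on the first call.
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```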