Commit e7fedc1

Initial commit
1 parent d73fc58 commit e7fedc1

21 files changed: +475 -0 lines changed

Diff for: Readme.md

+16
@@ -0,0 +1,16 @@
cr-nlp provides a set of tools for natural language processing (NLP), including text tokenization, sentiment analysis, word lemmatization, and stemming. Built on top of popular NLP libraries such as NLTK and Hugging Face's Transformers, it simplifies common NLP tasks for developers and researchers.

Features

Tokenize Text: Break down text into tokens using pre-trained models from Hugging Face's Transformers library.
Analyze Sentiment: Determine the sentiment of text using both Transformers and NLTK's VADER model.
Lemmatize Words: Convert words to their base form based on their part of speech.
Stem Words: Reduce words to their root form using the Porter Stemming algorithm.

Installation

To install cr-nlp, run the following command:

pip install cr-nlp

Dependencies:

Python 3.6+
nltk
transformers
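
As a quick sanity check, here is a usage sketch for the Transformers-backed helpers. The import path is an assumption inferred from the repository layout (myfunctions.py at the package root); the README itself does not show one, and the example label reflects the default nlptown model's star-rating output.

# Hypothetical quick start; the import path is assumed from the repo layout.
from myfunctions import tokenize_text, analyze_sentiment

print(tokenize_text("Transformers make tokenization easy."))
# prints a list of WordPiece tokens from bert-base-uncased

sentiment, confidence = analyze_sentiment("I love this library!")
print(sentiment, confidence)
# e.g. a label such as '5 stars' and a confidence score with the default model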

Diff for: __init__.py

Whitespace-only changes.

Diff for: __pycache__/myfunctions.cpython-311.pyc

5.65 KB
Binary file not shown.

Diff for: build/lib/__init__.py

Whitespace-only changes.

Diff for: build/lib/cr_nlp.egg-info/PKG-INFO

+21
@@ -0,0 +1,21 @@
Metadata-Version: 2.1
Name: cr_nlp
Version: 0.1.6
Summary: Library for NLP tasks by Aryan Oberoi
Home-page: https://github.com/aryanoberoi/IITD-work/tree/main/library/test_library
Author: Aryan Oberoi
Author-email: [email protected]
License: MIT
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: transformers
Requires-Dist: nltk

Library for NLP tasks by Aryan Oberoi
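
SOURCES.txt below lists a setup.py that this commit view does not show. A minimal sketch of a setup() call that would produce the metadata above is given here; every field is inferred from PKG-INFO, and none of it is taken from the actual file.

# Hypothetical reconstruction from PKG-INFO; the real setup.py is not shown in this diff.
from setuptools import setup

setup(
    name="cr_nlp",
    version="0.1.6",
    description="Library for NLP tasks by Aryan Oberoi",
    long_description=open("Readme.md").read(),
    long_description_content_type="text/markdown",
    author="Aryan Oberoi",
    url="https://github.com/aryanoberoi/IITD-work/tree/main/library/test_library",
    license="MIT",
    py_modules=["myfunctions"],  # assumed from the flat module layout
    install_requires=["transformers", "nltk"],
    classifiers=[
        "Intended Audience :: Developers",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
)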

Diff for: build/lib/cr_nlp.egg-info/SOURCES.txt

+9
@@ -0,0 +1,9 @@
__init__.py
myfunctions.py
setup.py
test_myfunctions.py
cr_nlp.egg-info/PKG-INFO
cr_nlp.egg-info/SOURCES.txt
cr_nlp.egg-info/dependency_links.txt
cr_nlp.egg-info/requires.txt
cr_nlp.egg-info/top_level.txt

Diff for: build/lib/cr_nlp.egg-info/dependency_links.txt

+1
@@ -0,0 +1 @@

Diff for: build/lib/cr_nlp.egg-info/requires.txt

+2
@@ -0,0 +1,2 @@
transformers
nltk

Diff for: build/lib/cr_nlp.egg-info/top_level.txt

+1
@@ -0,0 +1 @@

Diff for: build/lib/myfunctions.py

+153
@@ -0,0 +1,153 @@
from transformers import AutoTokenizer, pipeline
import nltk
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer


def tokenize_text(text, model_name="bert-base-uncased"):
    """
    Tokenize a given text using the Hugging Face Transformers library.

    Parameters:
    - text (str): The input text to tokenize.
    - model_name (str): The name of the pre-trained model to use for tokenization.
      Default is "bert-base-uncased".

    Returns:
    - tokens (list): List of tokens obtained by tokenizing the input text.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokens = tokenizer.tokenize(text)
    return tokens


def analyze_sentiment(text, model_name="nlptown/bert-base-multilingual-uncased-sentiment"):
    """
    Analyze sentiment of a given text using a pre-trained sentiment analysis model.

    Parameters:
    - text (str): The input text for sentiment analysis.
    - model_name (str): The name of the pre-trained sentiment analysis model.
      Default is "nlptown/bert-base-multilingual-uncased-sentiment".

    Returns:
    - sentiment (str): The predicted label. For the default model this is a star
      rating from "1 star" to "5 stars"; other models may use labels such as
      "POSITIVE" or "NEGATIVE".
    - confidence (float): The confidence score associated with the predicted sentiment.
    """
    sentiment_analyzer = pipeline('sentiment-analysis', model=model_name)
    result = sentiment_analyzer(text)

    sentiment = result[0]['label']
    confidence = result[0]['score']
    return sentiment, confidence


def get_wordnet_pos(tag):
    """Convert an NLTK (Penn Treebank) POS tag to a WordNet POS constant."""
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return None


def lemmatize_words(words):
    """
    Lemmatize a list of words.

    This function takes a list of words, determines the part of speech for each
    word, and then lemmatizes it (converts it to its base or dictionary form)
    according to its part of speech. It uses NLTK's WordNetLemmatizer together
    with part-of-speech tagging to lemmatize each word accurately.

    Parameters:
    - words: A list of words (strings) to lemmatize.

    Returns:
    - A list of lemmatized words.

    Note: This function requires NLTK's WordNetLemmatizer, pos_tag, and the
    wordnet corpus, plus get_wordnet_pos(tag) to convert between the NLTK and
    WordNet part-of-speech tagging conventions.
    """
    lemmatizer = WordNetLemmatizer()
    lemmatized_words = []

    # Get a POS tag for each word
    pos_tagged = pos_tag(words)

    for word, tag in pos_tagged:
        # Fall back to treating the word as a noun when the tag has no WordNet equivalent
        wordnet_pos = get_wordnet_pos(tag) or wordnet.NOUN
        lemmatized_word = lemmatizer.lemmatize(word, pos=wordnet_pos)
        lemmatized_words.append(lemmatized_word)

    return lemmatized_words


def analyze_sentiment_vader(text):
    """
    Analyzes the sentiment of a given text using VADER sentiment analysis.

    Parameters:
    - text: A string containing the text to analyze.

    Returns:
    - A dictionary containing the scores for negative, neutral, positive, and
      compound sentiments.
    """
    # Make sure the VADER lexicon is available (downloads on first call)
    nltk.download('vader_lexicon')
    sid = SentimentIntensityAnalyzer()
    sentiment_scores = sid.polarity_scores(text)
    return sentiment_scores


def stem_words(words):
    """
    Stems a list of words.

    This function applies the Porter Stemming algorithm to a list of words,
    reducing each word to its root or stem form. It's particularly useful in
    natural language processing and search applications where the exact form of
    a word is less important than its root meaning.

    Parameters:
    - words: A list of words (strings) to be stemmed.

    Returns:
    - A list containing the stemmed version of each input word.

    Example:
    >>> stem_words(["running", "jumps", "easily"])
    ['run', 'jump', 'easili']

    Note: This function requires NLTK's PorterStemmer.
    """
    # Initialize the Porter Stemmer
    stemmer = PorterStemmer()

    # Stem each word in the list
    stemmed_words = [stemmer.stem(word) for word in words]
    return stemmed_words

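The NLTK-backed helpers need corpus data that the module does not fetch itself (only vader_lexicon is downloaded, inside analyze_sentiment_vader). A sketch of the assumed setup and expected behavior:

import nltk
# Assumed setup step: pos_tag and WordNetLemmatizer need these data packages,
# and the library does not download them on its own.
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')  # needed by some NLTK versions for the lemmatizer

from myfunctions import lemmatize_words, stem_words, analyze_sentiment_vader

print(lemmatize_words(["running", "geese"]))
# e.g. ['run', 'goose'], depending on the tags pos_tag assigns
print(stem_words(["running", "jumps", "easily"]))
# ['run', 'jump', 'easili'] (the docstring's own example)
print(analyze_sentiment_vader("This library is great!"))
# dict with 'neg', 'neu', 'pos', and 'compound' scores
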
Diff for: build/lib/test_myfunctions.py

+5
@@ -0,0 +1,5 @@
from myfunctions import analyze_sentiment

# Smoke test: prompt for text and print the (label, confidence) pair
text = input("Enter text to analyze: ")
print(analyze_sentiment(text))
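
The file name suggests a unit test, though as committed it is an interactive script. A non-interactive variant that could run under pytest might look like the sketch below; the assertions only check the return types, since the exact label format depends on the model.

# Hypothetical pytest-style rewrite; downloads the default model on first run.
from myfunctions import analyze_sentiment

def test_analyze_sentiment_returns_label_and_score():
    sentiment, confidence = analyze_sentiment("I love this library!")
    assert isinstance(sentiment, str)
    assert 0.0 <= confidence <= 1.0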

Diff for: cr_nlp.egg-info/PKG-INFO

+21
@@ -0,0 +1,21 @@
Metadata-Version: 2.1
Name: cr_nlp
Version: 0.1.6
Summary: Library for NLP tasks by Aryan Oberoi
Home-page: https://github.com/aryanoberoi/IITD-work/tree/main/library/test_library
Author: Aryan Oberoi
Author-email: [email protected]
License: MIT
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: transformers
Requires-Dist: nltk

Library for NLP tasks by Aryan Oberoi

Diff for: cr_nlp.egg-info/SOURCES.txt

+9
@@ -0,0 +1,9 @@
__init__.py
myfunctions.py
setup.py
test_myfunctions.py
cr_nlp.egg-info/PKG-INFO
cr_nlp.egg-info/SOURCES.txt
cr_nlp.egg-info/dependency_links.txt
cr_nlp.egg-info/requires.txt
cr_nlp.egg-info/top_level.txt

Diff for: cr_nlp.egg-info/dependency_links.txt

+1
@@ -0,0 +1 @@

Diff for: cr_nlp.egg-info/requires.txt

+2
@@ -0,0 +1,2 @@
transformers
nltk

Diff for: cr_nlp.egg-info/top_level.txt

+1
@@ -0,0 +1 @@

Diff for: dist/cr_nlp-0.1.6-py3-none-any.whl

4.45 KB
Binary file not shown.
