gitgoap/Sustainiblity-Lab-RA

Name: Aman Prakash

LinkedIn

LinkTree

([email protected])

Summary of the work done:

1. Assignment Work:

  • Cleaned the dataset as instructed, then used the Groq API with a custom prompt to generate pandas code for the 7 given questions. Each of the 7 generated snippets is run in a separate Jupyter cell, together with its output.
  • LLM used: deepseek-r1-distill-llama-70b, via Groq.
  • All details are available in the llm_queries.ipynb file.

2. Streamlit Energy Bot: Live Here

  • This bot is a live demonstration of the assignment.

  • It is a live Streamlit app where users can ask questions about the given energy dataset in natural language.

  • It converts the user query into pandas code using the Groq API.

  • It then runs the generated code and displays the output and graphs.

  • It does not respond to user queries that are unrelated to the energy dataset.

  • On startup, the app shows a preview summary of the energy dataset.
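The query-to-code-to-output flow described above can be sketched as follows. This is a minimal illustration, not the code in utils/query_processor.py: the function names `extract_code` and `run_generated_code` are hypothetical, and so is the convention that the generated code stores its answer in a variable named `result`. It also assumes the model reply may wrap its reasoning in `<think>` tags and its code in a fenced block. The Groq call itself is left out so the sketch runs without an API key.

```python
import re

# Built at runtime so this example can itself live inside a fenced block.
FENCE = "`" * 3


def extract_code(llm_response: str) -> str:
    """Return the Python source contained in a model reply (hypothetical helper)."""
    # Drop reasoning text that the model may wrap in <think> tags (assumption).
    text = re.sub(r"<think>.*?</think>", "", llm_response, flags=re.DOTALL)
    # Prefer the first fenced block; fall back to the whole reply if none is found.
    match = re.search(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, text, flags=re.DOTALL)
    return (match.group(1) if match else text).strip()


def run_generated_code(code: str, df):
    """Run generated code with `df` in scope and return its `result` variable.

    The `result` naming convention is an assumption made for this sketch.
    """
    namespace = {"df": df}
    # WARNING: exec runs arbitrary code; a real app should restrict or sandbox it.
    exec(code, namespace)
    return namespace.get("result")
```

In an app, the string passed to `extract_code` would be the text of the model's reply, and `df` would be the cleaned energy DataFrame. Because `exec` runs whatever the model returns, some form of validation before execution is worth considering.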

Repository Structure

Sustainiblity-Lab-RA/
├── app.py
├── requirements.txt
├── data/
│   └── energy_data.csv
├── utils/
│   └── query_processor.py
└── llm_queries.ipynb

Prompt used in llm_queries.ipynb

You are an expert data analyst using Python pandas.

The dataset is stored at this path on my system:
C:/Users/aman/Desktop/Sustain_RA/household_power_consumption.txt

Assume I have already loaded and cleaned the dataset into a pandas DataFrame named `df` with a datetime index. The DataFrame has numeric columns, including 
'datetime', 'Global_active_power', 'Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3'

Assume, this code block is already run:
``import pandas as pd
df = pd.read_csv(
    "C:/Users/aman/Desktop/Sustain_RA/household_power_consumption.txt",
    sep=';',
    na_values=['?'],
    low_memory=False
)
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"])
df = df.dropna()
df["Global_active_power"] = df["Global_active_power"].astype(float)
df = df.set_index("datetime")
``
	

Write Python pandas code to answer each of these questions separately:



1. What was the average active power consumption in March 2007?
2. What hour of the day had the highest power usage on Christmas 2006?
3. Compare energy usage (Global_active_power) on weekdays vs weekends.
4. Find days where energy consumption exceeded 5 kWh.
5. Plot the energy usage trend for the first week of January 2007.
6. Find the average voltage for each day of the first week of February 2007.
7. What is the correlation between global active power and sub-metering values?

Keep these in mind: 
Use vectorized operations and pandas built-in methods instead of apply where possible.
When working with dates, use df.index and pandas datetime properties (e.g., index.month, index.dayofweek, index.date) in a vectorized way.
Avoid deprecated arguments or methods.
Include any necessary imports.
Ensure the code is robust and directly runnable in a modern pandas environment.
Generate only the code blocks with comments and no extra explanations.
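For orientation, here is a sketch of the style of answer this prompt asks for, covering questions 1 and 3. It is not the notebook's actual output: the frame below is a tiny synthetic stand-in with made-up values (1.0 on weekdays, 2.0 on weekends), and the variable names `march_mean` and `by_day_type` are chosen only for illustration.

```python
import pandas as pd

# Synthetic stand-in for the cleaned frame: one week of hourly rows starting
# Thursday 2007-03-01. The values are invented for the example, not real data.
index = pd.date_range("2007-03-01", periods=7 * 24, freq="60min")
df = pd.DataFrame({"Global_active_power": 1.0}, index=index)
df.loc[df.index.dayofweek >= 5, "Global_active_power"] = 2.0

# Question 1: average active power in March 2007, using partial string
# indexing on the datetime index instead of a row-wise apply.
march_mean = df.loc["2007-03", "Global_active_power"].mean()

# Question 3: weekday vs weekend comparison via a vectorized boolean mask
# (dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday).
is_weekend = df.index.dayofweek >= 5
by_day_type = (
    df["Global_active_power"]
    .groupby(is_weekend)
    .mean()
    .rename({False: "weekday", True: "weekend"})
)

print(march_mean)
print(by_day_type)
```

Both answers rely on the datetime index set up in the preprocessing block, which is why the prompt tells the model to work with `df.index` properties directly.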

Running the Streamlit app locally

  1. Create a virtual environment (recommended):
python -m venv energy_app
# On Windows:
energy_app\Scripts\activate
# On Mac/Linux:
source energy_app/bin/activate
  2. Install dependencies:

pip install -r requirements.txt

  3. Get a Groq API key:

Go to https://console.groq.com/
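Once a key has been created, the app needs some way to read it. The sketch below assumes the key is exposed through an environment variable named `GROQ_API_KEY`; that is an assumption, since this README does not say how app.py loads the key (it may use Streamlit secrets or a .env file instead). The helper name `get_groq_api_key` is hypothetical.

```python
import os


def get_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Read the Groq API key from the environment (hypothetical helper).

    The variable name is an assumption; adjust it to match how the app is configured.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of a confusing error at request time.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Groq API key "
            "before starting the app."
        )
    return key
```

Reading the key from the environment keeps it out of the repository, which matters because the project is public.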

About

Sustainiblity Lab IIT Gn Assignment
