Name: Aman Prakash
- Cleaned the dataset as instructed, then used the Groq API with a custom prompt to generate pandas code for the 7 given questions. The code and output for each of the 7 questions are in separate Jupyter cells.
- LLM used: `deepseek-r1-distill-llama-70b` from Groq.
- All the details are available in the `llm_queries.ipynb` file.
2. Streamlit Energy Bot: Live Here
- This bot aims to show a live demonstration of the given assignment.
- It is a live Streamlit app where users can ask questions about the given energy dataset in natural language.
- It converts the user query into code using the Groq API.
- It also runs the generated code to display output and graphs.
- It doesn't respond to user queries that are not relevant to the energy dataset.
- At startup, the app shows a preview summary of the energy dataset.
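The query-to-code-to-output flow described above could be sketched roughly as follows. This is a minimal illustration and not the actual contents of `app.py` or `query_processor.py`: the keyword list, the function names, the `ask_llm` callable (standing in for the Groq API call), and the convention that generated code stores its answer in a variable named `result` are all assumptions. It also assumes the model may wrap its reasoning in `<think>` tags, which is stripped before the code is extracted.

```python
import re

# Hypothetical keyword list for deciding whether a question concerns the dataset.
DATASET_KEYWORDS = ("power", "energy", "voltage", "intensity", "metering", "consumption")


def is_relevant(question: str) -> bool:
    """Crude relevance check: does the question mention any dataset keyword?"""
    text = question.lower()
    return any(keyword in text for keyword in DATASET_KEYWORDS)


def extract_code(reply: str) -> str:
    """Pull the Python code out of an LLM reply."""
    # Remove any <think>...</think> reasoning section (assumed reply format).
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    # Take the first fenced block if present, otherwise treat the whole reply as code.
    match = re.search(r"`{3}(?:python)?[ \t]*\n(.*?)`{3}", reply, flags=re.DOTALL)
    return (match.group(1) if match else reply).strip()


def run_generated_code(code: str, df) -> dict:
    """Execute generated code with `df` in scope and return the resulting namespace.

    Note: exec() on model output is unsafe outside a trusted or sandboxed setting.
    """
    namespace = {"df": df}
    exec(code, namespace)
    return namespace


def answer(question: str, df, ask_llm):
    """Return the generated code's `result`, or None for off-topic questions."""
    if not is_relevant(question):
        return None
    code = extract_code(ask_llm(question))
    return run_generated_code(code, df).get("result")
```

Passing `ask_llm` in as a parameter keeps the sketch independent of any particular client library, so it can be exercised with a stub function before wiring in a real API call.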
Sustainiblity-Lab-RA/
├── app.py
├── requirements.txt
├── data/
│ └── energy_data.csv
├── utils/
│ └── query_processor.py
└── llm_queries.ipynb
You are an expert data analyst using Python pandas.
The dataset is stored at this path on my system:
C:/Users/aman/Desktop/Sustain_RA/household_power_consumption.txt
Assume I have already loaded and cleaned the dataset into a pandas DataFrame named `df` with a datetime index. The DataFrame has numeric columns, including
'datetime', 'Global_active_power', 'Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3'
Assume, this code block is already run:
```python
import pandas as pd
df = pd.read_csv(
    "C:/Users/aman/Desktop/Sustain_RA/household_power_consumption.txt",
    sep=';',
    na_values=['?'],
    low_memory=False
)
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], dayfirst=True)
df = df.drop(columns=["Date", "Time"])
df = df.dropna()
df["Global_active_power"] = df["Global_active_power"].astype(float)
df = df.set_index("datetime")
```
Write Python pandas code to answer each of these questions separately:
1. What was the average active power consumption in March 2007?
2. What hour of the day had the highest power usage on Christmas 2006?
3. Compare energy usage (Global_active_power) on weekdays vs weekends.
4. Find days where energy consumption exceeded 5 kWh.
5. Plot the energy usage trend for the first week of January 2007.
6. Find the average voltage for each day of the first week of February 2007.
7. What is the correlation between global active power and sub-metering values?
Keep these in mind:
Use vectorized operations and pandas built-in methods instead of apply where possible.
When working with dates, use df.index and pandas datetime properties (e.g., index.month, index.dayofweek, index.date) in a vectorized way.
Avoid deprecated arguments or methods.
Include any necessary imports.
Ensure the code is robust and directly runnable in a modern pandas environment.
Generate only the code blocks with comments and no extra explanations.
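For orientation, answers to questions 1, 3 and 4 in the style the prompt asks for might look like the sketch below. This is hand-written for illustration and is not the code the model produced (that is in `llm_queries.ipynb`). It runs on a small synthetic `df` so that it is self-contained, and the kWh conversion assumes one reading per minute with `Global_active_power` in kilowatts.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned DataFrame (assumption: one reading per minute, in kW).
idx = pd.date_range("2007-03-01", "2007-03-14 23:59", freq="min")
is_weekend = idx.dayofweek >= 5  # Saturday = 5, Sunday = 6
df = pd.DataFrame({"Global_active_power": np.where(is_weekend, 0.5, 0.1)}, index=idx)

# Q1: average active power in March 2007, using a vectorized mask on the index.
march_mask = (df.index.year == 2007) & (df.index.month == 3)
avg_march_2007 = df.loc[march_mask, "Global_active_power"].mean()

# Q3: mean active power on weekdays vs weekends.
day_type = np.where(df.index.dayofweek >= 5, "weekend", "weekday")
by_day_type = df["Global_active_power"].groupby(day_type).mean()

# Q4: daily energy in kWh. Each minute at P kW contributes P/60 kWh,
# so the daily sum of minute readings divided by 60 gives kWh per day.
daily_kwh = df["Global_active_power"].resample("D").sum() / 60
high_days = daily_kwh[daily_kwh > 5]

print(avg_march_2007)
print(by_day_type)
print(high_days)
```

With the real dataset, only the synthetic `df` construction would be dropped; the three query snippets rely solely on the datetime index and the `Global_active_power` column.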
- Create a virtual environment (recommended):
python -m venv energy_app
# On Windows:
energy_app\Scripts\activate
# On Mac/Linux:
source energy_app/bin/activate
- Install Dependencies
pip install -r requirements.txt
- Get Groq API Key