Skip to content

Latest commit

 

History

History
2694 lines (2184 loc) · 45.9 KB

IMUS_Project.md

File metadata and controls

2694 lines (2184 loc) · 45.9 KB

Project IMUS

Dataset description:

What is an IMU?

An inertial measurement unit (IMU) measures and reports raw or filtered angular rate and specific force/acceleration experienced by the object it is attached to. Data outputs for an IMU are typically body-frame accelerations, angular rates, and (optionally) magnetic field measurements.

The columns of the dataset are:

  1. Category: The category to which this IMU belongs.

  2. Manufacturer: Name of the manufacturer.

  3. part number: Discrete part number

  4. sku: stock keeping unit

  5. alt_sku: alternate stock keeping unit

  6. Description: description of the product

  7. Status: Whether it is in stock or not

  8. Quantity: Quantity available

  9. Price: Price of the product

  10. sku_create_date: when was the product procured

Importing Related Resources

import pandas as pd
import numpy as np
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
file ='imus.xlsx'
df = pd.concat(pd.read_excel("imus.xlsx",sheet_name= None),ignore_index = True) # dedining df variable to file , merging of all sheet with concat method.

Pre-Processing

Intial Insepection

df.head() 
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer part_number sku alt_sku description status qty price sku_create_date
0 1539 MS-OPEN VALUE SUBSCRIPTION 021-09108 BA0011 NaN 1YR OFFICESTD ALNG LICSAPK OLV, D AP No Stock 0 143.58 2020-08-17
1 1539 MS-OPEN VALUE SUBSCRIPTION 059-07014 BA0025 NaN 1YR WORD ALNG LICSAPK OLV D AP, No Stock 0 56.09 2020-08-17
2 1539 MS-OPEN VALUE SUBSCRIPTION 065-06444 BA0039 NaN 1YR EXCEL ALNG LICSAPK OLV D AP, No Stock 0 56.09 2020-08-17
3 1539 MS-OPEN VALUE SUBSCRIPTION 076-04355 BA0067 NaN 1YR PRJCT ALNG LICSAPK OLV D AP, No Stock 0 217.60 2020-08-17
4 1582 MS-OPEN VALUE SUBSCRIPTION 077-05318 BA0080 NaN 1YR ACCESS ALNG LICSAPK OLV D, AP No Stock 0 56.09 2020-08-17
print('Number of Rows and Columns in dataset',df.shape,'.')
Number of Rows and Columns in dataset (2627926, 10) .
print('Data Type of each column:')
df.dtypes
Data Type of each column:





category                    int64
manufacturer               object
part_number                object
sku                        object
alt_sku                    object
description                object
status                     object
qty                         int64
price                     float64
sku_create_date    datetime64[ns]
dtype: object
print('Four type of columns in dataset.')
df.dtypes.value_counts()
Four type of columns in dataset.





object            6
int64             2
float64           1
datetime64[ns]    1
dtype: int64

Concise Summary of a DataFrame

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2627926 entries, 0 to 2627925
Data columns (total 10 columns):
 #   Column           Dtype         
---  ------           -----         
 0   category         int64         
 1   manufacturer     object        
 2   part_number      object        
 3   sku              object        
 4   alt_sku          object        
 5   description      object        
 6   status           object        
 7   qty              int64         
 8   price            float64       
 9   sku_create_date  datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(2), object(6)
memory usage: 200.5+ MB

Cleaning of dataset for further analysis

df = df.astype({"category":'category'}) # changing type of category.

Checking null type of data in dataframe

df.isnull().sum()
category                 0
manufacturer             0
part_number              0
sku                      0
alt_sku            2613442
description              0
status                   0
qty                      0
price                    0
sku_create_date          0
dtype: int64

lot of null value in alt_sku so better to drop it for further analysis.

Droping extra columns

df.drop(['sku','alt_sku','description'], axis=1, inplace=True)
df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer part_number status qty price sku_create_date
0 1539 MS-OPEN VALUE SUBSCRIPTION 021-09108 No Stock 0 143.58 2020-08-17
1 1539 MS-OPEN VALUE SUBSCRIPTION 059-07014 No Stock 0 56.09 2020-08-17
2 1539 MS-OPEN VALUE SUBSCRIPTION 065-06444 No Stock 0 56.09 2020-08-17
3 1539 MS-OPEN VALUE SUBSCRIPTION 076-04355 No Stock 0 217.60 2020-08-17
4 1582 MS-OPEN VALUE SUBSCRIPTION 077-05318 No Stock 0 56.09 2020-08-17

Renaming columns for convience

df = df.rename(columns={'sku_create_date': 'date', 'part_number': 'product'})
df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product status qty price date
0 1539 MS-OPEN VALUE SUBSCRIPTION 021-09108 No Stock 0 143.58 2020-08-17
1 1539 MS-OPEN VALUE SUBSCRIPTION 059-07014 No Stock 0 56.09 2020-08-17
2 1539 MS-OPEN VALUE SUBSCRIPTION 065-06444 No Stock 0 56.09 2020-08-17
3 1539 MS-OPEN VALUE SUBSCRIPTION 076-04355 No Stock 0 217.60 2020-08-17
4 1582 MS-OPEN VALUE SUBSCRIPTION 077-05318 No Stock 0 56.09 2020-08-17

Dividing Dataset on basis of availablity of stock.

print('Stock availablity:')
df['status'].value_counts()
Stock availablity:





No Stock    1914138
In Stock     713788
Name: status, dtype: int64
df_st= df[df.status == 'In Stock']
df_nos= df[df.status == 'No Stock']

For data In Stock:

df_st.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product status qty price date
253 9920 CASE LOGIC-PERSONAL & PORTABLE 3200943 In Stock 7 99.00 2020-08-17
271 531 OMNITRON SYSTEMS 7206-0 In Stock 2 59.09 2020-08-17
320 9070 ZEBRA ENTERPRISE MCD-A1 SAC5070-800CR In Stock 3 500.06 2020-08-17
324 9080 ZEBRA ENTERPRISE MCD-A1 KT-CLMPT-RS507-01R In Stock 115 86.34 2020-08-17
326 9080 ZEBRA ENTERPRISE MCD-A1 KT-PAD-RS507-10R In Stock 42 51.80 2020-08-17
df_st.drop(["status"], axis=1) # no need of stock column now so dropping
df_st
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product qty price date
253 9920 CASE LOGIC-PERSONAL & PORTABLE 3200943 7 99.00 2020-08-17
271 531 OMNITRON SYSTEMS 7206-0 2 59.09 2020-08-17
320 9070 ZEBRA ENTERPRISE MCD-A1 SAC5070-800CR 3 500.06 2020-08-17
324 9080 ZEBRA ENTERPRISE MCD-A1 KT-CLMPT-RS507-01R 115 86.34 2020-08-17
326 9080 ZEBRA ENTERPRISE MCD-A1 KT-PAD-RS507-10R 42 51.80 2020-08-17
... ... ... ... ... ... ...
2627921 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28
2627922 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28
2627923 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28
2627924 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28
2627925 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28

713788 rows × 6 columns

1. Which are the top Manufacturers with price, quantity?

mfg_grp = df_st.groupby('manufacturer').sum()
mfg_sort = mfg_grp.sort_values(by=['qty', 'price'],ascending= False)[:10]
mfg_sort_in = mfg_sort.reset_index()
mfg_sort_in
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
manufacturer qty price
0 STARTECH.COM 16069681 7266737.29
1 DELL CSG SERVICE WARRANTIES 13578833 150917.91
2 TRIPP LITE CONNECTIVITY 8283765 768146.62
3 LOGITECH - COMPUTER ACCESSORIES 8251159 1005042.67
4 JABRA BUSINESS 4829490 392963.51
5 PLANTRONICS INC 3825984 703634.71
6 CONFIG 6 3825896 2140.78
7 BELKIN - CABLES 3508335 262931.76
8 DELL ESG WARRANTIES 3149689 173018.32
9 EPOS 3127008 1127406.70
sns.set(rc={'figure.figsize':(11.7,8.27)})
top_mfg_qty_price = sns.barplot(x = 'qty',y = 'price',hue="manufacturer",  data= mfg_sort_in).set(title='Top Manufacturers with Price, Quantity')

png

2. Which are the most profitable products?

df_st['Total_Price'] =df_st['qty']*df_st['price']
C:\Users\Mohit\AppData\Local\Temp/ipykernel_7272/1120783067.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_st['Total_Price'] =df_st['qty']*df_st['price']
df_st
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product qty price date Total_Price
253 9920 CASE LOGIC-PERSONAL & PORTABLE 3200943 7 99.00 2020-08-17 693.00
271 531 OMNITRON SYSTEMS 7206-0 2 59.09 2020-08-17 118.18
320 9070 ZEBRA ENTERPRISE MCD-A1 SAC5070-800CR 3 500.06 2020-08-17 1500.18
324 9080 ZEBRA ENTERPRISE MCD-A1 KT-CLMPT-RS507-01R 115 86.34 2020-08-17 9929.10
326 9080 ZEBRA ENTERPRISE MCD-A1 KT-PAD-RS507-10R 42 51.80 2020-08-17 2175.60
... ... ... ... ... ... ... ...
2627921 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627922 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627923 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627924 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627925 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36

713788 rows × 7 columns

pr_tot_grp = df_st.groupby(['product','Total_Price'])
top_sort = pr_tot_grp.sum().sort_values(by='Total_Price',ascending=[False])[:10]
most_profitable_products = top_sort.reset_index()
profit_product = sns.barplot(x = 'product',y = 'Total_Price',data = most_profitable_products).set(title='most_profitables_products')

png

3. Which is the product where you have seen a decrease in the price?

df_st
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product qty price date Total_Price
253 9920 CASE LOGIC-PERSONAL & PORTABLE 3200943 7 99.00 2020-08-17 693.00
271 531 OMNITRON SYSTEMS 7206-0 2 59.09 2020-08-17 118.18
320 9070 ZEBRA ENTERPRISE MCD-A1 SAC5070-800CR 3 500.06 2020-08-17 1500.18
324 9080 ZEBRA ENTERPRISE MCD-A1 KT-CLMPT-RS507-01R 115 86.34 2020-08-17 9929.10
326 9080 ZEBRA ENTERPRISE MCD-A1 KT-PAD-RS507-10R 42 51.80 2020-08-17 2175.60
... ... ... ... ... ... ... ...
2627921 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627922 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627923 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627924 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627925 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36

713788 rows × 7 columns

pric_01 = df_st.drop(['manufacturer', 'category','qty',"Total_Price"], axis=1)
pric_01.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 713788 entries, 253 to 2627925
Data columns (total 3 columns):
 #   Column   Non-Null Count   Dtype         
---  ------   --------------   -----         
 0   product  713788 non-null  object        
 1   price    713788 non-null  float64       
 2   date     713788 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 21.8+ MB
p_max = pric_01.loc[pric_01.groupby('product').date.idxmax(),:]
p_min = pric_01.loc[pric_01.groupby('product').date.idxmin(),:]
p_max['price_max'] =p_max['price']
p_min['price_min'] =p_min['price'] 
p_max
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
product price date price_max
1728303 0-00496 59.62 2020-08-24 59.62
463475 0-00498 96.65 2020-08-17 96.65
2098347 003-000884-01-OE 398.68 2020-08-24 398.68
1438732 00385 1.80 2020-08-24 1.80
1438760 00388 2.48 2020-08-24 2.48
... ... ... ... ...
1731234 ZT62062-T01A100Z 4001.17 2020-08-24 4001.17
2356040 ZT62063-T010100Z 3797.41 2020-08-25 3797.41
1731241 ZT62063-T01A100Z 4140.69 2020-08-24 4140.69
1731246 ZT62063-T110100Z 4344.44 2020-08-24 4344.44
1741603 ZUCUBJCK 16.74 2020-08-24 16.74

30672 rows × 4 columns

p_min
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
product price date price_min
463476 0-00496 59.62 2020-08-17 59.62
463475 0-00498 96.65 2020-08-17 96.65
834073 003-000884-01-OE 398.68 2020-08-17 398.68
173447 00385 1.80 2020-08-17 1.80
173475 00388 2.48 2020-08-17 2.48
... ... ... ... ...
466408 ZT62062-T01A100Z 4001.17 2020-08-17 4001.17
466414 ZT62063-T010100Z 3797.41 2020-08-17 3797.41
466415 ZT62063-T01A100Z 4140.69 2020-08-17 4140.69
466420 ZT62063-T110100Z 4344.44 2020-08-17 4344.44
476823 ZUCUBJCK 16.74 2020-08-17 16.74

30672 rows × 4 columns

p_max = p_max.drop(['price','date'], axis=1)
p_max
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
product price_max
1728303 0-00496 59.62
463475 0-00498 96.65
2098347 003-000884-01-OE 398.68
1438732 00385 1.80
1438760 00388 2.48
... ... ...
1731234 ZT62062-T01A100Z 4001.17
2356040 ZT62063-T010100Z 3797.41
1731241 ZT62063-T01A100Z 4140.69
1731246 ZT62063-T110100Z 4344.44
1741603 ZUCUBJCK 16.74

30672 rows × 2 columns

p_min = p_min.drop(['price','date'], axis=1)
p_min
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
product price_min
463476 0-00496 59.62
463475 0-00498 96.65
834073 003-000884-01-OE 398.68
173447 00385 1.80
173475 00388 2.48
... ... ...
466408 ZT62062-T01A100Z 4001.17
466414 ZT62063-T010100Z 3797.41
466415 ZT62063-T01A100Z 4140.69
466420 ZT62063-T110100Z 4344.44
476823 ZUCUBJCK 16.74

30672 rows × 2 columns

final = pd.merge(p_max, p_min , on='product', how='outer')
final['final_price']= final['price_max']-final['price_min']
pric_Dec = final.sort_values(by=['final_price'], ascending=True ,ignore_index=True)[0:10]
sns.barplot(y = 'product',
            x = 'final_price',
            data = pric_Dec).set(title='Product whose Price Decreased')
[Text(0.5, 1.0, 'Product whose Price Decreased')]

png

4. Which is the product where you have seen an increase in the price?

pric_Inc =final.sort_values(by=['final_price'], ascending=False ,ignore_index=True)[0:10]
bar = sns.barplot(y = 'product',
            x = 'final_price',
            data = pric_Inc).set(title='Product whose Price Increased')

png

5. Which is the top category by price and quantity?

cat_grp = df_st.groupby('category').sum()
cat_grp.drop(['Total_Price'], axis=1, inplace=True)
cat_grp_srt_q_p = cat_grp.sort_values(by=['qty', 'price'],ascending= False)[:10]
cat_grp_srt_q_p_ind = cat_grp_srt_q_p.reset_index()
cat_grp_srt_q_p_ind
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category qty price
0 1221 17378795 346527.07
1 2220 12851216 1741955.11
2 9970 8618185 1389155.66
3 2240 8264370 1520556.20
4 3350 7068656 580035.13
5 9995 5015339 970241.61
6 303 4902118 4544824.27
7 2201 4715114 558368.49
8 1251 4347126 13541.88
9 101 3877634 11079362.36

6. Which products are aging?

df_st
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product qty price date Total_Price
253 9920 CASE LOGIC-PERSONAL & PORTABLE 3200943 7 99.00 2020-08-17 693.00
271 531 OMNITRON SYSTEMS 7206-0 2 59.09 2020-08-17 118.18
320 9070 ZEBRA ENTERPRISE MCD-A1 SAC5070-800CR 3 500.06 2020-08-17 1500.18
324 9080 ZEBRA ENTERPRISE MCD-A1 KT-CLMPT-RS507-01R 115 86.34 2020-08-17 9929.10
326 9080 ZEBRA ENTERPRISE MCD-A1 KT-PAD-RS507-10R 42 51.80 2020-08-17 2175.60
... ... ... ... ... ... ... ...
2627921 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627922 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627923 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627924 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36
2627925 9945 VERBATIM CORPORATION 93975 12 50.53 2020-08-28 606.36

713788 rows × 7 columns

qty_01 = df_st.drop(['manufacturer', 'category','price','Total_Price'], axis=1)
q_max = qty_01.loc[qty_01.groupby('qty').date.idxmax(),:]
qty_Inc =q_max.sort_values(by=['qty'], ascending=False ,ignore_index=True)[0:10]
sns.barplot(x = 'product',
            y = 'qty',
            data =qty_Inc ).set(title='Aged Product')
[Text(0.5, 1.0, 'Aged Product')]

png

7. How to Decrease Product aging?

For Data Not In Stock:

df_nos
df_nos.drop(["status"], axis=1) # no need of stock column now so dropping
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product qty price date
0 1539 MS-OPEN VALUE SUBSCRIPTION 021-09108 0 143.58 2020-08-17
1 1539 MS-OPEN VALUE SUBSCRIPTION 059-07014 0 56.09 2020-08-17
2 1539 MS-OPEN VALUE SUBSCRIPTION 065-06444 0 56.09 2020-08-17
3 1539 MS-OPEN VALUE SUBSCRIPTION 076-04355 0 217.60 2020-08-17
4 1582 MS-OPEN VALUE SUBSCRIPTION 077-05318 0 56.09 2020-08-17
... ... ... ... ... ... ...
2627418 3360 TARGUS AMB580TT 0 19.53 2020-08-28
2627775 505 ENGENIUS EAP1300 0 85.24 2020-08-28
2627822 9910 KINGSTON HX-MPFS-SM 0 13.75 2020-08-28
2627853 291 KINGSTON TECHNOLOGY FLASH SD10VG2/16GBCP 0 16.51 2020-08-28
2627857 291 KINGSTON IKD300S/4GB 0 52.83 2020-08-28

1914138 rows × 6 columns

1. Which are the top Manufacturers with price, quantity?

mfg_grp_n = df_nos.groupby('manufacturer').sum()

mfg_sort = mfg_grp_n.sort_values(by=['qty', 'price'],ascending= False)[:10]

mfg_sort_in_n = mfg_sort.reset_index()

mfg_sort_in_n

sns.set(rc={'figure.figsize':(11.7,8.27)})
top_mfg_qty_price_n = sns.barplot(x = 'qty',y = 'price',hue="manufacturer",  data= mfg_sort_in_n).set(title='Top Manufacturers with Price, Quantity')

png

2. Which are the most profitable products?

pr_tot_grp_n = df_nos.groupby(['product','price'])

top_sort_n = pr_tot_grp_n.sum().sort_values(by='price',ascending=[False])[:10]

most_profitable_products_n = top_sort_n.reset_index()

profit_product_n = sns.barplot(x = 'product',y = 'price',data = most_profitable_products_n).set(title='most_profitables_products')

png

df_nos
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category manufacturer product status qty price date
0 1539 MS-OPEN VALUE SUBSCRIPTION 021-09108 No Stock 0 143.58 2020-08-17
1 1539 MS-OPEN VALUE SUBSCRIPTION 059-07014 No Stock 0 56.09 2020-08-17
2 1539 MS-OPEN VALUE SUBSCRIPTION 065-06444 No Stock 0 56.09 2020-08-17
3 1539 MS-OPEN VALUE SUBSCRIPTION 076-04355 No Stock 0 217.60 2020-08-17
4 1582 MS-OPEN VALUE SUBSCRIPTION 077-05318 No Stock 0 56.09 2020-08-17
... ... ... ... ... ... ... ...
2627418 3360 TARGUS AMB580TT No Stock 0 19.53 2020-08-28
2627775 505 ENGENIUS EAP1300 No Stock 0 85.24 2020-08-28
2627822 9910 KINGSTON HX-MPFS-SM No Stock 0 13.75 2020-08-28
2627853 291 KINGSTON TECHNOLOGY FLASH SD10VG2/16GBCP No Stock 0 16.51 2020-08-28
2627857 291 KINGSTON IKD300S/4GB No Stock 0 52.83 2020-08-28

1914138 rows × 7 columns

3. Which is the product where you have seen a decrease in the price?

df_nos

pric_02 = df_nos.drop(['manufacturer', 'category','qty'], axis=1)

p_max_n = pric_02.loc[pric_02.groupby('product').date.idxmax(),:]

p_min_n = pric_02.loc[pric_02.groupby('product').date.idxmin(),:]

p_max_n['price_max'] =p_max_n['price']

p_min_n['price_min'] =p_min_n['price'] 


p_max_n = p_max_n.drop(['price','date'], axis=1)


p_min_n = p_min_n.drop(['price','date'], axis=1)



final_n = pd.merge(p_max_n, p_min_n , on='product', how='outer')

final_n['final_price']= final_n['price_max']-final_n['price_min']

pric_Dec_n = final_n.sort_values(by=['final_price'], ascending=True ,ignore_index=True)[0:10]
sns.barplot(y = 'product',
            x = 'final_price',
            data = pric_Dec_n).set(title='Product whose Price Decreased')
[Text(0.5, 1.0, 'Product whose Price Decreased')]

png

4. Which is the product where you have seen an increase in the price?

pric_Inc_n =final_n.sort_values(by=['final_price'], ascending=False ,ignore_index=True)[0:10]

bar = sns.barplot(y = 'product',
            x = 'final_price',
            data = pric_Inc_n).set(title='Product whose Price Increased')

png

5. Which is the top category by price and quantity?

cat_grp_n = df_nos.groupby('category').sum()


cat_grp_srt_q_p_n = cat_grp_n.sort_values(by=['qty', 'price'],ascending= False)[:10]

cat_grp_srt_q_p_ind_n = cat_grp_srt_q_p_n.reset_index()

cat_grp_srt_q_p_ind_n
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
category qty price
0 1221 0 1.539977e+09
1 1241 0 6.036206e+08
2 1231 0 4.445039e+08
3 1569 0 3.213863e+08
4 525 0 2.871297e+08
5 1577 0 2.064920e+08
6 531 0 1.861489e+08
7 661 0 1.643297e+08
8 101 0 1.282264e+08
9 1582 0 1.120179e+08