Merge pull request #10 from gp1981/standardized_report_enhanced

Standardized Balancesheet Function
gp1981 · Dec 19, 2023 · d2d8e4e
2 parents 8af96fb + 229c2fe
commit d2d8e4e

Showing 10 changed files with 215 additions and 115 deletions.
diff --git a/code/Functions/data_retrieval.R b/code/Functions/data_retrieval.R
diff --git a/code/Functions/data_visualization.R b/code/Functions/data_visualization.R
@@ -1,23 +1,3 @@
 # Author: gp1981
 # Purpose: Contains the script for processing, analyzing, and visualizing SEC data.
 # Disclaimer: This script is intended for educational purposes only and should not be used for investment decisions. Use at your own risk.
-
-# Function to unnest list company_Facts ----------------------------------
-
-# Un-nest the company_Facts (and nested unit list)
-FactsList_to_Dataframe <- function(company_Facts_us_gaap) {
-  df_Facts <- company_Facts_us_gaap %>%
-    tibble() %>%
-    unnest_wider(col = everything()) %>%
-    unnest(cols = c(units)) %>%
-    unnest(cols = c(units))
-
-  df_Facts <- as.data.frame(df_Facts)
-
-  # Mutate to reduce values in millions by dividing by 1 million
-  df_Facts <- df_Facts %>%
-    mutate(val = val / 1e6)
-
-  return(df_Facts)
-}
-
diff --git a/data/standardized_balancesheet.xlsx b/data/standardized_balancesheet.xlsx
diff --git a/data/standardized_incomestatement.xlsx b/data/standardized_incomestatement.xlsx
diff --git a/quarto/01_data_retrieval.qmd b/quarto/01_data_retrieval.qmd
@@ -42,9 +42,9 @@ company_List <- retrieve_Company_List(headers)
 kable(head(company_List), "html", class = "custom-table custom-narrow-table")
 ```
 
-## Selecting a Company: JAKKS Pacific Inc. (AAPL)
+## Selecting a Company: JAKKS Pacific Inc. (JAKK)
 
-For our analysis, we'll use JAKKS Pacific Inc. (AAPL) as the company of interest. The CIK for JAKKS Pacific Inc. is 0000320193.
+For our analysis, we'll use JAKKS Pacific Inc. (JAKK) as the company of interest. The CIK for JAKKS Pacific Inc. is 0001009829.
 
 Let's now select JAKKS Pacific Inc. by its CIK and retrieve its data from the SEC. The data we retrieve will be stored in the `company_Data` object for further analysis:
 
@@ -53,8 +53,10 @@ Let's now select JAKKS Pacific Inc. by its CIK and retrieve its data from the SE
 cik <- "0001009829"  # CIK for JAKKS Pacific Inc.
 company_Data <- retrieve_Company_Data(headers, cik)
 
-# this the first row of the company list as a quick check
-company_List[1,] %>% kable("html", class = "custom-table custom-narrow-table")
+# This is the corresponding row of the company list
+company_List %>% 
+  filter(cik_str == cik) %>% 
+  kable("html", class = "custom-table custom-narrow-table")
 ```
 
 By following these steps, we've imported the necessary libraries, sourced relevant files, and initiated the retrieval of financial data from the SEC. In the subsequent chapters, we will delve deeper into exploring and analyzing the SEC data for JAKKS Pacific Inc.
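
The change from `company_List[1,]` to `filter(cik_str == cik)` only returns a row if `cik_str` and `cik` have the same type and padding. The sketch below illustrates the pitfall on a hypothetical stand-in for `company_List`. It assumes `cik_str` holds unpadded integers, which may not be what `retrieve_Company_List()` returns (this diff does not show it); the `title` values are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for company_List; cik_str is assumed to be an unpadded integer.
company_List <- data.frame(
  cik_str = c(320193L, 1009829L),
  ticker  = c("AAPL", "JAKK"),
  title   = c("Apple Inc.", "JAKKS Pacific Inc.")
)

cik <- "0001009829"  # zero-padded string, as used in the chapter

# Comparing an integer with a string coerces the integer to "1009829",
# which does not equal "0001009829", so no row is returned.
no_match <- company_List %>% filter(cik_str == cik)

# Padding cik_str to 10 digits before comparing restores the match.
matched <- company_List %>%
  mutate(cik_str = sprintf("%010d", cik_str)) %>%
  filter(cik_str == cik)
```

If `retrieve_Company_List()` already returns zero-padded strings, the filter in the diff works as written and no padding step is needed.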

diff --git a/quarto/02_data_exploration.qmd b/quarto/02_data_exploration.qmd
@@ -56,16 +56,12 @@ str(company_Data, max.level = 1)
 The output reveals that **`company_Data`** structure comprises three lists with nested lists, such as **`company_Metadata`** with 22 lists.
 
 ::: {.Note .custom-note}
-------------------------------------------------------------------------
-**Note**
-
-In R, a **list** is a powerful data structure that can hold elements of **different data types**. This flexibility allows each element to be unique and cater to specific data requirements.
-For the `company_Data` object, the list structure plays a crucial role in organizing various pieces of information, including financial data, company descriptions, and filing details. This heterogeneous nature of the data necessitates a flexible data structure like a list to accommodate these diverse data types.
-Understanding the **organization and structure** of the `company_Data` list is essential for effective navigation and extraction of specific information. By grasping the relationships between the list elements, users can efficiently retrieve the desired data elements for analysis and interpretation.
-------------------------------------------------------------------------
+|  |
+|:-----------------------------------------------------------------------|
+| **Note** |
+| In R, a **list** is a powerful data structure that can hold elements of **different data types**. This flexibility allows each element to be unique and cater to specific data requirements. For the `company_Data` object, the list structure plays a crucial role in organizing various pieces of information, including financial data, company descriptions, and filing details. This heterogeneous nature of the data necessitates a flexible data structure like a list to accommodate these diverse data types. Understanding the **organization and structure** of the `company_Data` list is essential for effective navigation and extraction of specific information. By grasping the relationships between the list elements, users can efficiently retrieve the desired data elements for analysis and interpretation. |
 :::
 
-
 To access the lists we use the symbol `$` after the object, e.g. `company_Data$company_Metadata`.
 
 Next, we split the `company_Data` into separate lists: **`company_Metadata`**, **`company_Facts`**, **`company_Concept`**.
@@ -88,7 +84,14 @@ Let's start with `company_Metadata` which includes 22 elements: characters, inte
 str(company_Metadata, max.level = 1)
 ```
 
-For our purpose, the most relevant information of `company_Metadata` are included in the elemnent `filing` which is a nested list and contains the filing attributes. Here the structure of `company_Metadata`:
+For our purpose, the most relevant elements of `company_Metadata` are:
+
+1.  `cik`, as described above.
+2.  `sic`, the Standard Industrial Classification code. SIC codes classify companies into industry segments based on their primary business activity; each four-digit code represents a different industry or sector.
+3.  `sicDescription`, the description of the Standard Industrial Classification code.
+4.  `name`, the name of the company.
+5.  `tickers`, the symbols that identify the company's publicly traded stock on a particular stock market.
+6.  `filing`, which includes the filing attributes.
+
+Here is the structure of the `filing` element of `company_Metadata`:
 
 ```{r str_company_Metadata_Filing, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
 # Visualize structure of the company_Metadata
@@ -97,7 +100,7 @@ str(company_Metadata$filing, max.level = 2)
 
 The format of the dataset as printed is not very useful. We can see that it contains useful information on the forms (e.g. 10-K) and dates (e.g. filing dates).
 
-For now we will keep it as is and we will come back later on how to improve the readibility
+For now we will keep it as is; we will come back later to improve its readability.
 
 #### Company Facts
 
@@ -115,15 +118,15 @@ For our purpose, the last element `company_Facts$facts` is the most relevant one
 str(company_Facts$facts, max.level = 1)
 ```
 
-The **`us-gaap`** list includes relevant Facts, containing 498 nested elements. Let's examine the first five.
+The **`us-gaap`** list includes relevant Facts, containing 487 nested elements. Let's examine the first five.
 
 ```{r str_company_Facts_us_gaap_head, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
 
 # Visualize structure of the company_Facts
 Facts_us_gaap <- str(company_Facts$facts$`us-gaap`[1:5], max.level = 1)
 ```
 
-These elements include essential fundamentals of the company and are themselves nested lists. Let's explore the structure of the first one.
+These elements, which we will refer to as `us_gaap_reference`, include essential fundamentals of the company and are themselves nested lists. Let's explore the structure of the first one.
 
 ```{r str_company_Facts_us_gaap, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
 
@@ -163,7 +166,7 @@ This hierarchical structure provides a detailed view of the financial concept "A
 
 ------------------------------------------------------------------------
 
-*Most of the SEC data required for fundamentals analysis is included in a structure of nested lists. Its understanding is critical to properly retrieve the data.*
+*Most of the SEC data required for fundamentals analysis is included in a structure of nested lists. We will see in the next section how to properly retrieve this data.*
 
 ------------------------------------------------------------------------
 :::
@@ -178,3 +181,5 @@ str(company_Concept, max.level = 3)
 ```
 
 The output shows the structure associated with the `Asset` of the company under the taxonomy of `us-gaap`.
+
+For the purpose of our analysis, we will use `company_Facts`, which also includes the `us_gaap_reference` elements.
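
As an illustration of the nested-list navigation this chapter describes, here is a minimal sketch on a toy list. The layout (concept, then `label`, `description`, `units`, then a list of records) is an assumption modeled on the fields mentioned in the chapter; all values are made up.

```r
# Toy list shaped like company_Facts$facts (assumed layout, made-up values).
facts <- list(
  `us-gaap` = list(
    AccountsPayableCurrent = list(
      label = "Accounts Payable, Current",
      description = "Toy description.",
      units = list(
        USD = list(
          list(end = "2022-12-31", val = 1000, fy = 2022, fp = "FY", form = "10-K"),
          list(end = "2023-03-31", val = 1200, fy = 2023, fp = "Q1", form = "10-Q")
        )
      )
    )
  )
)

# Backticks are required because `us-gaap` is not a syntactic R name.
concept <- facts$`us-gaap`$AccountsPayableCurrent

# Each level is reached with `$`; individual records are picked with [[ ]].
first_record <- concept$units$USD[[1]]
n_records <- length(concept$units$USD)

# str() with max.level limits how deep the printed structure goes.
str(facts, max.level = 2)
```

Here `first_record$val` holds the value of the first reported period, and `n_records` counts how many periods the toy concept covers.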
diff --git a/quarto/03_data_analysis.qmd b/quarto/03_data_analysis.qmd
@@ -23,7 +23,7 @@ company_Concept <- company_Data$company_Concept
 
 The use of specific labels to indicate financial report items is regulated by accounting standards set forth by authoritative bodies. In the United States, the Financial Accounting Standards Board (FASB) establishes generally accepted accounting principles (GAAP), which provide guidelines for the preparation of financial statements, including the standardization of labels and concepts.
 
-Different comapnies may use different reporting styles to indicate Facts in different ways. The use of different labels however affect our ability to efficiently retrieve the appropriate financial data.
+Different companies may use different reporting styles to indicate Facts in different ways. The use of different labels, however, affects our ability to efficiently retrieve the appropriate financial data.
 
 The objective here is to generate standardized financial reports (Balance Sheet, Income Statement, Cash Flow) so that we can use the same labels to perform specific calculations across all companies.
 
@@ -33,13 +33,13 @@ Let's create now a dataframe including the data retrieved from `` company_Facts$
 
 The first step is to un-nest the dataset, in particular the financial data in `company_Facts`, which is nested in various sub-lists.
 
-The following code can be used to un-nest the list within `company_Facts` and create a dataframe easy to visualize and useful for our purpose[^02_data_exploration-2].
+The following code can be used to un-nest the list within `company_Facts` and create a dataframe easy to visualize and useful for our purpose[^03_data_analysis-1].
 
-[^02_data_exploration-2]: For visualization purpose we have omitted from the printed table below the variables (columns) `df_Facts$description` and `df_Facts$accn`.
+[^03_data_analysis-1]: For visualization purposes we have omitted the variables (columns) `df_Facts$description` and `df_Facts$accn` from the printed table below.
 
 ```{r unnesting_company_Facts, message=FALSE, warning=FALSE, message=FALSE, class.output="custom-table"}
 
-df_Facts <- FactsList_to_Dataframe(company_Facts$facts$`us-gaap`)
+df_Facts <- Fundamentals_to_Dataframe(company_Data)
 
 # Select the columns to print out and present the output with wrapped text and formatted numbers
 df_Facts %>% select(-c(description,accn)) %>% 
@@ -52,7 +52,7 @@ df_Facts %>% select(-c(description,accn)) %>%
 
 The data in this dataframe is organized to facilitate analysis and comparison of financial information over different periods. Each row corresponds to a specific financial Concept (e.g. Accounts Payable) reported by the company, and the columns provide details about the reporting period, values, and other relevant attributes.
 
-To recreate a financial statement from the data in `df_Facts` of JAKKS Pacific Inc., we need to extract the values of the various Concepts and corresponding date in `df_Fact$end`.
+To recreate a financial statement from the data in `df_Facts` of JAKKS Pacific Inc., we need to extract the values in `df_Facts$val` of the various Concepts and the corresponding dates in `df_Facts$end`.
 
 `df_Facts` has a long list of Concepts for multiple fiscal periods. The following code shows the large number of Concepts included in `df_Facts`, related to all financial reports.
 
@@ -62,7 +62,8 @@ To recreate a financial statement from the data in `df_Facts` of JAKKS Pacific I
 cat("df_Facts includes:", format(nrow(df_Facts), units = "auto"), "records \n")
 ```
 
-From the extract of the `df_Facts` above, we see repeated Concepts. Each of them is associated with a different fiscal period. The code next provides number of unique Concepts used historically JAKKS Pacific Inc. in their financial reports.
+From an extract of `df_Facts` above, we would see repeated Concepts, each associated with a different fiscal period. The next code chunk gives the number of unique Concepts that JAKKS Pacific Inc. has used historically in its financial reports.
+
 ```{r summarize_Facts3, message=FALSE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
 # Print the number of unique records of df_Facts
 df_Facts_distinct <- df_Facts %>% select(label,description) %>% distinct()
@@ -96,26 +97,24 @@ As mentioned in the previous chapter, the `df_Facts` dataframe contains financia
 
 To handle this scenario, we construct a dataframe of financial data based on the end date (`df_Facts$end`) of the reporting period and remove the remaining attributes such as `fy`, `fp`, etc. This ensures that we focus on the actual reporting period for the financial data.
 
-
 ------------------------------------------------------------------------
 :::
 
-
 #### Balance Sheet
 
-The following code will create a standardized Balance Sheet (`df_std_BS`) based on a matching table prepopulated in excel (standardized_balancehseet.xlsx).
-There are instances in which the the filing (e.g. 10-K) of a specific fiscal period (`df_Facts$fy` and `df_Facts$fp`) includes a comparison with previous fiscal periods. In these cases we refer to the end date `df_Facts$end` of the reporting period which is associated with the financial data. 
+The following code will create a standardized Balance Sheet (`df_std_BS`) based on a matching table in Excel (standardized_balancesheet.xlsx). The standardized Balance Sheet includes Concepts that are not necessarily present in the filings (e.g. 10-K) of all companies. In these cases the function will estimate their values. More details on the function are included in [Appendix A2](#appendix-a2-r-script-functions).
+
 
 ```{r standardized_BS, message=FALSE, warning=FALSE, message=FALSE, class.output="custom-table"}
 
 # Retrieve balance sheet of JAKKS Pacific Inc. in standardized format
 df_std_BS <- bs_std(df_Facts)
 
-# Print the resulting data.frame
+# Print the standardized balancesheet
 df_std_BS %>% head() %>% as.data.frame() %>%
-  head() %>% as.data.frame() %>% 
-  kable( "html") %>% 
-  kable_styling(full_width = FALSE) %>% 
+  head() %>% as.data.frame() %>%
+  kable( "html") %>%
+  kable_styling(full_width = FALSE) %>%
   column_spec(1,width = "250px")
 
 
@@ -127,4 +126,4 @@ df_std_BS %>% head() %>% as.data.frame() %>%
 
 #### Cash Flow
 
-(work in progress ...)
+(work in progress ...)
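
The un-nesting step in this chapter now goes through `Fundamentals_to_Dataframe()`, whose body is not part of this diff. As a rough illustration of the idea only, the sketch below flattens a toy nested list into one row per Concept and period. It is written in base R rather than the tidyr pipeline of the removed `FactsList_to_Dataframe()`. The list layout, the USD-only units, and all values are assumptions.

```r
# Toy stand-in for company_Facts$facts$`us-gaap` (assumed layout, made-up values).
us_gaap <- list(
  AccountsPayableCurrent = list(
    label = "Accounts Payable, Current",
    units = list(USD = list(
      list(end = "2022-12-31", val = 33.7e6),
      list(end = "2023-09-30", val = 41.2e6)
    ))
  ),
  Assets = list(
    label = "Assets",
    units = list(USD = list(
      list(end = "2022-12-31", val = 405.3e6),
      list(end = "2023-09-30", val = 452.9e6)
    ))
  )
)

# Build one small data frame per Concept, with one row per reported record.
rows <- lapply(names(us_gaap), function(concept) {
  records <- us_gaap[[concept]]$units$USD
  data.frame(
    concept = concept,
    label   = us_gaap[[concept]]$label,
    end     = vapply(records, function(r) r$end, character(1)),
    val     = vapply(records, function(r) r$val, numeric(1)) / 1e6  # in millions
  )
})

# Stack the per-Concept data frames into a single long table.
df_toy <- do.call(rbind, rows)
df_toy
```

The result has the same long shape the chapter relies on: repeated Concepts, each paired with an `end` date and a `val` expressed in millions.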
diff --git a/quarto/A1_main_script.qmd b/quarto/A1_main_script.qmd
@@ -1,3 +1,3 @@
-# R script: Main Script
+# Appendix A1: R Main script
 
 Work in progress [..]
diff --git a/quarto/A2_Functions.qmd b/quarto/A2_Functions.qmd
@@ -1,3 +1,3 @@
-# R script: Functions
+# Appendix A2: R script functions
 
 Work in progress [..]
diff --git a/quarto/setup.qmd b/quarto/setup.qmd
@@ -1,5 +1,5 @@
 # Import required libraries
-packages <- c("tm", "proxy","httr","jsonlite","tidyverse", "readxl","magrittr", "kableExtra" ,"tibble","knitr", "here","openxlsx")
+packages <- c("tm", "proxy","httr","jsonlite","tidyverse", "readxl","magrittr", "kableExtra" ,"tibble","knitr", "here","openxlsx", "furrr")
 
 for (package in packages) {
   if (!(package %in% installed.packages())) {