Merge pull request #10 from gp1981/standardized_report_enhanced
Standardized Balancesheet Function
gp1981 authored Dec 19, 2023
2 parents 8af96fb + 229c2fe commit d2d8e4e
Showing 10 changed files with 215 additions and 115 deletions.
234 changes: 174 additions & 60 deletions code/Functions/data_retrieval.R

Large diffs are not rendered by default.

20 changes: 0 additions & 20 deletions code/Functions/data_visualization.R
@@ -1,23 +1,3 @@
# Author: gp1981
# Purpose: Contains the script for processing, analyzing, and visualizing SEC data.
# Disclaimer: This script is intended for educational purposes only and should not be used for investment decisions. Use at your own risk.

# Function to unnest list company_Facts ----------------------------------

# Un-nest the company_Facts (and nested unit list)
FactsList_to_Dataframe <- function(company_Facts_us_gaap) {
df_Facts <- company_Facts_us_gaap %>%
tibble() %>%
unnest_wider(col = everything()) %>%
unnest(cols = c(units)) %>%
unnest(cols = c(units))

df_Facts <- as.data.frame(df_Facts)

# Mutate to reduce values in millions by dividing by 1 million
df_Facts <- df_Facts %>%
mutate(val = val / 1e6)

return(df_Facts)
}
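The deleted `FactsList_to_Dataframe()` flattened the nested `us-gaap` list with tidyr's `unnest_wider()` and `unnest()` and then converted `val` to millions. To show the mechanics without tidyverse, here is a minimal base-R sketch of the same flattening. It assumes each Concept holds `label`, `description` and a `units` list whose entries (e.g. `USD`) contain one record per reported observation; `us_gaap_mock` and `facts_to_df()` are hypothetical names and all values are invented.

```r
# Hypothetical, minimal mock of company_Facts$facts$`us-gaap` (all values invented)
us_gaap_mock <- list(
  Assets = list(
    label = "Assets",
    description = "Total assets.",
    units = list(USD = list(
      list(end = "2022-12-31", val = 600e6, fy = 2022L, fp = "FY", form = "10-K"),
      list(end = "2023-12-31", val = 650e6, fy = 2023L, fp = "FY", form = "10-K")
    ))
  ),
  Liabilities = list(
    label = "Liabilities",
    description = "Total liabilities.",
    units = list(USD = list(
      list(end = "2023-12-31", val = 400e6, fy = 2023L, fp = "FY", form = "10-K")
    ))
  )
)

# Hypothetical helper: one row per (Concept, unit, observation)
facts_to_df <- function(us_gaap) {
  rows <- list()
  for (concept in names(us_gaap)) {
    item <- us_gaap[[concept]]
    for (unit in names(item$units)) {
      for (obs in item$units[[unit]]) {
        rows[[length(rows) + 1]] <- data.frame(
          concept = concept,
          label = item$label,
          unit = unit,
          end = obs$end,
          val = obs$val / 1e6,  # values in millions, as in the deleted function
          fy = obs$fy,
          fp = obs$fp,
          form = obs$form,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, rows)
}

df_mock <- facts_to_df(us_gaap_mock)
print(df_mock)
```

Like the deleted function, this scales every unit by 1e6, including non-monetary units such as shares; keeping only `unit == "USD"` rows before scaling would avoid that.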

Binary file modified data/standardized_balancesheet.xlsx
Binary file not shown.
Binary file modified data/standardized_incomestatement.xlsx
Binary file not shown.
10 changes: 6 additions & 4 deletions quarto/01_data_retrieval.qmd
@@ -42,9 +42,9 @@ company_List <- retrieve_Company_List(headers)
kable(head(company_List), "html", class = "custom-table custom-narrow-table")
```

## Selecting a Company: JAKKS Pacific Inc. (AAPL)
## Selecting a Company: JAKKS Pacific Inc. (JAKK)

For our analysis, we'll use JAKKS Pacific Inc. (AAPL) as the company of interest. The CIK for JAKKS Pacific Inc. is 0000320193.
For our analysis, we'll use JAKKS Pacific Inc. (JAKK) as the company of interest. The CIK for JAKKS Pacific Inc. is 0001009829.

Let's now select JAKKS Pacific Inc. by its CIK and retrieve its data from the SEC. The data we retrieve will be stored in the `company_Data` object for further analysis:

@@ -53,8 +53,10 @@ Let's now select JAKKS Pacific Inc. by its CIK and retrieve its data from the SE
cik <- "0001009829" # CIK for JAKKS Pacific Inc.
company_Data <- retrieve_Company_Data(headers, cik)
# this the first row of the company list as a quick check
company_List[1,] %>% kable("html", class = "custom-table custom-narrow-table")
# this is the corresponding row of the company list
company_List %>%
filter(cik_str == cik) %>%
kable("html", class = "custom-table custom-narrow-table")
```

By following these steps, we've imported the necessary libraries, sourced relevant files, and initiated the retrieval of financial data from the SEC. In the subsequent chapters, we will delve deeper into exploring and analyzing the SEC data for JAKKS Pacific Inc.
31 changes: 18 additions & 13 deletions quarto/02_data_exploration.qmd
@@ -56,16 +56,12 @@ str(company_Data, max.level = 1)
The output reveals that the **`company_Data`** structure comprises three lists that contain nested lists; for example, **`company_Metadata`** has 22 elements.

::: {.Note .custom-note}
------------------------------------------------------------------------
**Note**

In R, a **list** is a powerful data structure that can hold elements of **different data types**. This flexibility allows each element to be unique and cater to specific data requirements.
For the `company_Data` object, the list structure plays a crucial role in organizing various pieces of information, including financial data, company descriptions, and filing details. This heterogeneous nature of the data necessitates a flexible data structure like a list to accommodate these diverse data types.
Understanding the **organization and structure** of the `company_Data` list is essential for effective navigation and extraction of specific information. By grasping the relationships between the list elements, users can efficiently retrieve the desired data elements for analysis and interpretation.
------------------------------------------------------------------------
| |
|:-----------------------------------------------------------------------|
| **Note** |
| In R, a **list** is a powerful data structure that can hold elements of **different data types**. This flexibility allows each element to be unique and cater to specific data requirements. For the `company_Data` object, the list structure plays a crucial role in organizing various pieces of information, including financial data, company descriptions, and filing details. This heterogeneous nature of the data necessitates a flexible data structure like a list to accommodate these diverse data types. Understanding the **organization and structure** of the `company_Data` list is essential for effective navigation and extraction of specific information. By grasping the relationships between the list elements, users can efficiently retrieve the desired data elements for analysis and interpretation. |
:::


To access the elements of a list, we use the `$` operator after the object name, e.g. `company_Data$company_Metadata`.
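Nested lists can be navigated with `$` or `[[`. Because `us-gaap` contains a hyphen, it must be quoted with backticks after `$` (or passed as a string to `[[`). The sketch below uses `company_Data_mock`, a hypothetical and heavily simplified stand-in for `company_Data`; the real object has many more elements.

```r
# Hypothetical, heavily simplified stand-in for company_Data
company_Data_mock <- list(
  company_Metadata = list(cik = "0001009829", name = "JAKKS Pacific Inc."),
  company_Facts = list(facts = list(`us-gaap` = list(Assets = list(label = "Assets"))))
)

# `$` and `[[` are equivalent for named elements
company_Data_mock$company_Metadata$name
company_Data_mock[["company_Metadata"]][["cik"]]

# Non-syntactic names such as us-gaap need backticks with `$` ...
company_Data_mock$company_Facts$facts$`us-gaap`$Assets$label
# ... or a quoted string with `[[`
company_Data_mock[["company_Facts"]][["facts"]][["us-gaap"]][["Assets"]][["label"]]
```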

Next, we split the `company_Data` into separate lists: **`company_Metadata`**, **`company_Facts`**, **`company_Concept`**.
@@ -88,7 +84,14 @@ Let's start with `company_Metadata` which includes 22 elements: characters, inte
str(company_Metadata, max.level = 1)
```

For our purpose, the most relevant information of `company_Metadata` are included in the elemnent `filing` which is a nested list and contains the filing attributes. Here the structure of `company_Metadata`:
For our purpose, the most relevant elements of `company_Metadata` are:

1. `cik`, the Central Index Key, as described above.
2. `sic`, the Standard Industrial Classification code. SIC codes classify companies into industry segments based on their primary business activity; each four-digit code represents a different industry or sector.
3. `sicDescription`, the description of the SIC code.
4. `name`, the name of the company.
5. `tickers`, the symbol(s) identifying the company's publicly traded stock on a particular stock market.
6. `filing`, which includes the filing attributes.

Here is the structure of `company_Metadata$filing`:

```{r str_company_Metadata_Filing, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the company_Metadata
@@ -97,7 +100,7 @@ str(company_Metadata$filing, max.level = 2)

The dataset as printed is not very readable, but we can see useful information on the forms (e.g. 10-K) and dates (e.g. filing dates).

For now we will keep it as is and we will come back later on how to improve the readibility
For now we will keep it as is and come back later to improve its readability.
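One way to make the filing attributes more readable, assuming (as in the SEC submissions JSON) that they are stored as parallel vectors of equal length, is to bind them into a data frame and filter by form type. The field names and values in `filing_mock` below are a hypothetical mock, not JAKKS Pacific data.

```r
# Hypothetical mock of filing attributes stored as parallel vectors (values invented)
filing_mock <- list(
  accessionNumber = c("acc-3", "acc-2", "acc-1"),
  filingDate = c("2023-11-08", "2023-08-09", "2023-03-01"),
  reportDate = c("2023-09-30", "2023-06-30", "2022-12-31"),
  form = c("10-Q", "10-Q", "10-K")
)

# Equal-length vectors become the columns of a data frame
filings_df <- as.data.frame(filing_mock, stringsAsFactors = FALSE)
filings_df$filingDate <- as.Date(filings_df$filingDate)
filings_df$reportDate <- as.Date(filings_df$reportDate)

# Keep only annual reports
annual_reports <- filings_df[filings_df$form == "10-K", ]
print(annual_reports)
```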

#### Company Facts

@@ -115,15 +118,15 @@ For our purpose, the last element `company_Facts$facts` is the most relevant one
str(company_Facts$facts, max.level = 1)
```

The **`us-gaap`** list includes relevant Facts, containing 498 nested elements. Let's examine the first five.
The **`us-gaap`** list includes relevant Facts, containing 487 nested elements. Let's examine the first five.

```{r str_company_Facts_us_gaap_head, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
# Visualize structure of the company_Facts
str(company_Facts$facts$`us-gaap`[1:5], max.level = 1)
```

These elements include essential fundamentals of the company and are themselves nested lists. Let's explore the structure of the first one.
These elements, which we will refer to as `us_gaap_reference`, contain essential fundamentals of the company and are themselves nested lists. Let's explore the structure of the first one.

```{r str_company_Facts_us_gaap, eval=TRUE, warning=FALSE, message=FALSE, class.output="custom-str-output"}
@@ -163,7 +166,7 @@ This hierarchical structure provides a detailed view of the financial concept "A

------------------------------------------------------------------------

*Most of the SEC data required for fundamentals analysis is included in a structure of nested lists. Its understanding is critical to properly retrieve the data.*
*Most of the SEC data required for fundamentals analysis is included in a structure of nested lists. We will see in the next section how to properly retrieve this data.*

------------------------------------------------------------------------
:::
@@ -178,3 +181,5 @@ str(company_Concept, max.level = 3)
```

The output shows the structure associated with the `Asset` of the company under the taxonomy of `us-gaap`.

For the purpose of our analysis, we will use `company_Facts`, which also includes the `us_gaap_reference`.
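Each `us_gaap_reference` element can also be inspected on its own. The sketch below assumes a Concept holds `label`, `description` and a `units` list, and uses a hypothetical `assets_mock` with invented values. It pulls the end dates and values of one Concept out of its `USD` records with `vapply()`, which guarantees exactly one value of the declared type per record.

```r
# Hypothetical single us_gaap_reference element (values invented)
assets_mock <- list(
  label = "Assets",
  description = "Total assets.",
  units = list(USD = list(
    list(end = "2022-12-31", val = 600e6, form = "10-K"),
    list(end = "2023-12-31", val = 650e6, form = "10-K")
  ))
)

usd_records <- assets_mock$units$USD

# vapply() errors if a record does not yield exactly one value of the stated type
ends <- vapply(usd_records, function(r) r$end, character(1))
vals <- vapply(usd_records, function(r) r$val, numeric(1))

data.frame(end = as.Date(ends), val_millions = vals / 1e6)
```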
29 changes: 14 additions & 15 deletions quarto/03_data_analysis.qmd
@@ -23,7 +23,7 @@ company_Concept <- company_Data$company_Concept

The use of specific labels to indicate financial report items is regulated by accounting standards set forth by authoritative bodies. In the United States, the Financial Accounting Standards Board (FASB) establishes generally accepted accounting principles (GAAP), which provide guidelines for the preparation of financial statements, including the standardization of labels and concepts.

Different comapnies may use different reporting styles to indicate Facts in different ways. The use of different labels however affect our ability to efficiently retrieve the appropriate financial data.
Different companies may use different labels to report the same Facts. These differences, however, affect our ability to efficiently retrieve the appropriate financial data.

The objective here is to generate standardized financial reports (Balance Sheet, Income Statement, Cash Flow) so that we can use the same labels to perform specific calculations across all companies.
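The core idea of standardization can be sketched as a left join between the reported Facts and a matching table that maps each company-specific Concept to a standard label. The `mapping` table below is hypothetical (the actual layout of `standardized_balancesheet.xlsx` is not shown in this commit) and the values are invented; the Concept names are us-gaap tags.

```r
# Hypothetical matching table: company-specific Concept -> standardized label
mapping <- data.frame(
  concept = c("CashAndCashEquivalentsAtCarryingValue", "Cash", "AccountsReceivableNetCurrent"),
  std_label = c("Cash and Equivalents", "Cash and Equivalents", "Accounts Receivable"),
  stringsAsFactors = FALSE
)

# Invented Facts reported by one company
facts <- data.frame(
  concept = c("Cash", "AccountsReceivableNetCurrent", "InventoryNet"),
  end = "2023-12-31",
  val = c(70, 120, 60),
  stringsAsFactors = FALSE
)

# Left join: every reported Fact is kept; unmapped Concepts get NA
std <- merge(facts, mapping, by = "concept", all.x = TRUE)
print(std)

# Concepts that still need an entry in the matching table
unmapped <- std$concept[is.na(std$std_label)]
```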

@@ -33,13 +33,13 @@ Let's create now a dataframe including the data retrieved from `` company_Facts$

The first step is to un-nest the dataset, in particular the financial data nested in various sub-lists of `company_Facts`.

The following code can be used to un-nest the list within `company_Facts` and create a dataframe easy to visualize and useful for our purpose[^02_data_exploration-2].
The following code un-nests the lists within `company_Facts` and creates a dataframe that is easy to visualize and suited to our purpose[^03_data_analysis-1].

[^02_data_exploration-2]: For visualization purpose we have omitted from the printed table below the variables (columns) `df_Facts$description` and `df_Facts$accn`.
[^03_data_analysis-1]: For visualization purposes, we have omitted the variables (columns) `df_Facts$description` and `df_Facts$accn` from the printed table below.

```{r unnesting_company_Facts, message=FALSE, warning=FALSE, class.output="custom-table"}
df_Facts <- FactsList_to_Dataframe(company_Facts$facts$`us-gaap`)
df_Facts <- Fundamentals_to_Dataframe(company_Data)
# Select the columns to print out and present the output with wrapped text and formatted numbers
df_Facts %>% select(-c(description,accn)) %>%
@@ -52,7 +52,7 @@ df_Facts %>% select(-c(description,accn)) %>%

The data in this dataframe is organized to facilitate analysis and comparison of financial information over different periods. Each row corresponds to a specific financial Concept (e.g. Accounts Payable) reported by the company, and the columns provide details about the reporting period, values, and other relevant attributes.

To recreate a financial statement from the data in `df_Facts` of JAKKS Pacific Inc., we need to extract the values of the various Concepts and corresponding date in `df_Fact$end`.
To recreate a financial statement for JAKKS Pacific Inc. from the data in `df_Facts`, we need to extract the values (`df_Facts$val`) of the various Concepts and the corresponding dates (`df_Facts$end`).
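Laying out a statement amounts to arranging `val` with Concepts as rows and end dates as columns. A minimal base-R sketch with `tapply()` on an invented three-row extract, assuming each Concept/end-date pair appears only once (otherwise only the first value would be kept):

```r
# Invented extract with the columns used for the reshape
facts <- data.frame(
  label = c("Assets", "Assets", "Liabilities"),
  end = c("2022-12-31", "2023-12-31", "2023-12-31"),
  val = c(600, 650, 400),
  stringsAsFactors = FALSE
)

# Rows = Concepts, columns = end dates; cells with no observation become NA
statement <- tapply(facts$val, list(facts$label, facts$end), function(x) x[1])
print(statement)
```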

`df_Facts` has a long list of Concepts, related to all financial reports, across multiple fiscal periods. The following code shows how many records `df_Facts` contains.

@@ -62,7 +62,8 @@ To recreate a financial statement from the data in `df_Facts` of JAKKS Pacific I
cat("df_Facts includes:", format(nrow(df_Facts), big.mark = ","), "records \n")
```

From the extract of the `df_Facts` above, we see repeated Concepts. Each of them is associated with a different fiscal period. The code next provides number of unique Concepts used historically JAKKS Pacific Inc. in their financial reports.
In an extract of `df_Facts`, we would see repeated Concepts, each associated with a different fiscal period. The next code chunk counts the unique Concepts historically used by JAKKS Pacific Inc. in its financial reports.

```{r summarize_Facts3, message=FALSE, warning=FALSE, class.output="custom-str-output"}
# Print the number of unique records of df_Facts
df_Facts_distinct <- df_Facts %>% select(label,description) %>% distinct()
@@ -96,26 +97,24 @@ As mentioned in the previous chapter, the `df_Facts` dataframe contains financia

To handle this scenario, we construct a dataframe of financial data based on the end date (`df_Facts$end`) of the reporting period and remove the remaining attributes such as `fy`, `fp`, etc. This ensures that we focus on the actual reporting period for the financial data.


------------------------------------------------------------------------
:::
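Because a filing often repeats earlier periods for comparison, the same Concept and end date can appear several times. One possible rule, shown here as a hypothetical sketch with invented data (not necessarily the rule used in this project), is to keep only the most recently filed value for each Concept/end-date pair, so that restated figures take precedence, and then drop `fy` and `fp`.

```r
# Invented observations: the 2022 value is repeated (and restated) in the 2023 filing
facts <- data.frame(
  label = c("Assets", "Assets", "Assets"),
  end = c("2022-12-31", "2022-12-31", "2023-12-31"),
  val = c(600, 605, 650),
  fy = c(2022L, 2023L, 2023L),
  fp = "FY",
  filed = c("2023-03-01", "2024-03-01", "2024-03-01"),
  stringsAsFactors = FALSE
)

# Most recent filing first within each label/end pair
facts <- facts[order(facts$label, facts$end, -as.numeric(as.Date(facts$filed))), ]

# Keep the first row per label/end pair and key the data on the end date only
by_end <- facts[!duplicated(facts[, c("label", "end")]), c("label", "end", "val")]
print(by_end)
```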


#### Balance Sheet

The following code will create a standardized Balance Sheet (`df_std_BS`) based on a matching table prepopulated in excel (standardized_balancehseet.xlsx).
In some instances, the filing (e.g. 10-K) of a specific fiscal period (`df_Facts$fy` and `df_Facts$fp`) includes a comparison with previous fiscal periods. In these cases, we refer to the end date `df_Facts$end` of the reporting period associated with the financial data.
The following code creates a standardized Balance Sheet (`df_std_BS`) based on a matching table in Excel (`standardized_balancesheet.xlsx`). The standardized Balance Sheet includes Concepts that are not necessarily present in the filings (e.g. 10-K) of every company; in these cases, the function estimates their values. More details on the function are included in [Appendix A2](#appendix-a2-r-script-functions).
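To illustrate the kind of estimate meant here (this is not the actual logic of `bs_std()`, which is documented in Appendix A2), a missing total can be derived from an accounting identity. The hypothetical `estimate_liabilities()` fills `Liabilities` as `LiabilitiesAndStockholdersEquity` minus `StockholdersEquity`, ignoring items such as temporary equity or noncontrolling interests; the values are invented and in millions.

```r
# Hypothetical helper: fill Liabilities from the balance sheet identity when it is missing
estimate_liabilities <- function(bs) {
  missing_value <- !("Liabilities" %in% names(bs)) || is.na(bs[["Liabilities"]])
  if (missing_value) {
    # Simplification: ignores temporary equity and noncontrolling interests
    bs[["Liabilities"]] <- bs[["LiabilitiesAndStockholdersEquity"]] - bs[["StockholdersEquity"]]
  }
  bs
}

# Invented values in millions; Liabilities is not reported
bs <- c(Assets = 650, LiabilitiesAndStockholdersEquity = 650, StockholdersEquity = 250)
estimate_liabilities(bs)
```

A reported value is left untouched, so the estimate only fills gaps.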


```{r standardized_BS, message=FALSE, warning=FALSE, class.output="custom-table"}
# Retrieve balance sheet of JAKKS Pacific Inc. in standardized format
df_std_BS <- bs_std(df_Facts)
# Print the first rows of the standardized balance sheet
df_std_BS %>%
  head() %>%
  as.data.frame() %>%
  kable("html") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, width = "250px")
@@ -127,4 +126,4 @@ df_std_BS %>% head() %>% as.data.frame() %>%

#### Cash Flow

(work in progress ...)
2 changes: 1 addition & 1 deletion quarto/A1_main_script.qmd
@@ -1,3 +1,3 @@
# R script: Main Script
# Appendix A1: R Main script

Work in progress [..]
2 changes: 1 addition & 1 deletion quarto/A2_Functions.qmd
@@ -1,3 +1,3 @@
# R script: Functions
# Appendix A2: R script functions

Work in progress [..]
2 changes: 1 addition & 1 deletion quarto/setup.qmd
@@ -1,5 +1,5 @@
# Import required libraries
packages <- c("tm", "proxy","httr","jsonlite","tidyverse", "readxl","magrittr", "kableExtra" ,"tibble","knitr", "here","openxlsx")
packages <- c("tm", "proxy","httr","jsonlite","tidyverse", "readxl","magrittr", "kableExtra" ,"tibble","knitr", "here","openxlsx", "furrr")

for (package in packages) {
if (!(package %in% installed.packages())) {