Commit

update writing styles, fix grammar or typos, and clarify informations.
glennhenry committed Aug 15, 2024
1 parent e8a7123 commit 02ddccc
Showing 8 changed files with 43 additions and 33 deletions.
2 changes: 1 addition & 1 deletion docs/digital-media-processing/26-mp4/mp4.md
@@ -9,7 +9,7 @@ description: MP4

- **[MP4 file format — Wikipedia](https://en.wikipedia.org/wiki/MP4_file_format)**

**MP4 (MPEG-4 Part 14)** is a digital [multimedia container](<(/digital-media-processing/ogg-vorbis#media-container)>) format, commonly used to store audio and video data.
**MP4 (MPEG-4 Part 14)** is a digital [multimedia container](/digital-media-processing/ogg-vorbis#media-container) format, commonly used to store audio and video data.

![MP4 as a multimedia container](./mp4-container.png)
Source: https://www.filefix.org/format/mp4.html, https://en.wikipedia.org/wiki/MP4_file_format
6 changes: 4 additions & 2 deletions docs/digital-media-processing/28-json/json.md
@@ -9,10 +9,10 @@ description: JSON

- **[JSON — Wikipedia](https://en.wikipedia.org/wiki/JSON)**

**JSON (JavaScript Object Notation)** is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. JSON is often used to transmit data between a server and a web application, as an alternative to XML. It is also used as a data storage and communication format in many programming languages and web services.
**JSON (JavaScript Object Notation)** is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to [parse](/compilers-and-programming-languages/parsing) and generate. JSON is often used to transmit data between a server and a web application, as an alternative to [XML](/digital-media-processing/xml). It is also used as a data storage and communication format in many programming languages and web services.

:::tip
JSON was derived from [JavaScript](/internet-and-web/javascript), but it is not limited to JavaScript only. In fact JSON is used as data exchange between different systems and programming languages.
JSON was derived from [JavaScript](/internet-and-web/javascript), but it is not limited to JavaScript. In fact, JSON is used as a data exchange format between many different systems and programming languages.
:::

JSON is defined as a text format that consists of key-value pairs, where keys are strings and values can be strings, numbers, objects, arrays, Boolean values, or `null`. The data is structured hierarchically, with nested objects and arrays.
@@ -45,3 +45,5 @@ JSON is defined as text format that consists of key-value pairs, where keys are
```

Source: [Wikipedia JSON example](https://en.wikipedia.org/wiki/JSON#Syntax)

How an application uses JSON depends on how the programming language or library processes it. For example, it is common to parse the JSON data and map it onto [OOP classes](/computer-and-programming-fundamentals/object-oriented-programming). This JSON example could be turned into a `Person` class with properties like `firstName` (string type), `lastName` (string type), `isAlive` (boolean type), and so on. Another class could be generated for nested data such as `phoneNumbers`.
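
As a rough illustration of mapping JSON onto classes, here is a minimal Python sketch. The `Person` and `PhoneNumber` classes, the `person_from_json` helper, and the shortened sample document are assumptions made for this example, not part of any particular library.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhoneNumber:
    # Hypothetical class for each entry of the nested "phoneNumbers" array.
    type: str
    number: str


@dataclass
class Person:
    # Field names mirror the JSON keys so the mapping stays obvious.
    firstName: str
    lastName: str
    isAlive: bool
    phoneNumbers: List[PhoneNumber] = field(default_factory=list)


def person_from_json(text: str) -> Person:
    """Parse JSON text, then build typed objects from the resulting dict."""
    data = json.loads(text)
    phones = [PhoneNumber(**entry) for entry in data.get("phoneNumbers", [])]
    return Person(
        firstName=data["firstName"],
        lastName=data["lastName"],
        isAlive=data["isAlive"],
        phoneNumbers=phones,
    )


# Shortened sample document, assumed for illustration.
sample = """
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"}
  ]
}
"""

person = person_from_json(sample)
print(person.firstName, person.isAlive, person.phoneNumbers[0].number)
```

Many languages offer libraries that generate this mapping automatically; writing it by hand here only makes the parse-then-construct step visible.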
25 changes: 20 additions & 5 deletions docs/digital-media-processing/29-txt/txt.md
@@ -9,17 +9,32 @@ description: TXT

- **[Text file — Wikipedia](https://en.wikipedia.org/wiki/Text_file)**

**txt** file format or a text file, is a very simple format used to store plain text data. It contains human-readable text without any specific formatting or metadata. The characters used can be encoded in [ASCII](/computer-and-programming-fundamentals/data-representation#ascii) or [UTF-8](/computer-and-programming-fundamentals/data-representation#unicode).
A **.txt** file, or simply a text file, uses a very simple format for storing plain text data. It contains human-readable text without any specific formatting or metadata. The characters can be encoded in various schemes, such as [ASCII](/computer-and-programming-fundamentals/data-representation#ascii) or [UTF-8](/computer-and-programming-fundamentals/data-representation#unicode).

The structure of txt file is very simple, it consists of sequence of line that vary in length, each line is terminated using a new line character (typically using enter button in keyboard).
The structure of a .txt file is very simple; it consists of a sequence of lines that vary in length, with each line terminated by a newline character (typically created by pressing the Enter key on a keyboard).

:::info
Different operating systems handle line endings differently. On Windows, a new line is represented as CRLF (`\r\n`), while Linux uses just LF (`\n`).
:::
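
To make the difference concrete, the following Python sketch compares the same two lines stored with each convention. The sample byte strings are assumptions chosen for illustration.

```python
# The same two lines of text, stored with each line-ending convention.
windows_text = b"first line\r\nsecond line\r\n"
linux_text = b"first line\nsecond line\n"

# The Windows version is longer: one extra CR byte per line.
print(len(windows_text), len(linux_text))

# str.splitlines() treats both CRLF and LF as a single line break.
windows_lines = windows_text.decode("ascii").splitlines()
linux_lines = linux_text.decode("ascii").splitlines()
print(windows_lines == linux_lines)

# Converting CRLF to LF yields exactly the Linux byte sequence.
normalized = windows_text.replace(b"\r\n", b"\n")
print(normalized == linux_text)
```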

### txt File Representation

txt file are stored in binary data encoded with specific character encoding scheme. txt file may also contain metadata for additional information such as file name, file size, creation date, last modified date, and file permissions. The specific metadata and where is it located depends on the operating system used.
A .txt file is stored as binary data encoded with a specific character encoding scheme. Typically, a .txt file does not store metadata in the file itself. Any information about the file is usually managed by the [file system](/operating-system/file-system) rather than being embedded in the file.

For example, consider a text file that uses [ASCII](/computer-and-programming-fundamentals/data-representation#ascii) encoding.

```txt
Hello, World!
```

In ASCII, the letter "H" is represented by the decimal value 72, "e" by 101, "l" by 108, "o" by 111, comma (",") by 44, space by 32, and so on. Each character is stored as one byte holding its ASCII code. Typically, we don't write these bytes out in raw binary, but rather in [hexadecimal format](/computer-and-programming-fundamentals/number-system#hexadecimal).

For example, consider a text file that uses [ASCII](/computer-and-programming-fundamentals/data-representation#ascii) encoding and contains "Hello, World!". In ASCII, the letter "H" is represented by the decimal value 72, "e" by 101, "l" by 108, "o" by 111, comma (",") by 44, space by 32, and so on. We can then transform each ASCII value of the text to binary.
```
hex: 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21
binary: 01001000 01100101 01101100 01101100 01101111 00101100 00100000 01010111 01101111 01110010 01101100 01100100 00100001
```

While decoding it, we will reverse the process, for example if we encounter binary data of "01000001", this means it is "A". We will keep going until the last piece of binary.
The program that opens the file, such as a text editor, reverses the process, turning each byte back into its corresponding ASCII character. However, discrepancies can occur when a text file is created and displayed on different operating systems. The program may expect an encoding other than ASCII, or it may use a different convention for line endings.
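
The round trip between text and bytes can be reproduced with a short Python sketch. The variable names are assumptions for illustration; `str.encode` and `bytes.decode` are standard Python.

```python
text = "Hello, World!"

# Encoding: each character becomes one byte holding its ASCII code.
data = text.encode("ascii")

# Two views of the same bytes: hexadecimal and binary.
hex_view = " ".join(f"{byte:02X}" for byte in data)
binary_view = " ".join(f"{byte:08b}" for byte in data)
print("hex:", hex_view)
print("binary:", binary_view)

# Decoding: the reverse mapping from bytes back to characters.
decoded = data.decode("ascii")
print(decoded)
```

Decoding the same bytes with a different encoding may produce different characters, which is one source of the discrepancies mentioned above.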

![TXT file in binary form](./txt-file-binary.gif)
Source: https://www.thecrazyprogrammer.com/2018/05/difference-between-text-file-and-binary-file.html
16 changes: 7 additions & 9 deletions docs/digital-media-processing/30-md/md.md
@@ -9,23 +9,21 @@ description: MD

- **[Markdown — Wikipedia](https://en.wikipedia.org/wiki/Markdown)**

**Markdown (.md file)** is a markup language used to style plain text with format elements such as heading, list, images, bold, italic, underlined text, links, etc. Markdown can be easily converted into HTML, in fact it also supports HTML itself and CSS styling.
The **.md** file format, or Markdown, is a markup language used to style plain text with formatting elements such as headings, lists, images, bold, italic, and underlined text, hyperlinks, etc. Markdown can be easily converted into [HTML](/internet-and-web/html). In fact, HTML and CSS styling can be written directly inside a Markdown file.

Markdown uses a set of symbol that is used to indicate which part of the plain text to be formatted. For example, surrounding a text around double `*` like `**Hello**` will make the text bold. Using `#` will create a heading 1-6 based on the amount of `#` before the text, `### World` this will produce a "World" text with heading 3.

And there are many others symbol to format text.
Markdown uses a set of symbols to indicate which parts of the plain text should be formatted. For example, surrounding text with double asterisks, as in `**Hello**`, makes it bold. Placing `#` before text creates a heading whose level is based on the number of `#` characters.

![Example of Markdown formatting plain text](./markdown-example.png)
Source: https://dev.to/developer_anand/learn-basic-markdown-33nl

### Markdown Parsing

Markdown works by analyzing plain text and converting it into HTML while also applying format according to the symbol used. Here is the simplified process:
A Markdown file and its formatting elements are [parsed](/compilers-and-programming-languages/parsing) and converted into the corresponding styled HTML elements.

1. **Tokenization**: The first thing to do is to break down the input Markdown text into individual tokens. Tokens are the elements of the Markdown syntax, such as headers, lists, paragraphs. This step is often done using regular expressions or other pattern matching techniques.
2. **Parsing**: Once the tokens are identified, the Markdown processor analyzes their structure and relationships to build a hierarchical representation of the document. It identifies the nesting of elements, such as nested lists, and creates a data structure (usually tree) to represent the document's structure.
3. **Conversion**: After the document structure is determined, the Markdown processor applies transformation rules to convert the Markdown tokens and structure into the desired output format. For example, it may convert headers to HTML heading tags, lists to HTML lists, or inline formatting to appropriate HTML tags or styles.
4. **Rendering**: The converted output is rendered or displayed according to the target medium. For web-based applications, the rendered output may be displayed directly in the browser.
1. **Tokenization**: Markdown text is broken down into tokens. Tokens are the elements of the Markdown syntax, such as headers, lists, and paragraphs.
2. **Parsing**: Tokens are analyzed to build a hierarchical representation of the document. The parser identifies the nesting of elements, such as nested lists, and creates a data structure (usually a [tree](/data-structures-and-algorithms/tree)) to represent the document's structure.
3. **Conversion**: The Markdown processor applies transformation rules to convert the Markdown tokens and structure into the desired output format. For example, when it sees `#` in front of some text, it may generate an HTML heading tag (such as `<h1>`) and place the corresponding text inside it.
4. **Rendering**: The converted HTML is rendered and displayed by a [web browser](/internet-and-web/web-browser).
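
The tokenization and conversion steps can be sketched in a few lines of Python. This is a deliberately tiny illustration that handles only headings, paragraphs, and bold text. It skips the tree-building step because nothing here nests, and the function names (`tokenize`, `convert_inline`, `to_html`) are assumptions, not the API of any real Markdown processor.

```python
import re
from html import escape

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # one to six '#' then the text
BOLD = re.compile(r"\*\*(.+?)\*\*")         # text wrapped in double asterisks


def tokenize(markdown: str):
    """Split the input into (kind, level, text) tokens, one per non-empty line."""
    tokens = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            tokens.append(("heading", len(match.group(1)), match.group(2)))
        else:
            tokens.append(("paragraph", 0, line))
    return tokens


def convert_inline(text: str) -> str:
    """Escape HTML special characters, then turn **bold** into <strong>."""
    return BOLD.sub(r"<strong>\1</strong>", escape(text))


def to_html(markdown: str) -> str:
    """Convert each token into the matching HTML element."""
    parts = []
    for kind, level, text in tokenize(markdown):
        if kind == "heading":
            parts.append(f"<h{level}>{convert_inline(text)}</h{level}>")
        else:
            parts.append(f"<p>{convert_inline(text)}</p>")
    return "\n".join(parts)


html = to_html("### World\n\n**Hello** there")
print(html)
```

Real Markdown processors handle far more syntax (lists, links, code blocks, nesting) and usually build a full document tree before emitting HTML.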

![The general parsing process](./parsing-process.png)
Source: https://accu.org/journals/overload/26/146/balaam_2532/
15 changes: 6 additions & 9 deletions docs/digital-media-processing/31-pdf/pdf.md
@@ -5,27 +5,24 @@ title: PDF
description: PDF
---

**Main Source: Various source from Google**
**Main Source:**

**Portable Document Format (PDF)** is a file format developed by Adobe in 1992 used to present documents, including text formatting, images, other interactive media such as, annotation, form-fields, video, and etc. PDF is designed to be cross-platform, meaning they can be viewed and accessed on different operating systems and devices with the appropriate PDF reader installed.
- **[PDF — Wikipedia](https://en.wikipedia.org/wiki/PDF)**

The content of PDF itself is in binary format, it encode all the content using ASCII.
**PDF (Portable Document Format)** is a file format developed by Adobe in 1992 used to present documents, including text formatting, images, and interactive media such as annotations, form fields, and video. PDF is designed to be cross-platform, meaning PDF files can be viewed on different operating systems and devices with an appropriate PDF reader installed.

PDF use variety of compression algorithm such as LZW (Lempel-Ziv-Welch), FLATE (ZIP, in PDF 1.2), JPEG. PDF also supports encryption with 256-bit [AES encryption](/computer-security/aes) in Cipher Block Chaining Encryption (CBC) mode for securing the content of the file.
A PDF file is stored in a binary format, with ASCII encoding used for its text content. PDF uses a variety of compression algorithms, such as LZW (Lempel-Ziv-Welch), Flate (the algorithm used by ZIP, supported since PDF 1.2), and JPEG. PDF also supports 256-bit [AES encryption](/computer-security/aes) for securing the content of the file.

![PDF icon and an example of PDF document](./pdf-example.png)
Source: https://en.wikipedia.org/wiki/PDF

### PDF Structure

PDF has 4 main component, the body is the actual content of the document.

1. **Header**: The header is the starting point of a PDF file and contains information about the version of the PDF specification. The header format looks like `%PDF-1.x` where x is the version.
PDF has 4 main components, where the body is the actual content of the document.

1. **Header**: The header is the starting point of a PDF file and contains information about the version of the PDF specification. The header format looks like `%PDF-1.x` where `x` is a version number.
2. **Body**: The content of a PDF file is held in containers called objects. An object can be text, an image, a font, or another type of data, and objects can be nested inside other objects.

3. **Cross-Reference Table (xref)**: The cross-reference table contains a list of all objects in the file, their byte offsets, and their status (whether they are still in use or not). This table allows for efficient random access and updating of the file.

4. **Trailer**: The trailer section provides essential information about the PDF file, including the location of the cross-reference table, the total number of objects in the file, the root object of the document, and the end of file marker.
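
To see how the header and trailer help a reader navigate the file, here is a hedged Python sketch. The `sample` bytes are a hand-written fragment that imitates the four sections and is not a complete, valid PDF. The helper names `read_pdf_version` and `read_startxref` are assumptions for illustration.

```python
import re

# Hand-written fragment imitating the four sections (not a valid PDF).
sample = (
    b"%PDF-1.7\n"                           # header
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"  # body: a single object
    b"xref\n0 2\n"                          # cross-reference table
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"  # trailer
    b"startxref\n45\n"                      # byte offset of the xref table
    b"%%EOF\n"                              # end-of-file marker
)


def read_pdf_version(data: bytes) -> str:
    """Return the version from a header of the form %PDF-1.x."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    if match is None:
        raise ValueError("missing PDF header")
    return match.group(1).decode("ascii")


def read_startxref(data: bytes) -> int:
    """Return the xref byte offset stored just before the %%EOF marker."""
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if match is None:
        raise ValueError("missing startxref or %%EOF marker")
    return int(match.group(1))


version = read_pdf_version(sample)
offset = read_startxref(sample)
print(version, offset, sample[offset:offset + 4])
```

Reading the offset from the end of the file and jumping straight to the table is what lets a reader locate objects without scanning the whole document.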

![Structure of PDF file](./pdf-structure.png)
8 changes: 3 additions & 5 deletions docs/digital-media-processing/32-swf/swf.md
@@ -10,17 +10,15 @@ description: SWF
- **[Adobe Flash Player — Wikipedia](https://en.wikipedia.org/wiki/Adobe_Flash_Player)**
- **[SWF — Wikipedia](https://en.wikipedia.org/wiki/SWF)**

SWF is a file format for flash player.

**Flash player** is a multimedia software for playback of multimedia content, such as animations, videos, and interactive applications, on web browsers. Flash player use a variety of assets such as vector graphics, 3d graphics, video, audio, and raster image. In addition to assets, we can also add interactivity to the element using a scripting language called **ActionScript**.
SWF is a file format for Flash Player. **Flash Player** is multimedia software for the playback of multimedia content, such as animations, videos, and interactive applications, in web browsers. Flash Player uses a variety of assets such as [vector graphics](/computer-graphics/computer-images-part-1), 3D graphics, video, audio, and [raster images](/computer-graphics/computer-images-part-1). In addition to assets, we can also add interactivity to the elements using a scripting language called **ActionScript**.

![Example of flash player animation](./flash-player-example.png)
Source: https://addons.mozilla.org/id/firefox/addon/flashplayer-swf-to-html/

### How flash player works
### How Flash Player Works

Flash Player content is stored in a file with the `.swf` format. A developer creates Flash Player content, which includes all the assets used in the multimedia content as well as the scripts that define their behavior or interaction. All of these are compiled into binary data, which is contained within the SWF file.

A browser must have an engine that can interpret SWF file and run its content. Some flash player engine can be "plugged-in" onto the browser, or built-in within the browser. The flash player itself and the SWF file structure is complex. In high-level, the binaries in the SWF file is supposed to be parsed, interpreted, then translated into corresponding code to render its content. Here, the flash player engine or plugin know how to do this.
A browser must have an engine that can interpret an SWF file and run its content. A Flash Player engine can be plugged into the browser or built into it. Flash Player itself and the SWF file structure are complex. At a high level, the binary data in the SWF file is [parsed](/compilers-and-programming-languages/parsing), interpreted, then translated into the corresponding code to render its content. The Flash Player engine or plugin knows how to do this.

The rendering process may involve low-level instructions from the SWF, such as drawing a triangle and coloring it, placing elements in the correct position, loading certain images, or playing audio. These are converted into higher-level instructions governed by the browser API.
2 changes: 1 addition & 1 deletion docs/operating-system/11-file-system/file-system.md
@@ -210,7 +210,7 @@ Two common approaches are using a [linear list](/data-structures-and-algorithms/

- **Hash Table**: In this approach, a linear list is still used to store the directory entries, but a hash data structure is employed as well. The hash table takes a value computed from the file name and returns a pointer to the corresponding entry in the linear list. This allows for faster directory search by greatly reducing the search time.

When a file name needs to be looked up, it is hashed to generate a value within a given range. This value is then used to directly access the corresponding entry in the linear list, avoiding the need for sequential searching. However, [collisions](/data-structures-and-algorithms/hash-function#collision) may occur when two file names hash to the same location, requiring collision resolution techniques, such as using a linked list within each hash entry.
When a file name needs to be looked up, it is hashed to generate a value within a given range. This value is then used to directly access the corresponding entry in the linear list, avoiding the need for sequential searching. However, [collisions](/computer-security/hash-function#collision) may occur when two file names hash to the same location, requiring collision resolution techniques, such as using a linked list within each hash entry.
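
The lookup described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the `HashedDirectory` class, its byte-sum hash, and the `location` values are hypothetical, and each bucket uses a Python list as its collision chain in place of a linked list.

```python
class HashedDirectory:
    """Linear list of directory entries plus a hash table of indexes into it."""

    def __init__(self, bucket_count: int = 8):
        self.entries = []  # linear list of (file_name, location) pairs
        self.buckets = [[] for _ in range(bucket_count)]  # one chain per hash value

    def _bucket(self, name: str):
        # Toy hash: sum of the name's bytes, reduced to the bucket range.
        return self.buckets[sum(name.encode("utf-8")) % len(self.buckets)]

    def add(self, name: str, location: int) -> None:
        self.entries.append((name, location))
        # Store the entry's index in the chain for its hash value.
        self._bucket(name).append(len(self.entries) - 1)

    def lookup(self, name: str):
        # Only the entries in one chain are compared, not the whole list.
        for index in self._bucket(name):
            entry_name, location = self.entries[index]
            if entry_name == name:
                return location
        return None


directory = HashedDirectory()
directory.add("ab", 100)
directory.add("ba", 200)  # same byte sum as "ab", so it lands in the same chain
print(directory.lookup("ab"), directory.lookup("ba"), directory.lookup("zz"))
```

Because "ab" and "ba" hash to the same bucket, the lookup has to compare names inside the chain, which is the collision resolution step the paragraph refers to.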

![Hash table directory](./hash-table-directory.png)
Source: https://www.javatpoint.com/os-directory-implementation
2 changes: 1 addition & 1 deletion sidebars.js
@@ -218,8 +218,8 @@ const sidebars = {
items: [
"digital-media-processing/xml/xml",
"digital-media-processing/json/json",
"digital-media-processing/md/md",
"digital-media-processing/txt/txt",
"digital-media-processing/md/md",
"digital-media-processing/pdf/pdf",
],
},