diff --git a/natural-language-processing/reading/information-retrieval.md b/natural-language-processing/reading/information-retrieval.md new file mode 100644 index 0000000..161e205 --- /dev/null +++ b/natural-language-processing/reading/information-retrieval.md @@ -0,0 +1,157 @@ +# Information Retrieval + +IR in general is the process of obtaining information based on user queries, and can be applied to pretty much any form of media. Probably the most prevalent form of IR that we use every day is through **search engines**. + +## Ad Hoc Retrieval + +A user poses a **query** to a retrieval system, which then returns an ordered set of **documents** from some **collection**. A **document** refers to whatever unit of text the system indexes and retrieves (e.g. a webpage, a book, a tweet, etc.). The **collection** is the set of all documents that the system has indexed. A **term** can correspond to either a word, phrase, or some other unit of text which documents are indexed by. A query is therefore a set of terms. + +A simple architecture for an IR system is as follows: + +- Document collection in persistent storage +- Indexing/Preprocessing module to convert documents into an inverted index +- Query processing module to process user queries into query vectors +- Search module to take in query vectors, which then searches the inverted index, returning a set of ranked documents + +```txt +persistent storage + +-----------+++ + | Documents ||| ----> Indexing/Preprocessing ----> Inverted Index + +-----------+++ | + | + v + + User Query ---> Query Processing ---(query vector)--> Search + ^ | + | | + +---------------(ranked docs)------------------------+ +``` + +Usually, we'll want to also persist the inverted index to disk, so that we don't have to recompute it every time we want to search, but online queries will at least usually be served by using an in-memory index. 
+ +We can map queries and documents both to vectors based on unigram word counts, and then use cosine similarity between vectors to rank documents. This is an example of the **bag-of-words** model, since words are considered independently of their positions. + +### Term weighting (tf-idf) + +Using raw word counts isn't very effective. We instead compute a **term weight** for each document word (e.g. **tf-idf** or **BM25**). For tf-idf (term frequency-inverse document frequency), we compute the term frequency (tf) and inverse document frequency (idf) for each term in each document. The tf is the number of times a term appears in a document, and the idf is the log of the total number of documents divided by the number of documents containing the term. The tf-idf score is then the product of these two values. + +$$ +\text{tf}_{t, d} = \begin{cases} + 1 + \log_{10} \text{count}(t, d) & \text{if count}(t, d) > 0 \\ + 0 & \text{otherwise} +\end{cases} +$$ + +For intuition behind using $log$, if $w_1$ appears $100$ times in a document, and $w_2$ only once, it doesn't mean that $w_1$ is $100$ times more important. Note that alternative definitions of tf exist, e.g. $\log_{10}(1 + \text{count}(t, d))$. + +On the other hand, the **document frequency** is the number of documents containing a term. The idf is then defined as: + +$$ +\text{idf}_t = \log_{10} \left( \frac{N}{\text{df}_t} \right) +$$ + +where $N$ is the total number of documents in the collection. Therefore, for a word that is contained in **every** document, we'd have an $idf$ of 0. 
The tf-idf score is then: + +$$ +\text{tf-idf}_{t, d} = \text{tf}_{t, d} \times \text{idf}_t = \begin{cases} + (1 + \log_{10} \text{count}(t, d)) \times \log_{10} \left( \frac{N}{\text{df}_t} \right) & \text{if count}(t, d) > 0 \\ + 0 & \text{otherwise} +\end{cases} +$$ + +### Document scoring + +We can then score a document $d$ by the cosine of its vector $v_d$ with the query vector $v_q$: + +$$ +\text{score}(q, d) = cos(v_q, v_d) = \frac{v_q \cdot v_d}{\|v_q\| \|v_d\|} +$$ + +Alternatively, you can think of the cosine as the dot product of the document and query unit vectors, i.e.: + +$$ +\text{score}(q, d) = cos(v_q, v_d) = \frac{v_q}{\|v_q\|} \cdot \frac{v_d}{\|v_d\|} +$$ + +Then, plugging in the tf-idf scores: + +$$ +\text{score}(q, d) = \sum_{t \in q} \frac{\text{tf-idf}_{t, q}}{\sqrt{\sum_{q_i \in q} \text{tf-idf}^2(q_i, q)}} \times \frac{\text{tf-idf}_{t, d}}{\sqrt{\sum_{d_i \in d} \text{tf-idf}^2(d_i, d)}} +$$ + +Many variations exist, particularly ones that drop terms in order to reduce computation required. A notable variant is **BM25**, which introduces parameters $k$ to adjust balance between $tf$ and $idf$, and $b$ which controls the importance of document length normalization. + +$$ +\text{score}(q, d) = \sum_{t \in q} \log \left( \frac{N}{\text{df}_t} \right) \cdot \frac{tf_{t, d}}{k(1 - b + b \cdot \frac{|d|}{|d_{avg}|}) + tf_{t, d}} +$$ + +Where $|d_{avg}|$ is the average document length in the collection. When $k = 0$, BM25 reverts to no use of term frequency, just like a binary selection of terms in the query (plus idf). A large $k$ results in raw term frequency (plus idf). $b$ ranges from $1$ (scaling by document length) to $0$ (no scaling). Reasonable defaults for these parameters are $k \in [1.2, 2.0]$ and $b = 0.75$. + +#### Quick aside: stop words + +Stop words are common words that would traditionally be removed from the text before indexing, since they don't add much information. 
However, tf-idf already does a good job of downweighting common words, so removing stop words is less important in modern systems, and they are often kept in the index to make phrase search easier. + +### Inverted Index + +Using an inverted index, we want to be able to find all documents $d \in C$ that contain a term $q \in Q$. The index is composed of two parts: a **dictionary** and the **postings lists**. The dictionary is a collection of terms (designed to be efficiently accessed), each of which maps to the postings list for that term. A postings list is the list of document IDs associated with a term, which can also contain additional metadata (e.g. term frequency, positions, etc.). + +This gives us an efficient access pattern for computing tf-idf scores for documents, since we can look up the postings list for each term in the query. However, alternatives, especially for question answering, exist (e.g. [Chen et al. 2017](https://aclanthology.org/P17-1171/)). + +### Evaluation + +Use **precision**, the fraction of returned docs that are relevant, and **recall**, the fraction of all relevant docs that are returned. + +Assume that each document in our IR system is either relevant or not relevant to a query. Further, let $U$ be the set of all relevant documents, $T$ be the set of ranked documents returned, and $R$ be the set of relevant documents in $T$. Then, we can define precision and recall as: + +$$ +\text{precision} = \frac{|R|}{|T|} \quad \text{recall} = \frac{|R|}{|U|} +$$ + +Note that recall never decreases as we return more documents, i.e. it isn't penalized by returning an irrelevant document. Precision, on the other hand, can decrease if we return irrelevant documents. It is useful to plot precision-recall curves, which show the tradeoff between precision and recall as we vary the number of documents returned. To smooth the curve, we interpolate: the interpolated precision at recall level $r$ is the maximum precision observed at any recall level $i \ge r$: 
+ +$$ +\text{InterpolatedPrecision}(r) = \max_{i \ge r} \text{Precision}(i) +$$ + +```python +import numpy as np + +def interpolate_PR_curve(precision, recall): + """ + compute interpolated precision at 11 fixed levels of recall (0.0 to 1.0 in steps of 0.1) + """ + precision, recall = np.asarray(precision), np.asarray(recall) + recall_levels = np.linspace(0, 1, 11) + interpolated_precision = np.zeros_like(recall_levels) + for i, r in enumerate(recall_levels): + mask = recall >= r + if mask.any(): # leave 0 where no point reaches recall r + interpolated_precision[i] = np.max(precision[mask]) + return interpolated_precision, recall_levels +``` + +#### Mean Average Precision (MAP) + +Assume $R_r$ is the set of relevant documents at or above $r$ in the ranked list. Then, the average precision at $r$ is: + +$$ +\text{AP} = \frac{1}{|R_r|} \sum_{d \in R_r} \text{Precision}_{r}(d) +$$ + +Where $\text{Precision}_{r}(d)$ is the precision measured at the rank $r$ where document $d$ was retrieved. For an ensemble of queries $Q$, we average the AP over all queries to get the MAP: + +$$ +\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{AP}(q) +$$ + +## IR with Dense Vectors + +tf-idf and BM25 both suffer from the **vocabulary mismatch problem**: they only match documents that share exact terms with the query. To handle synonyms, we instead use dense vectors (as opposed to sparse ones like word counts). This is implemented today via encoders like BERT. + +The general approach is to present both the query and the document to a single encoder, allowing the transformer self-attention to see all tokens of both the query and the document, thus also building a representation that is sensitive to the meanings in both. Then, a linear layer can be put on top of the [CLS] token to predict the similarity score for the query and document. + +$$ +z = BERT(q;[SEP];d)[CLS] +$$ + +$$ +\text{score}(q, d) = \text{softmax}(U(z)) +$$ + +Note: BERT was trained using `[CLS] sen A [SEP] sen B [SEP]`. `[SEP]` is used to help the model distinguish between the two sentences. `[CLS]` is used to represent the entire input sequence. 
+ diff --git a/site/categories/algorithm analysis.html b/site/categories/algorithm analysis.html index ab5dbd3..40c47e4 100644 --- a/site/categories/algorithm analysis.html +++ b/site/categories/algorithm analysis.html @@ -179,7 +179,7 @@

Category: Algorithm Analysis

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/algorithms.html b/site/categories/algorithms.html index 8634a18..bbb7cf3 100644 --- a/site/categories/algorithms.html +++ b/site/categories/algorithms.html @@ -179,7 +179,7 @@

Category: algorithms

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/computer science.html b/site/categories/computer science.html index 3e0eeac..bc43d4d 100644 --- a/site/categories/computer science.html +++ b/site/categories/computer science.html @@ -179,7 +179,7 @@

Category: Computer Science

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/database design.html b/site/categories/database design.html index 351722e..991d3e6 100644 --- a/site/categories/database design.html +++ b/site/categories/database design.html @@ -179,7 +179,7 @@

Category: Database Design

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/database systems.html b/site/categories/database systems.html index 6b3637f..b1f0115 100644 --- a/site/categories/database systems.html +++ b/site/categories/database systems.html @@ -179,7 +179,7 @@

Category: Database Systems

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/distributed systems.html b/site/categories/distributed systems.html index 8617a49..23e10b9 100644 --- a/site/categories/distributed systems.html +++ b/site/categories/distributed systems.html @@ -179,7 +179,7 @@

Category: Distributed Systems

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/graph theory.html b/site/categories/graph theory.html index 6a50c59..c742f64 100644 --- a/site/categories/graph theory.html +++ b/site/categories/graph theory.html @@ -179,7 +179,7 @@

Category: Graph Theory

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/index.html b/site/categories/index.html index 9f8201e..c9f7e62 100644 --- a/site/categories/index.html +++ b/site/categories/index.html @@ -178,7 +178,7 @@

Categories

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/mathematics.html b/site/categories/mathematics.html index 7927156..8db02f6 100644 --- a/site/categories/mathematics.html +++ b/site/categories/mathematics.html @@ -179,7 +179,7 @@

Category: Mathematics

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/operations research.html b/site/categories/operations research.html index 9744301..7417498 100644 --- a/site/categories/operations research.html +++ b/site/categories/operations research.html @@ -179,7 +179,7 @@

Category: Operations Research

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/categories/software engineering.html b/site/categories/software engineering.html index b6dc9ea..7ed47a1 100644 --- a/site/categories/software engineering.html +++ b/site/categories/software engineering.html @@ -179,7 +179,7 @@

Category: Software Engineering

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/index.html b/site/index.html index 237a285..0d796a6 100644 --- a/site/index.html +++ b/site/index.html @@ -178,14 +178,14 @@

My Notes

- Last modified: 2025-01-07 + Last modified: 2025-01-08
- 151 + 152 Notes
@@ -201,6 +201,11 @@

My Notes

Recent Notes

diff --git a/site/natural-language-processing/reading/information-retrieval.html b/site/natural-language-processing/reading/information-retrieval.html new file mode 100644 index 0000000..f175c36 --- /dev/null +++ b/site/natural-language-processing/reading/information-retrieval.html @@ -0,0 +1,297 @@ + + + + + + Information Retrieval + + + + + +
+ +

Information Retrieval

+
+ Last modified: 2025-01-07 + +
+
+

Information Retrieval

+

IR in general is the process of obtaining information based on user queries, and can be applied to pretty much any form of media. Probably the most prevalent form of IR that we use every day is through search engines.

+

Ad Hoc Retrieval

+

A user poses a query to a retrieval system, which then returns an ordered set of documents from some collection. A document refers to whatever unit of text the system indexes and retrieves (e.g. a webpage, a book, a tweet, etc.). The collection is the set of all documents that the system has indexed. A term can correspond to either a word, phrase, or some other unit of text which documents are indexed by. A query is therefore a set of terms.

+

A simple architecture for an IR system is as follows:

+
    +
  • Document collection in persistent storage
  • +
  • Indexing/Preprocessing module to convert documents into an inverted index
  • +
  • Query processing module to process user queries into query vectors
  • +
  • Search module to take in query vectors, which then searches the inverted index, returning a set of ranked documents
  • +
+
persistent storage
+ +-----------+++
+ | Documents ||| ----> Indexing/Preprocessing ----> Inverted Index
+ +-----------+++                                         |
+                                                         |
+                                                         v
+
+ User Query ---> Query Processing ---(query vector)--> Search
+    ^                                                    |
+    |                                                    |
+    +---------------(ranked docs)------------------------+
+
+

Usually, we'll want to also persist the inverted index to disk, so that we don't have to recompute it every time we want to search, but online queries will at least usually be served by using an in-memory index.

+

We can map queries and documents both to vectors based on unigram word counts, and then use cosine similarity between vectors to rank documents. This is an example of the bag-of-words model, since words are considered independently of their positions.

+

Term weighting (tf-idf)

+

Using raw word counts isn't very effective. We instead compute a term weight for each document word (e.g. tf-idf or BM25). For tf-idf (term frequency-inverse document frequency), we compute the term frequency (tf) and inverse document frequency (idf) for each term in each document. The tf is the number of times a term appears in a document, and the idf is the log of the total number of documents divided by the number of documents containing the term. The tf-idf score is then the product of these two values.

+

$$ +\text{tf}_{t, d} = \begin{cases} + 1 + \log_{10} \text{count}(t, d) & \text{if count}(t, d) > 0 \\ + 0 & \text{otherwise} +\end{cases} +$$

+

For intuition behind using $log$, if $w_1$ appears $100$ times in a document, and $w_2$ only once, it doesn't mean that $w_1$ is $100$ times more important. Note that alternative definitions of tf exist, e.g. $\log_{10}(1 + \text{count}(t, d))$.

+

On the other hand, the document frequency is the number of documents containing a term. The idf is then defined as:

+

$$ +\text{idf}_t = \log_{10} \left( \frac{N}{\text{df}_t} \right) +$$

+

where $N$ is the total number of documents in the collection. Therefore, for a word that is contained in every document, we'd have an $idf$ of 0. The tf-idf score is then:

+

$$ +\text{tf-idf}_{t, d} = \text{tf}_{t, d} \times \text{idf}_t = \begin{cases} + (1 + \log_{10} \text{count}(t, d)) \times \log_{10} \left( \frac{N}{\text{df}_t} \right) & \text{if count}(t, d) > 0 \\ + 0 & \text{otherwise} +\end{cases} +$$
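The tf-idf weighting defined above can be sketched in a few lines of Python. This is only an illustration, assuming documents are already tokenized into lists of terms; the helper names (`tf`, `idf`, `tf_idf_vectors`) and the toy collection are hypothetical and not part of the notes.

```python
import math
from collections import Counter

def tf(count):
    """Log-scaled term frequency: 1 + log10(count) if count > 0, else 0."""
    return 1 + math.log10(count) if count > 0 else 0.0

def idf(n_docs, df):
    """Inverse document frequency: log10(N / df_t)."""
    return math.log10(n_docs / df)

def tf_idf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # df_t: number of documents that contain term t at least once
    df = Counter(term for c in counts for term in c)
    return [
        {t: tf(n) * idf(n_docs, df[t]) for t, n in c.items()}
        for c in counts
    ]

# Hypothetical toy collection, already tokenized
docs = [
    ["sweet", "sweet", "nurse", "love"],
    ["sweet", "sorrow"],
    ["how", "sweet", "is", "love"],
]
vectors = tf_idf_vectors(docs)
```

Since "sweet" occurs in every document of this toy collection, its idf is 0 and it gets weight 0 everywhere, no matter how often it appears.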

+

Document scoring

+

We can then score a document $d$ by the cosine of its vector $v_d$ with the query vector $v_q$:

+

$$ +\text{score}(q, d) = cos(v_q, v_d) = \frac{v_q \cdot v_d}{|v_q| |v_d|} +$$

+

Alternatively, you can think of the cosine as the dot product of the document and query unit vectors, i.e.:

+

$$ +\text{score}(q, d) = cos(v_q, v_d) = \frac{v_q}{|v_q|} \cdot \frac{v_d}{|v_d|} +$$

+

Then, plugging in the tf-idf scores:

+

$$ +\text{score}(q, d) = \sum_{t \in q} \frac{\text{tf-idf}_{t, q}}{\sqrt{\sum_{q_i \in q} \text{tf-idf}^2(q_i, q)}} \times \frac{\text{tf-idf}_{t, d}}{\sqrt{\sum_{d_i \in d} \text{tf-idf}^2(d_i, d)}} +$$
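As a rough sketch of this scoring step, the cosine can be computed directly over sparse vectors stored as `{term: weight}` dicts. The function names and the weights below are made up for illustration; in practice the weights would come from the tf-idf computation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted by descending cosine score."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical tf-idf weights, for illustration only
query = {"sweet": 0.1, "love": 0.5}
docs = [
    {"sweet": 0.1, "love": 0.5},   # same direction as the query
    {"sorrow": 0.7},               # shares no terms with the query
    {"love": 0.2, "nurse": 0.2},
]
ranking = rank(query, docs)
```

Only terms shared by the query and the document contribute to the dot product, which is why the sum in the formula runs over $t \in q$.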

+

Many variations exist, particularly ones that drop terms in order to reduce computation required. A notable variant is BM25, which introduces parameters $k$ to adjust balance between $tf$ and $idf$, and $b$ which controls the importance of document length normalization.

+

$$ +\text{score}(q, d) = \sum_{t \in q} \log \left( \frac{N}{\text{df}_t} \right) \cdot \frac{tf_{t, d}}{k(1 - b + b \cdot \frac{|d|}{|d_{avg}|}) + tf_{t, d}} +$$

+

Where $|d_{avg}|$ is the average document length in the collection. When $k = 0$, BM25 reverts to no use of term frequency, just like a binary selection of terms in the query (plus idf). A large $k$ results in raw term frequency (plus idf). $b$ ranges from $1$ (scaling by document length) to $0$ (no scaling). Reasonable defaults for these parameters are $k \in [1.2, 2.0]$ and $b = 0.75$.
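A minimal sketch of the BM25 formula, under a few assumptions that the notes do not pin down: documents are tokenized lists, $tf_{t, d}$ is the raw count of $t$ in $d$, document length is the token count, and the logarithm is the natural log. The function name `bm25_score` is hypothetical.

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k=1.2, b=0.75):
    """Score one tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    counts = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)  # documents containing t
        tf = counts[t]                       # raw count (assumption)
        if df == 0 or tf == 0:
            continue  # term contributes nothing to this document
        idf = math.log(n_docs / df)          # natural log (assumption)
        length_norm = k * (1 - b + b * len(doc) / avg_len)
        score += idf * tf / (length_norm + tf)
    return score

# Hypothetical toy collection
docs = [["sweet", "love", "love"], ["sweet", "sorrow"], ["nurse"]]
query = ["love"]
binary = bm25_score(query, docs[0], docs, k=0)   # idf only
default = bm25_score(query, docs[0], docs)       # tf saturates below idf
```

With `k=0` the fraction collapses to 1 for every matching term, so the score is just the sum of idf values, matching the "binary selection" case described above. Recomputing `df` per query term is wasteful; a real system would read it from the index.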

+

Quick aside: stop words

+

Stop words are common words that would traditionally be removed from the text before indexing, since they don't add much information. However, tf-idf already does a good job of downweighting common words, so removing stop words is less important in modern systems, and they are often kept in the index to make phrase search easier.

+

Inverted Index

+

Using an inverted index, we want to be able to find all documents $d \in C$ that contain a term $q \in Q$. The index is composed of two parts: a dictionary and the postings lists. The dictionary is a collection of terms (designed to be efficiently accessed), each of which maps to the postings list for that term. A postings list is the list of document IDs associated with a term, which can also contain additional metadata (e.g. term frequency, positions, etc.).

+

This gives us an efficient access pattern for computing tf-idf scores for documents, since we can look up the postings list for each term in the query. However, alternatives, especially for question answering, exist (e.g. Chen et al. 2017).
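The dictionary-plus-postings layout can be sketched with plain Python dicts. This is an in-memory toy, assuming tokenized documents and using list positions as document IDs; `build_inverted_index` and `candidates` are hypothetical names.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency) pairs."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    # Keep postings sorted by doc_id so lists can be merged in order
    return {term: sorted(postings.items()) for term, postings in index.items()}

def candidates(index, query):
    """Return the sorted IDs of all documents containing at least one query term."""
    return sorted({doc_id for t in query for doc_id, _ in index.get(t, [])})

# Hypothetical toy collection
docs = [["sweet", "love", "love"], ["sweet", "sorrow"], ["nurse"]]
index = build_inverted_index(docs)
hits = candidates(index, ["love", "nurse"])
```

Only the documents in `hits` need to be scored, and the stored term frequencies mean scoring never has to re-read the documents themselves.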

+

Evaluation

+

Use precision, the fraction of returned docs that are relevant, and recall, the fraction of all relevant docs that are returned.

+

Assume that each document in our IR system is either relevant or not relevant to a query. Further, let $U$ be the set of all relevant documents, $T$ be the set of ranked documents returned, and $R$ be the set of relevant documents in $T$. Then, we can define precision and recall as:

+

$$ +\text{precision} = \frac{|R|}{|T|} \quad \text{recall} = \frac{|R|}{|U|} +$$
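A small sketch of these two definitions, with `returned` playing the role of $T$ and `relevant` the role of $U$. The function names and document IDs are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for a returned list T and the relevant set U."""
    relevant = set(relevant)
    hits = sum(1 for d in returned if d in relevant)  # |R|
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def precision_recall_at_each_rank(ranked, relevant):
    """Evaluate the top-k prefix of a ranking for k = 1..len(ranked)."""
    return [precision_recall(ranked[:k], relevant) for k in range(1, len(ranked) + 1)]

# Hypothetical ranking and relevance judgments
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d9"}
curve = precision_recall_at_each_rank(ranked, relevant)
```

Evaluating every prefix of the ranking like this yields the points of a precision-recall curve.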

+

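Note that recall never decreases as we return more documents, i.e. it isn't penalized by returning an irrelevant document. Precision, on the other hand, can decrease if we return irrelevant documents. It is useful to plot precision-recall curves, which show the tradeoff between precision and recall as we vary the number of documents returned. To smooth the curve, we interpolate: the interpolated precision at recall level $r$ is the maximum precision observed at any recall level $i \ge r$: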

+

$$ +\text{InterpolatedPrecision}(r) = \max_{i \ge r} \text{Precision}(i) +$$

+
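import numpy as np
+
+def interpolate_PR_curve(precision, recall):
+    """
+    compute interpolated precision at 11 fixed levels of recall (0.0 to 1.0 in steps of 0.1)
+    """
+    precision, recall = np.asarray(precision), np.asarray(recall)
+    recall_levels = np.linspace(0, 1, 11)
+    interpolated_precision = np.zeros_like(recall_levels)
+    for i, r in enumerate(recall_levels):
+        mask = recall >= r
+        # no measured point reaches recall r: leave the precision at 0
+        if mask.any():
+            interpolated_precision[i] = np.max(precision[mask])
+    return interpolated_precision, recall_levels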
+
+

Mean Average Precision (MAP)

+

Assume $R_r$ is the set of relevant documents at or above $r$ in the ranked list. Then, the average precision at $r$ is:

+

$$ +\text{AP} = \frac{1}{|R_r|} \sum_{d \in R_r} \text{Precision}_{r}(d) +$$

+

Where $\text{Precision}_{r}(d)$ is the precision measured at the rank $r$ where document $d$ was retrieved. For an ensemble of queries $Q$, we average the AP over all queries to get the MAP:

+

$$ +\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{AP}(q) +$$
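The AP and MAP definitions can be sketched as follows. Following the formula above, AP is normalized by the number of relevant documents that were actually retrieved ($|R_r|$); some definitions divide by the total number of relevant documents instead. The function names and the example runs are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant)
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """Average AP over (ranked_list, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical results for two queries
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["a", "b"], {"b"}),                 # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Irrelevant documents never add a term to the sum, but they lower the precision recorded at every later relevant hit.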

+

IR with Dense Vectors

+

tf-idf and BM25 both suffer from the vocabulary mismatch problem: they only match documents that share exact terms with the query. To handle synonyms, we instead use dense vectors (as opposed to sparse ones like word counts). This is implemented today via encoders like BERT.

+

The general approach is to present both the query and the document to a single encoder, allowing the transformer self-attention to see all tokens of both the query and the document, thus also building a representation that is sensitive to the meanings in both. Then, a linear layer can be put on top of the [CLS] token to predict the similarity score for the query and document.

+

$$ +z = BERT(q;[SEP];d)[CLS] +$$

+

$$ +\text{score}(q, d) = \text{softmax}(U(z)) +$$

+

Note: BERT was trained using [CLS] sen A [SEP] sen B [SEP]. [SEP] is used to help the model distinguish between the two sentences. [CLS] is used to represent the entire input sequence.

+
+ +
+ + \ No newline at end of file diff --git a/site/tags/acyclic graphs.html b/site/tags/acyclic graphs.html index 2c1168b..c17e3ed 100644 --- a/site/tags/acyclic graphs.html +++ b/site/tags/acyclic graphs.html @@ -179,7 +179,7 @@

Tag: acyclic graphs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/algorithm-analysis.html b/site/tags/algorithm-analysis.html index d813efc..a784ff4 100644 --- a/site/tags/algorithm-analysis.html +++ b/site/tags/algorithm-analysis.html @@ -179,7 +179,7 @@

Tag: algorithm-analysis

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/algorithm.html b/site/tags/algorithm.html index 8f3b21a..f0d26d7 100644 --- a/site/tags/algorithm.html +++ b/site/tags/algorithm.html @@ -179,7 +179,7 @@

Tag: algorithm

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/algorithms.html b/site/tags/algorithms.html index c7b2b0f..8280220 100644 --- a/site/tags/algorithms.html +++ b/site/tags/algorithms.html @@ -179,7 +179,7 @@

Tag: algorithms

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/approximation.html b/site/tags/approximation.html index e9427d4..e602350 100644 --- a/site/tags/approximation.html +++ b/site/tags/approximation.html @@ -179,7 +179,7 @@

Tag: approximation

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/asymptotic notation.html b/site/tags/asymptotic notation.html index 5e9b88f..971e5c5 100644 --- a/site/tags/asymptotic notation.html +++ b/site/tags/asymptotic notation.html @@ -179,7 +179,7 @@

Tag: asymptotic notation

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/batch processing.html b/site/tags/batch processing.html index f1ace26..efa5ca7 100644 --- a/site/tags/batch processing.html +++ b/site/tags/batch processing.html @@ -179,7 +179,7 @@

Tag: batch processing

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/bipartite graphs.html b/site/tags/bipartite graphs.html index 1d1349d..3814803 100644 --- a/site/tags/bipartite graphs.html +++ b/site/tags/bipartite graphs.html @@ -179,7 +179,7 @@

Tag: bipartite graphs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/bipartite matching.html b/site/tags/bipartite matching.html index b032f7e..89cea9d 100644 --- a/site/tags/bipartite matching.html +++ b/site/tags/bipartite matching.html @@ -179,7 +179,7 @@

Tag: bipartite matching

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/breadth-first search.html b/site/tags/breadth-first search.html index 5958418..6fb7008 100644 --- a/site/tags/breadth-first search.html +++ b/site/tags/breadth-first search.html @@ -179,7 +179,7 @@

Tag: breadth-first search

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/column-oriented storage.html b/site/tags/column-oriented storage.html index 6bd6b3c..501c107 100644 --- a/site/tags/column-oriented storage.html +++ b/site/tags/column-oriented storage.html @@ -179,7 +179,7 @@

Tag: column-oriented storage

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/compatibility.html b/site/tags/compatibility.html index 2e38e5e..29967b8 100644 --- a/site/tags/compatibility.html +++ b/site/tags/compatibility.html @@ -179,7 +179,7 @@

Tag: compatibility

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/complexity analysis.html b/site/tags/complexity analysis.html index 5cc5fa1..254717d 100644 --- a/site/tags/complexity analysis.html +++ b/site/tags/complexity analysis.html @@ -179,7 +179,7 @@

Tag: complexity analysis

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/complexity-analysis.html b/site/tags/complexity-analysis.html index 749f946..13eb988 100644 --- a/site/tags/complexity-analysis.html +++ b/site/tags/complexity-analysis.html @@ -179,7 +179,7 @@

Tag: complexity-analysis

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/connected components.html b/site/tags/connected components.html index 39fbd7d..4162493 100644 --- a/site/tags/connected components.html +++ b/site/tags/connected components.html @@ -179,7 +179,7 @@

Tag: connected components

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/connected graphs.html b/site/tags/connected graphs.html index 4bd1afc..fe6331e 100644 --- a/site/tags/connected graphs.html +++ b/site/tags/connected graphs.html @@ -179,7 +179,7 @@

Tag: connected graphs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data analysis.html b/site/tags/data analysis.html index bb5690c..4fb2f9e 100644 --- a/site/tags/data analysis.html +++ b/site/tags/data analysis.html @@ -179,7 +179,7 @@

Tag: data analysis

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data modeling.html b/site/tags/data modeling.html index 4102ce6..32bf049 100644 --- a/site/tags/data modeling.html +++ b/site/tags/data modeling.html @@ -179,7 +179,7 @@

Tag: data modeling

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data replication.html b/site/tags/data replication.html index eb5870f..fdaa895 100644 --- a/site/tags/data replication.html +++ b/site/tags/data replication.html @@ -179,7 +179,7 @@

Tag: data replication

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data serialization.html b/site/tags/data serialization.html index 5120102..eb8f93e 100644 --- a/site/tags/data serialization.html +++ b/site/tags/data serialization.html @@ -179,7 +179,7 @@

Tag: data serialization

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data structures.html b/site/tags/data structures.html index bb5df5c..a13c0f3 100644 --- a/site/tags/data structures.html +++ b/site/tags/data structures.html @@ -179,7 +179,7 @@

Tag: data structures

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/data systems.html b/site/tags/data systems.html index 5e62047..b07fcaf 100644 --- a/site/tags/data systems.html +++ b/site/tags/data systems.html @@ -179,7 +179,7 @@

Tag: data systems

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/depth first search.html b/site/tags/depth first search.html index f1cb8bb..df8bd9c 100644 --- a/site/tags/depth first search.html +++ b/site/tags/depth first search.html @@ -179,7 +179,7 @@

Tag: depth first search

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/depth-first search.html b/site/tags/depth-first search.html index 97f736d..012866c 100644 --- a/site/tags/depth-first search.html +++ b/site/tags/depth-first search.html @@ -179,7 +179,7 @@

Tag: depth-first search

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/distributed filesystems.html b/site/tags/distributed filesystems.html index 3644634..0dda1fd 100644 --- a/site/tags/distributed filesystems.html +++ b/site/tags/distributed filesystems.html @@ -179,7 +179,7 @@

Tag: distributed filesystems

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/document databases.html b/site/tags/document databases.html index 618ecc0..dc307b9 100644 --- a/site/tags/document databases.html +++ b/site/tags/document databases.html @@ -179,7 +179,7 @@

Tag: document databases

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/dynamic-programming.html b/site/tags/dynamic-programming.html index 6a70a48..521f8f7 100644 --- a/site/tags/dynamic-programming.html +++ b/site/tags/dynamic-programming.html @@ -179,7 +179,7 @@

Tag: dynamic-programming

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/efficiency.html b/site/tags/efficiency.html index 68ff4e0..c91f40b 100644 --- a/site/tags/efficiency.html +++ b/site/tags/efficiency.html @@ -179,7 +179,7 @@

Tag: efficiency

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/encoding formats.html b/site/tags/encoding formats.html index 6be9b7b..ee5fd36 100644 --- a/site/tags/encoding formats.html +++ b/site/tags/encoding formats.html @@ -179,7 +179,7 @@

Tag: encoding formats

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/etl.html b/site/tags/etl.html index 7503968..ab9a663 100644 --- a/site/tags/etl.html +++ b/site/tags/etl.html @@ -179,7 +179,7 @@

Tag: etl

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/failover.html b/site/tags/failover.html index 4e81a9c..9c92759 100644 --- a/site/tags/failover.html +++ b/site/tags/failover.html @@ -179,7 +179,7 @@

Tag: failover

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/ford-fulkerson algorithm.html b/site/tags/ford-fulkerson algorithm.html index e70e4f1..adff8b3 100644 --- a/site/tags/ford-fulkerson algorithm.html +++ b/site/tags/ford-fulkerson algorithm.html @@ -179,7 +179,7 @@

Tag: ford-fulkerson algorithm

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/gale-shapley.html b/site/tags/gale-shapley.html index fd4d556..136a60f 100644 --- a/site/tags/gale-shapley.html +++ b/site/tags/gale-shapley.html @@ -179,7 +179,7 @@

Tag: gale-shapley

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph coloring.html b/site/tags/graph coloring.html index b7d15cb..35e41e7 100644 --- a/site/tags/graph coloring.html +++ b/site/tags/graph coloring.html @@ -179,7 +179,7 @@

Tag: graph coloring

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph databases.html b/site/tags/graph databases.html index fdb0548..3bba10b 100644 --- a/site/tags/graph databases.html +++ b/site/tags/graph databases.html @@ -179,7 +179,7 @@

Tag: graph databases

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph fundamentals.html b/site/tags/graph fundamentals.html index d7047f1..f74ece7 100644 --- a/site/tags/graph fundamentals.html +++ b/site/tags/graph fundamentals.html @@ -179,7 +179,7 @@

Tag: graph fundamentals

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph properties.html b/site/tags/graph properties.html index a4ba625..71b85ea 100644 --- a/site/tags/graph properties.html +++ b/site/tags/graph properties.html @@ -179,7 +179,7 @@

Tag: graph properties

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph representation.html b/site/tags/graph representation.html index e178156..0ac5a23 100644 --- a/site/tags/graph representation.html +++ b/site/tags/graph representation.html @@ -179,7 +179,7 @@

Tag: graph representation

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph theory.html b/site/tags/graph theory.html index fad28f4..28b8059 100644 --- a/site/tags/graph theory.html +++ b/site/tags/graph theory.html @@ -179,7 +179,7 @@

Tag: graph theory

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph traversal.html b/site/tags/graph traversal.html index a52be2f..3928efa 100644 --- a/site/tags/graph traversal.html +++ b/site/tags/graph traversal.html @@ -179,7 +179,7 @@

Tag: graph traversal

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph-theory.html b/site/tags/graph-theory.html index 2d63438..94a3627 100644 --- a/site/tags/graph-theory.html +++ b/site/tags/graph-theory.html @@ -179,7 +179,7 @@

Tag: graph-theory

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph-traversal.html b/site/tags/graph-traversal.html index 42b643e..d68e5dc 100644 --- a/site/tags/graph-traversal.html +++ b/site/tags/graph-traversal.html @@ -179,7 +179,7 @@

Tag: graph-traversal

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/graph.html b/site/tags/graph.html index fcdf453..028a2cd 100644 --- a/site/tags/graph.html +++ b/site/tags/graph.html @@ -179,7 +179,7 @@

Tag: graph

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/greedy-algorithms.html b/site/tags/greedy-algorithms.html index bc3296e..759e947 100644 --- a/site/tags/greedy-algorithms.html +++ b/site/tags/greedy-algorithms.html @@ -179,7 +179,7 @@

Tag: greedy-algorithms

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/independent set.html b/site/tags/independent set.html index 23d34d3..8a1b5c1 100644 --- a/site/tags/independent set.html +++ b/site/tags/independent set.html @@ -179,7 +179,7 @@

Tag: independent set

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/index.html b/site/tags/index.html index 5386899..b62054a 100644 --- a/site/tags/index.html +++ b/site/tags/index.html @@ -178,7 +178,7 @@

Tags

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/indexing.html b/site/tags/indexing.html index c8e6d8b..69e76a7 100644 --- a/site/tags/indexing.html +++ b/site/tags/indexing.html @@ -179,7 +179,7 @@

Tag: indexing

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/induction proofs.html b/site/tags/induction proofs.html index cc41889..64e8b22 100644 --- a/site/tags/induction proofs.html +++ b/site/tags/induction proofs.html @@ -179,7 +179,7 @@

Tag: induction proofs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/induction.html b/site/tags/induction.html index b8e3d01..2805430 100644 --- a/site/tags/induction.html +++ b/site/tags/induction.html @@ -179,7 +179,7 @@

Tag: induction

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/interval.html b/site/tags/interval.html index 880a621..b30d148 100644 --- a/site/tags/interval.html +++ b/site/tags/interval.html @@ -179,7 +179,7 @@

Tag: interval

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/leader-follower model.html b/site/tags/leader-follower model.html index 52f5a92..64d980a 100644 --- a/site/tags/leader-follower model.html +++ b/site/tags/leader-follower model.html @@ -179,7 +179,7 @@

Tag: leader-follower model

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/linear programs.html b/site/tags/linear programs.html index 1c134f2..760fd54 100644 --- a/site/tags/linear programs.html +++ b/site/tags/linear programs.html @@ -179,7 +179,7 @@

Tag: linear programs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/linear systems.html b/site/tags/linear systems.html index 9d0c539..3f40a11 100644 --- a/site/tags/linear systems.html +++ b/site/tags/linear systems.html @@ -179,7 +179,7 @@

Tag: linear systems

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/maintainability.html b/site/tags/maintainability.html index 497762d..487949d 100644 --- a/site/tags/maintainability.html +++ b/site/tags/maintainability.html @@ -179,7 +179,7 @@

Tag: maintainability

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/mapreduce.html b/site/tags/mapreduce.html index c859336..b5361d1 100644 --- a/site/tags/mapreduce.html +++ b/site/tags/mapreduce.html @@ -179,7 +179,7 @@

Tag: mapreduce

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/matching.html b/site/tags/matching.html index 486055d..78d24de 100644 --- a/site/tags/matching.html +++ b/site/tags/matching.html @@ -179,7 +179,7 @@

Tag: matching

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/max flow min cut.html b/site/tags/max flow min cut.html index 7b53d9a..955869c 100644 --- a/site/tags/max flow min cut.html +++ b/site/tags/max flow min cut.html @@ -179,7 +179,7 @@

Tag: max flow min cut

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/message passing.html b/site/tags/message passing.html index 522335a..886ca87 100644 --- a/site/tags/message passing.html +++ b/site/tags/message passing.html @@ -179,7 +179,7 @@

Tag: message passing

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/odd cycles.html b/site/tags/odd cycles.html index dc9df4d..b1d58b2 100644 --- a/site/tags/odd cycles.html +++ b/site/tags/odd cycles.html @@ -179,7 +179,7 @@

Tag: odd cycles

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/oltp vs olap.html b/site/tags/oltp vs olap.html index ca78dda..a7642cb 100644 --- a/site/tags/oltp vs olap.html +++ b/site/tags/oltp vs olap.html @@ -179,7 +179,7 @@

Tag: oltp vs olap

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/optimization.html b/site/tags/optimization.html index dab3efb..8bdacc7 100644 --- a/site/tags/optimization.html +++ b/site/tags/optimization.html @@ -179,7 +179,7 @@

Tag: optimization

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/partitioning.html b/site/tags/partitioning.html index 9227055..c945769 100644 --- a/site/tags/partitioning.html +++ b/site/tags/partitioning.html @@ -179,7 +179,7 @@

Tag: partitioning

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/performance.html b/site/tags/performance.html index 2f66833..09e7337 100644 --- a/site/tags/performance.html +++ b/site/tags/performance.html @@ -179,7 +179,7 @@

Tag: performance

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/pigeonhole principle.html b/site/tags/pigeonhole principle.html index abc6b7d..aaf4b05 100644 --- a/site/tags/pigeonhole principle.html +++ b/site/tags/pigeonhole principle.html @@ -179,7 +179,7 @@

Tag: pigeonhole principle

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/problem-solving.html b/site/tags/problem-solving.html index 1f147c4..2a3152f 100644 --- a/site/tags/problem-solving.html +++ b/site/tags/problem-solving.html @@ -179,7 +179,7 @@

Tag: problem-solving

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/proof techniques.html b/site/tags/proof techniques.html index 2d6382b..a1d1f54 100644 --- a/site/tags/proof techniques.html +++ b/site/tags/proof techniques.html @@ -179,7 +179,7 @@

Tag: proof techniques

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/query languages.html b/site/tags/query languages.html index 880696d..77a06ae 100644 --- a/site/tags/query languages.html +++ b/site/tags/query languages.html @@ -179,7 +179,7 @@

Tag: query languages

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/relational databases.html b/site/tags/relational databases.html index f8cd967..bce4981 100644 --- a/site/tags/relational databases.html +++ b/site/tags/relational databases.html @@ -179,7 +179,7 @@

Tag: relational databases

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/reliability.html b/site/tags/reliability.html index 7599931..7757654 100644 --- a/site/tags/reliability.html +++ b/site/tags/reliability.html @@ -179,7 +179,7 @@

Tag: reliability

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/replication logs.html b/site/tags/replication logs.html index 94ae07d..4bee5a8 100644 --- a/site/tags/replication logs.html +++ b/site/tags/replication logs.html @@ -179,7 +179,7 @@

Tag: replication logs

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/scalability.html b/site/tags/scalability.html index ade12cc..41decc7 100644 --- a/site/tags/scalability.html +++ b/site/tags/scalability.html @@ -179,7 +179,7 @@

Tag: scalability

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/scheduling.html b/site/tags/scheduling.html index 8dcb970..033ec2e 100644 --- a/site/tags/scheduling.html +++ b/site/tags/scheduling.html @@ -179,7 +179,7 @@

Tag: scheduling

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/schema evolution.html b/site/tags/schema evolution.html index 44b9a96..6d19580 100644 --- a/site/tags/schema evolution.html +++ b/site/tags/schema evolution.html @@ -179,7 +179,7 @@

Tag: schema evolution

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/set cover.html b/site/tags/set cover.html index 15245fe..a9da9f4 100644 --- a/site/tags/set cover.html +++ b/site/tags/set cover.html @@ -179,7 +179,7 @@

Tag: set cover

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/shortest-paths.html b/site/tags/shortest-paths.html index 27dbf5b..d5f8ef5 100644 --- a/site/tags/shortest-paths.html +++ b/site/tags/shortest-paths.html @@ -179,7 +179,7 @@

Tag: shortest-paths

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/spanning trees.html b/site/tags/spanning trees.html index 1fbc263..1da22a9 100644 --- a/site/tags/spanning trees.html +++ b/site/tags/spanning trees.html @@ -179,7 +179,7 @@

Tag: spanning trees

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/stable matching.html b/site/tags/stable matching.html index 804e2ae..ffc8a8e 100644 --- a/site/tags/stable matching.html +++ b/site/tags/stable matching.html @@ -179,7 +179,7 @@

Tag: stable matching

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/synchronous vs asynchronous.html b/site/tags/synchronous vs asynchronous.html index c9b4bbc..f7e5ddf 100644 --- a/site/tags/synchronous vs asynchronous.html +++ b/site/tags/synchronous vs asynchronous.html @@ -179,7 +179,7 @@

Tag: synchronous vs asynchronous

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/time complexity.html b/site/tags/time complexity.html index f3a3dea..b1d1b67 100644 --- a/site/tags/time complexity.html +++ b/site/tags/time complexity.html @@ -179,7 +179,7 @@

Tag: time complexity

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/trees.html b/site/tags/trees.html index 99b675b..f0e18d6 100644 --- a/site/tags/trees.html +++ b/site/tags/trees.html @@ -179,7 +179,7 @@

Tag: trees

- Last modified: 2025-01-07 + Last modified: 2025-01-08
diff --git a/site/tags/vertex cover.html b/site/tags/vertex cover.html index 515283c..2e73e37 100644 --- a/site/tags/vertex cover.html +++ b/site/tags/vertex cover.html @@ -179,7 +179,7 @@

Tag: vertex cover

- Last modified: 2025-01-07 + Last modified: 2025-01-08