
Commit 007a74c

Merge pull request #583 from rstudio/mall-qmd
Adds {mall} html
2 parents 4c295a6 + eb5c2a7 commit 007a74c

File tree

5 files changed: +605 −5 lines


_freeze/html/nlp-with-llms/execute-results/html.json

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 {
-  "hash": "0ccad112eeb7958f91f8f5f40ea4887d",
+  "hash": "308a74391211360772dd1084c9c454e0",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Natural Language Processing using LLMs in R & Python :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n<img src=\"images/logo-mall.png\" height=\"138\" alt=\"Hex logo for mall - drawing of the inside of a mall. There is a fountain in the middle.\"> <br><br><a href=\"../nlp-with-llms.pdf\">\n\n\n::: {.cell .column-margin}\n<a href=\"../nlp-with-llms.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/nlp-with-llms.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br>\n:::\n\n\n\nWill be updated soon!\n",
+    "markdown": "---\ntitle: \"Natural Language Processing using LLMs in R & Python :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n\n::: {.cell .column-margin}\n<img src=\"images/logo-mall.png\" height=\"138\" alt=\"Hex logo for mall - drawing of the inside of a mall. There is a fountain in the middle.\" />\n<br><br><a href=\"../nlp-with-llms.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/nlp-with-llms.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br>\n:::\n\n\n::: {style=\"text-align: center;\"}\n*Click on* **R** *or the* **Python** *tab to see the information in your preferred language*\n:::\n\n## Intro\n\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\nUse LLMs to perform NLP row-wise over a data frame. `mall` comes with pre-defined\nprompts that perform specific NLP operations, and then places the results in a\nnew column. Use *OpenAI*, *Ollama*, *Anthropic* and many others thanks to its\nintegration with `ellmer`.\n\n`mall`’s data frame functions are designed with ‘tidy’ principles in mind, so they\nwork with the Tidyverse packages. `mall` also includes functions that work with\nstring vectors.\n\n\n![](images/mall-ellmer.png){width=\"500\" fig-align=\"center\" fig-alt=\"A diagram showing how mall adds a prompt and passes it to the LLM via ellmer\"}\n\n\n\n\n### Python\n\nUse LLMs to perform NLP row-wise over a data frame. `mall` comes with pre-defined\nprompts that perform specific NLP operations, and then places the results in a\nnew column. Use *OpenAI*, *Ollama*, *Anthropic* and many others thanks to its\nintegration with `chatlas`.\n\n`mall` works as an extension for **Polars** data frames. It also works with\nstring vectors.\n\n![](images/mall-chatlas.png){width=\"500\" fig-align=\"center\" fig-alt=\"A diagram showing how mall adds a prompt and passes it to the LLM via chatlas\"}\n\n:::\n\n## Getting started\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n1. Load the libraries\n \n ```r\n library(mall)\n library(ellmer)\n ```\n \n1. Create a vendor specific chat connection\n\n ```r\n chat <- chat_openai()\n ```\n\n1. Pass the chat object to mall\n\n ```r\n llm_use(chat)\n ```\n \n- *For dataframes:*\n\n ```r\n data(\"reviews\") # Sample product reviews\n \n reviews |>\n llm_sentiment(review)\n ```\n\n- *For vectors:*\n \n ```r\n llm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n ```\n\n<br/>\n\n::: {style=\"font-size: 130%;\"}\n**Connect automatically**\n:::\n\nAs a convenience, mall is able to automatically establish a connection with the\nLLM. To do this you can use the `.mall_chat` option:\n`options(.mall_chat=ellmer::chat_openai(model=\"gpt-4o\"))`\nAdd this line to your *.Rprofile* file in order for that code to run every\ntime you start R. You can call `usethis::edit_r_profile()` to edit it.\n\n### Python\n\nStart by creating a new LLM connection\n\n```python\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI()\n```\n\n- *For Dataframes*\n\n 1. Load the library\n \n ```python\n import mall\n ```\n \n 1. Read or load your data frame\n \n ```python\n reviews = mall.MallData.reviews # Sample product reviews\n reviews\n ```\n 1. Pass the chat object to mall\n \n ```python\n reviews.llm.use(chat)\n ```\n \n 1. Access NLP functions via `.llm`\n \n ```python\n reviews.llm.sentiment('review')\n ```\n \n- *For String vectors*\n\n 1. Load the LLMVec class\n \n ```python\n from mall import LLMVec\n ```\n 1. Create a new LLMVec object\n \n ```python\n llm = LLMVec(chat)\n ```\n \n 1. Pass a vector to a function in the new object\n \n ```python\n llm.sentiment(['I am happy', 'I am sad'])\n ```\n \n:::\n\n## NLP functions\n\n\n### Sentiment analysis\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_sentiment(.data, col, options = c(\"positive\", \"negative\", \"neutral\"), pred_name = \".sentiment\", additional_prompt = \"\")`\n\n ```r\n llm_sentiment(reviews, review)\n ```\n\n- `llm_vec_sentiment(x, options = c(\"positive\", \"negative\", \"neutral\"), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n ```\n\n**Special arguments:**\n\n`options`: Customize the sentiments to check for:\n`options = c(\"positive\", \"negative\")`. Use a tilde formula to mask the results, for\nexample `c(\"positive\" ~ 1, \"negative\" ~ 0)` returns 1 for positive and 0 for\nnegative.\n\n### Python\n\n- *\\<Dataframe\\>*`.llm.sentiment(col, options = ['positive', 'negative', 'neutral'], additional='', pred_name ='sentiment')`\n\n ```python\n reviews.llm.sentiment('review')\n ```\n\n- *\\<LLMVec object\\>*`.sentiment(x, options=['positive', 'negative', 'neutral'], additional='')`\n\n ```python\n llm.sentiment(['I am happy', 'I am sad'])\n ```\n\n**Special arguments:**\n\n`options`: Customize the sentiments to check for:\n`options = [\"positive\", \"negative\"]`. Use a dict to mask the results,\nfor example `{\"positive\": 1, \"negative\": 0}` returns 1 for positive and 0 for\nnegative.\n\n\n:::\n\n\n### Extract\n\nExtract a specific entity, or entities, from the provided text\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_extract(.data, col, labels, expand_cols = FALSE, additional_prompt = \"\", pred_name = \".extract\")`\n\n ```r\n llm_extract(reviews, review, labels = \"product\")\n ```\n \n- `llm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\"))\n ```\n \n \n**Special arguments**\n\n`labels`: A vector to specify the entities to identify. `expand_cols`: If there are\nmultiple labels, this indicates whether each label gets its own\ncolumn (data frames only)\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.extract(col, labels='', expand_cols = False, additional = '', pred_name = 'extract')`\n\n ```python\n reviews.llm.extract(\"review\", labels = \"product\")\n ```\n\n- *\\<LLMVec object\\>*`.extract(x, labels='', additional='')`\n\n ```python\n llm.extract(['bob smith, 123 3rd street'], labels=['name', 'address'])\n ```\n\n**Special arguments**\n\n\n`labels`: A vector to specify the entities to identify. `expand_cols`: If there are\nmultiple labels, this indicates whether each label gets its own\ncolumn (data frames only)\n\n:::\n\n\n### Summarize\n\nSummarize text into a specified number of words\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_summarize(.data, col, max_words = 10, pred_name = \".summary\", additional_prompt = \"\")`\n\n ```r\n llm_summarize(reviews, review, max_words = 5)\n ```\n \n- `llm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_summarize(\"This has been the best TV I've\n ever used. Great screen, and sound.\", max_words = 5)\n ```\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.summarize(x, max_words=10, additional='')`\n\n ```python\n reviews.llm.summarize(\"review\", 5)\n ```\n \n- *\\<LLMVec object\\>*`.summarize(x, max_words=10, additional='')`\n\n ```python\n llm.summarize([\"This has been the best TV I've ever used. Great screen, and sound.\"], max_words = 5)\n ```\n:::\n\n\n\n### Verify\n\nCheck if a statement is true or not based on the provided text\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_verify(.data, col, what, yes_no = factor(c(1, 0)), pred_name = \".verify\", additional_prompt = \"\")`\n\n ```r\n llm_verify(reviews, review, \"is the customer happy\")\n ```\n\n- `llm_vec_verify(x, what, yes_no = factor(c(1, 0)), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_verify(c(\"I am happy\", \"I am sad\"), \"is the person happy\")\n ```\n\n**Special arguments**\n\n`yes_no`: Customize what it returns for true/false with a vector\n`yes_no = c(\"y\", \"n\")`.\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')`\n \n ```python\n reviews.llm.verify(\"review\", \"is the customer happy\")\n ```\n \n- *\\<LLMVec object\\>*`.verify(x, what='', yes_no=[1, 0], additional='')`\n\n ```python\n llm.verify(['I am happy', 'I am sad'], what = 'Is the person happy?')\n ```\n\n**Special arguments**\n\n`yes_no`: Customize what it returns for true/false with a vector\n`yes_no = [\"y\", \"n\"]`.\n\n:::\n\n\n### Classify\n\nClassify the provided text as one of the options provided via the `labels` argument\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_classify(.data, col, labels, pred_name = \".classify\", additional_prompt = \"\")`\n\n ```r\n llm_classify(reviews, review, c(\"appliance\", \"computer\"))\n ```\n\n- `llm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_classify(c(\"this is important!\", \"just whenever\"), c(\"urgent\", \"not urgent\"))\n ```\n\n**Special arguments**\n\n`labels`: A character vector with at least 2 labels to classify the text as\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.classify(col, labels='', additional='', pred_name='classify')`\n \n ```python\n reviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n ```\n \n- *\\<LLMVec object\\>*`.classify(x, labels='', additional='')`\n\n ```python\n llm.classify(['this is important!', 'there is no rush'], ['urgent', 'not urgent'])\n ```\n \n**Special arguments**\n\n`labels`: A character vector with at least 2 labels to classify the text as\n\n:::\n\n### Translate\n\nTranslate into the target language\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_translate(.data, col, language, pred_name = \".translation\", additional_prompt = \"\")`\n\n ```r\n llm_translate(reviews, review, \"spanish\")\n ```\n\n- `llm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_translate(\"grass is green\", \"spanish\")\n ```\n\n**Special arguments**\n\n`language`: Target language. No origin language is passed since the LLM\ndetects it automatically.\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.translate(col, language='', additional='', pred_name='translation')`\n\n ```python\n reviews.llm.translate(\"review\", \"spanish\")\n ```\n\n- *\\<LLMVec object\\>*`.translate(x, language='', additional='')`\n\n ```python\n llm.translate(['the grass is green'], language = 'spanish')\n ```\n\n**Special arguments**\n\n`language`: Target language. No origin language is passed since the LLM detects\nit automatically.\n\n:::\n\n\n### Custom\n\nCreate your own prompt\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\")`\n \n ```r\n my_prompt <- \"Answer a question. Return only the\n answer, no explanation. Acceptable answers are 'yes',\n 'no'. Answer this about the following text, is this a\n happy customer?:\"\n llm_custom(reviews, review, my_prompt)\n ```\n \n- `llm_vec_custom(x, prompt = \"\", valid_resps = NULL)`\n\n**Special arguments**\n\n`valid_resps`: A vector to specify the set of answers expected back. `mall`\nwill change those not in the set to `NA`\n\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\n ```python\n my_prompt = \"Answer a question. Return only the answer,\" \\\n \" no explanation. Acceptable answers are 'yes', 'no'.\" \\\n \" Answer this about the following text, is this a happy customer?:\"\n reviews.llm.custom(\"review\", prompt = my_prompt)\n ```\n\n- *\\<LLMVec object\\>*`.custom(x, prompt='', valid_resps='')`\n\n\n**Special arguments**\n\n`valid_resps`: A vector to specify the set of answers expected back.\n\n\n:::\n\n\n### Shared arguments\n\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n`additional_prompt`: Appends instructions to the LLM.\n\n```r\nllm_classify(reviews, review, c(\"appliance\", \"computers\"), additional_prompt = \"Consider TVs as appliances\")\n```\n\n`pred_name`: Name of the new column. Defaults are set based on the NLP\noperation. *(Data frames only)*\n\n```r\nllm_translate(reviews, review, \"spanish\", pred_name = \"in_spanish\")\n```\n\n`preview`: Returns what would be sent to the LLM instead *(Vectors only)*\n\n### Python\n\n\n`additional`: Appends more instructions to the LLM.\n\n```python\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], additional=\"Consider TVs as appliances\")\n```\n\n`pred_name`: Name of the new column. Defaults are set based on the NLP\noperation. *(Data frames only)*\n\n:::\n\n## Other features\n\n### Ollama direct\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n\nIf Ollama is the only LLM provider you are using, then a\nsimplified way to connect is available which does not\nrequire an `ellmer` `Chat` object. Simply pass \"ollama\"\nas the `backend`, and specify the model:\n\n```r\nllm_use(\"ollama\", model = \"llama3.2\")\n```\n\n### Python\n\n\nIf Ollama is the only LLM provider you are using, then a simplified way to\nconnect is available which does not require a `chatlas` `Chat` object. Simply\npass \"ollama\" as the `backend`, and specify the model:\n\n- Dataframe:\n\n ```python\n reviews.llm.use('ollama', 'llama3.2')\n ```\n \n- Vector:\n\n ```python\n llm = LLMVec('ollama', 'llama3.2')\n ```\n:::\n\n### Caching results\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\nBy default, mall saves the LLM results in a temp folder. To specify a folder\ncall:\n\n```r\nllm_use(chat, .cache = \"<my folder>\")\n```\n \nTo turn **off** use:\n\n```r\nllm_use(chat, .cache = \"\")\n```\n\n### Python\n\nBy default, mall saves the LLM results in a folder. To specify a folder call:\n\n- Dataframe:\n\n ```python\n reviews.llm.use(chat, _cache='<my folder>')\n ```\n- Vector:\n\n ```python\n llm = LLMVec(chat, _cache='<my folder>')\n ```\n\nTo turn off use:\n\n- Dataframe:\n\n ```python\n reviews.llm.use(chat, _cache='')\n ```\n\n- Vector:\n\n ```python\n llm = LLMVec(chat, _cache='')\n ```\n \n:::",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"

html/images/logo-mall.png

-82.8 KB

html/images/mall-chatlas.png

403 KB

html/images/mall-ellmer.png

545 KB

0 commit comments
