
Commit 007a74c

Merge pull request #583 from rstudio/mall-qmd
Adds {mall} html
2 parents 4c295a6 + eb5c2a7 commit 007a74c

File tree

5 files changed: +605 −5 lines


_freeze/html/nlp-with-llms/execute-results/html.json

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 {
-  "hash": "0ccad112eeb7958f91f8f5f40ea4887d",
+  "hash": "308a74391211360772dd1084c9c454e0",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Natural Language Processing using LLMs in R & Python :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n<img src=\"images/logo-mall.png\" height=\"138\" alt=\"Hex logo for mall - drawing of the inside of a mall. There is a fountain in the middle.\"> <br><br><a href=\"../nlp-with-llms.pdf\">\n\n\n::: {.cell .column-margin}\n<a href=\"../nlp-with-llms.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/nlp-with-llms.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br>\n:::\n\n\n\nWill be updated soon!\n",
+    "markdown": "---\ntitle: \"Natural Language Processing using LLMs in R & Python :: Cheatsheet\"\ndescription: \" \"\nimage-alt: \"\"\nexecute:\n eval: true\n output: false\n warning: false\n---\n\n\n::: {.cell .column-margin}\n<img src=\"images/logo-mall.png\" height=\"138\" alt=\"Hex logo for mall - drawing of the inside of a mall. There is a fountain in the middle.\" />\n<br><br><a href=\"../nlp-with-llms.pdf\">\n<p><i class=\"bi bi-file-pdf\"></i> Download PDF</p>\n<img src=\"../pngs/nlp-with-llms.png\" width=\"200\" alt=\"\"/>\n</a>\n<br><br>\n:::\n\n\n::: {style=\"text-align: center;\"}\n*Click on* **R** *or the* **Python** *tab to see the information in your preferred language*\n:::\n\n## Intro\n\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\nUse LLMs to perform NLP row-wise over a data frame. `mall` comes with pre-defined\nprompts that perform specific NLP operations, and then places the results in a\nnew column. Use *OpenAI*, *Ollama*, *Anthropic* and many others thanks to its\nintegration with `ellmer`.\n\n`mall`’s data frame functions are designed with ‘tidy’ principles in mind, so they\nwork with the Tidyverse packages. `mall` also includes functions that work with\nstring vectors.\n\n\n![](images/mall-ellmer.png){width=\"500\" fig-align=\"center\" fig-alt=\"A diagram showing how mall adds a prompt and passes it to the LLM via ellmer\"}\n\n\n\n\n### Python\n\nUse LLMs to perform NLP row-wise over a data frame. `mall` comes with pre-defined\nprompts that perform specific NLP operations, and then places the results in a\nnew column. Use *OpenAI*, *Ollama*, *Anthropic* and many others thanks to its\nintegration with `chatlas`.\n\n`mall` works as an extension for **Polars** data frames. It also works with\nstring vectors.\n\n![](images/mall-chatlas.png){width=\"500\" fig-align=\"center\" fig-alt=\"A diagram showing how mall adds a prompt and passes it to the LLM via chatlas\"}\n\n:::\n\n## Getting started\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n1. Load the libraries\n \n ```r\n library(mall)\n library(ellmer)\n ```\n \n1. Create a vendor specific chat connection\n\n ```r\n chat <- chat_openai()\n ```\n\n1. Pass the chat object to mall\n\n ```r\n llm_use(chat)\n ```\n \n- *For dataframes:*\n\n ```r\n data(\"reviews\") # Sample product reviews\n \n reviews |>\n llm_sentiment(review)\n ```\n\n- *For vectors:*\n \n ```r\n llm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n ```\n\n<br/>\n\n::: {style=\"font-size: 130%;\"}\n**Connect automatically**\n:::\n\nAs a convenience, mall is able to automatically establish a connection with the\nLLM. To do this you can use the `.mall_chat` option:\n`options(.mall_chat=ellmer::chat_openai(model=\"gpt-4o\"))`\nAdd this line to your *.Rprofile* file in order for that code to run every\ntime you start R. You can call `usethis::edit_r_profile()` to edit it.\n\n### Python\n\nStart by creating a new LLM connection\n\n```python\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI()\n```\n\n- *For Dataframes*\n\n 1. Load the library\n \n ```python\n import mall\n ```\n \n 1. Read or load your data frame\n \n ```python\n reviews = mall.MallData.reviews # Sample product reviews\n reviews\n ```\n 1. Pass the chat object to mall\n \n ```python\n reviews.llm.use(chat)\n ```\n \n 1. Access NLP functions via `.llm`\n \n ```python\n reviews.llm.sentiment('review')\n ```\n \n- *For String vectors*\n\n 1. Load the LLMVec class\n \n ```python\n from mall import LLMVec\n ```\n 1. Create a new LLMVec object\n \n ```python\n llm = LLMVec(chat)\n ```\n \n 1. Pass a vector to a function in the new object\n \n ```python\n llm.sentiment(['I am happy', 'I am sad'])\n ```\n \n:::\n\n## NLP functions\n\n\n### Sentiment analysis\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_sentiment(.data, col, options = c(\"positive\", \"negative\", \"neutral\"), pred_name = \".sentiment\", additional_prompt = \"\")`\n\n ```r\n llm_sentiment(reviews, review)\n ```\n\n- `llm_vec_sentiment(x, options = c(\"positive\", \"negative\", \"neutral\"), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n ```\n\n**Special arguments:**\n\n`options`: Customize the sentiments to check for:\n`options = c(\"positive\", \"negative\")`. Use a tilde formula to mask the results, for\nexample `c(\"positive\" ~ 1, \"negative\" ~ 0)` returns 1 for positive and 0 for\nnegative.\n\n### Python\n\n- *\\<Dataframe\\>*`.llm.sentiment(col, options = ['positive', 'negative', 'neutral'], additional='', pred_name ='sentiment')`\n\n ```python\n reviews.llm.sentiment('review')\n ```\n\n- *\\<LLMVec object\\>*`.sentiment(x, options=['positive', 'negative', 'neutral'], additional='')`\n\n ```python\n llm.sentiment(['I am happy', 'I am sad'])\n ```\n\n**Special arguments:**\n\n`options`: Customize the sentiments to check for:\n`options = [\"positive\", \"negative\"]`. Use a dict to mask the results,\nfor example `{\"positive\": 1, \"negative\": 0}` returns 1 for positive and 0 for\nnegative.\n\n\n:::\n\n\n### Extract\n\nExtract a specific entity, or entities, from the provided text\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_extract(.data, col, labels, expand_cols = FALSE, additional_prompt = \"\", pred_name = \".extract\")`\n\n ```r\n llm_extract(reviews, review, labels = \"product\")\n ```\n \n- `llm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\"))\n ```\n \n \n**Special arguments**\n\n`labels`: A vector to specify the entities to identify. `expand_cols`: If there are\nmultiple labels, this indicates whether each label gets its own\ncolumn (data frames only)\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.extract(col, labels='', expand_cols = False, additional = '', pred_name = 'extract')`\n\n ```python\n reviews.llm.extract(\"review\", labels = \"product\")\n ```\n\n- *\\<LLMVec object\\>*`.extract(x, labels='', additional='')`\n\n ```python\n llm.extract(['bob smith, 123 3rd street'], labels=['name', 'address'])\n ```\n\n**Special arguments**\n\n\n`labels`: A vector to specify the entities to identify. `expand_cols`: If there are\nmultiple labels, this indicates whether each label gets its own\ncolumn (data frames only)\n\n:::\n\n\n### Summarize\n\nSummarize text into a specified number of words\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_summarize(.data, col, max_words = 10, pred_name = \".summary\", additional_prompt = \"\")`\n\n ```r\n llm_summarize(reviews, review, max_words = 5)\n ```\n \n- `llm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_summarize(\"This has been the best TV I've\n ever used. Great screen, and sound.\", max_words = 5)\n ```\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.summarize(x, max_words=10, additional='')`\n\n ```python\n reviews.llm.summarize(\"review\", 5)\n ```\n \n- *\\<LLMVec object\\>*`.summarize(x, max_words=10, additional='')`\n\n ```python\n llm.summarize([\"This has been the best TV I've ever used. Great screen, and sound.\"], max_words = 5)\n ```\n:::\n\n\n\n### Verify\n\nCheck if a statement is true or not based on the provided text\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_verify(.data, col, what, yes_no = factor(c(1, 0)), pred_name = \".verify\", additional_prompt = \"\")`\n\n ```r\n llm_verify(reviews, review, \"is the customer happy\")\n ```\n\n- `llm_vec_verify(x, what, yes_no = factor(c(1, 0)), additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_verify(c(\"I am happy\", \"I am sad\"), \"is the person happy\")\n ```\n\n**Special arguments**\n\n`yes_no`: Customize what it returns for true/false with a vector\n`yes_no = c(\"y\", \"n\")`.\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')`\n \n ```python\n reviews.llm.verify(\"review\", \"is the customer happy\")\n ```\n \n- *\\<LLMVec object\\>*`.verify(x, what='', yes_no=[1, 0], additional='')`\n\n ```python\n llm.verify(['I am happy', 'I am sad'], what = 'Is the person happy?')\n ```\n\n**Special arguments**\n\n`yes_no`: Customize what it returns for true/false with a vector\n`yes_no = [\"y\", \"n\"]`.\n\n:::\n\n\n### Classify\n\nClassify the provided text as one of the options provided via the `labels` argument\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_classify(.data, col, labels, pred_name = \".classify\", additional_prompt = \"\")`\n\n ```r\n llm_classify(reviews, review, c(\"appliance\", \"computer\"))\n ```\n\n- `llm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_classify(c(\"this is important!\", \"just whenever\"), c(\"urgent\", \"not urgent\"))\n ```\n\n**Special arguments**\n\n`labels`: A character vector with at least 2 labels to classify the text as\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.classify(col, labels='', additional='', pred_name='classify')`\n \n ```python\n reviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n ```\n \n- *\\<LLMVec object\\>*`.classify(x, labels='', additional='')`\n\n ```python\n llm.classify(['this is important!', 'there is no rush'], ['urgent', 'not urgent'])\n ```\n \n**Special arguments**\n\n`labels`: A character vector with at least 2 labels to classify the text as\n\n:::\n\n### Translate\n\nTranslate into the target language\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_translate(.data, col, language, pred_name = \".translation\", additional_prompt = \"\")`\n\n ```r\n llm_translate(reviews, review, \"spanish\")\n ```\n\n- `llm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE)`\n\n ```r\n llm_vec_translate(\"grass is green\", \"spanish\")\n ```\n\n**Special arguments**\n\n`language`: Target language. No origin language is passed since the LLM\ndetects it automatically.\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.translate(col, language='', additional='', pred_name='translation')`\n\n ```python\n reviews.llm.translate(\"review\", \"spanish\")\n ```\n\n- *\\<LLMVec object\\>*`.translate(x, language='', additional='')`\n\n ```python\n llm.translate(['the grass is green'], language = 'spanish')\n ```\n\n**Special arguments**\n\n`language`: Target language. No origin language is passed since the LLM detects\nit automatically.\n\n:::\n\n\n### Custom\n\nCreate your own prompt\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n- `llm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\")`\n \n ```r\n my_prompt <- \"Answer a question. Return only the\n answer, no explanation. Acceptable answers are 'yes',\n 'no'. Answer this about the following text, is this a\n happy customer?:\"\n llm_custom(reviews, review, my_prompt)\n ```\n \n- `llm_vec_custom(x, prompt = \"\", valid_resps = NULL)`\n\n**Special arguments**\n\n`valid_resps`: A vector to specify the set of answers expected back. `mall`\nwill change those not in the set to `NA`\n\n\n### Python\n\n- *\\<DataFrame\\>*`.llm.custom(col, prompt='', valid_resps='', pred_name='custom')`\n\n ```python\n my_prompt = \"Answer a question. Return only the answer,\" \\\n \" no explanation. Acceptable answers are 'yes', 'no'.\" \\\n \" Answer this about the following text, is this a happy customer?:\"\n reviews.llm.custom(\"review\", prompt = my_prompt)\n ```\n\n- *\\<LLMVec object\\>*`.custom(x, prompt='', valid_resps='')`\n\n\n**Special arguments**\n\n`valid_resps`: A vector to specify the set of answers expected back.\n\n\n:::\n\n\n### Shared arguments\n\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n`additional_prompt`: Appends instructions to the LLM.\n\n```r\nllm_classify(reviews, review, c(\"appliance\", \"computers\"), additional_prompt = \"Consider TVs as appliances\")\n```\n\n`pred_name`: Name of the new column. Defaults are set based on the NLP\noperation. *(Data frames only)*\n\n```r\nllm_translate(reviews, review, \"spanish\", pred_name = \"in_spanish\")\n```\n\n`preview`: Returns what would be sent to the LLM instead *(Vectors only)*\n\n### Python\n\n\n`additional`: Appends more instructions to the LLM.\n\n```python\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], additional=\"Consider TVs as appliances\")\n```\n\n`pred_name`: Name of the new column. Defaults are set based on the NLP\noperation. *(Data frames only)*\n\n:::\n\n## Other features\n\n### Ollama direct\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\n\nIf Ollama is the only LLM provider you are using, then a\nsimplified way to connect is available which does not\nrequire an `ellmer` `Chat` object. Simply pass \"ollama\"\nas the `backend`, and specify the model:\n\n```r\nllm_use(\"ollama\", model = \"llama3.2\")\n```\n\n### Python\n\n\nIf Ollama is the only LLM provider you are using, then a simplified way to\nconnect is available which does not require a `chatlas` `Chat` object. Simply\npass \"ollama\" as the `backend`, and specify the model:\n\n- Dataframe:\n\n ```python\n reviews.llm.use('ollama', 'llama3.2')\n ```\n \n- Vector:\n\n ```python\n llm = LLMVec('ollama', 'llama3.2')\n ```\n:::\n\n### Caching results\n\n::: {.panel-tabset group=\"language\"}\n\n### R\n\nBy default, mall saves the LLM results in a temp folder. To specify a folder\ncall:\n\n```r\nllm_use(chat, .cache = \"<my folder>\")\n```\n \nTo turn **off** use:\n\n```r\nllm_use(chat, .cache = \"\")\n```\n\n### Python\n\nBy default, mall saves the LLM results in a folder. To specify a folder call:\n\n- Dataframe:\n\n ```python\n reviews.llm.use(chat, _cache='<my folder>')\n ```\n- Vector:\n\n ```python\n llm = LLMVec(chat, _cache='<my folder>')\n ```\n\nTo turn off use:\n\n- Dataframe:\n\n ```python\n reviews.llm.use(chat, _cache='')\n ```\n\n- Vector:\n\n ```python\n llm = LLMVec(chat, _cache='')\n ```\n \n:::",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"

html/images/logo-mall.png

-82.8 KB

html/images/mall-chatlas.png

403 KB

html/images/mall-ellmer.png

545 KB

0 commit comments
