urls.json

{"https://arxiv.org/abs/1706.03762": {"title": ["Attention Is All You Need"], "abstract": ["The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.\n    "]}}
{"https://arxiv.org/pdf/2301.04655.pdf": {"title": [], "abstract": []}}
{"https://cacm.acm.org/blogs/blog-cacm/268103-what-do-chatgpt-and-ai-based-automatic-program-generation-mean-for-the-future-of-software/fulltext": {"title": [], "abstract": []}}
{"https://www-sciencedirect-com.proxy.library.vcu.edu/science/article/pii/S1471595322002517?utm_campaign=Healthcare%20Huddle%20--%201/22/23&utm_medium=email&utm_source=Sailthru&utm_term=Healthcare%20Huddle": {"title": [], "abstract": []}}
{"https://cacm.acm.org/news/268971-chatgpt-stole-your-work-so-what-are-you-going-to-do/fulltext": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2212.10496": {"title": ["Precise Zero-Shot Dense Retrieval without Relevance Labels"], "abstract": ["While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance label is available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings~(HyDE). Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is unreal and may contain false details. Then, an unsupervised contrastively learned encoder~(e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. This second step ground the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the incorrect details. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers, across various tasks (e.g. web search, QA, fact verification) and languages~(e.g. sw, ko, ja).\n    "]}}
{"https://arxiv.org/abs/2301.07597": {"title": ["How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection"], "abstract": ["The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at ", ".\n    "]}}
{"https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/use-natural-language-amp-prompts-with-ai-models-azure-openai/ba-p/3696839": {"title": ["\n\t\t"], "abstract": []}}
{"https://azure.microsoft.com/en-us/blog/general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/": {"title": [], "abstract": []}}
{"https://arxiv.org/pdf/2301.08653.pdf": {"title": [], "abstract": []}}
{"https://events.teams.microsoft.com/event/54ce5aec-50d4-446a-a4b2-2e962b5eda54@edded432-9905-4786-97c5-a1b1ca972100": {"title": [], "abstract": []}}
{"https://arxiv.org/ftp/arxiv/papers/2301/2301.01768.pdf": {"title": [], "abstract": []}}
{"https://www.sciencedirect.com/science/article/pii/S1471595322002517": {"title": [" There was a problem providing the content you requested\n                    "], "abstract": []}}
{"https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2301.07098": {"title": ["The moral authority of ChatGPT"], "abstract": ["ChatGPT is not only fun to chat with, but it also searches information, answers questions, and gives advice. With consistent moral advice, it might improve the moral judgment and decisions of users, who often hold contradictory moral beliefs. Unfortunately, ChatGPT turns out highly inconsistent as a moral advisor. Nonetheless, it influences users' moral judgment, we find in an experiment, even if they know they are advised by a chatting bot, and they underestimate how much they are influenced. Thus, ChatGPT threatens to corrupt rather than improves users' judgment. These findings raise the question of how to ensure the responsible use of ChatGPT and similar AI. Transparency is often touted but seems ineffective. We propose training to improve digital literacy.\n    "]}}
{"https://arxiv.org/pdf/2203.02155.pdf": {"title": [], "abstract": []}}
{"https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/?WT.mc_id=M365-MVP-4029260": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2005.14165": {"title": ["Language Models are Few-Shot Learners"], "abstract": ["Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.\n    "]}}
{"http://cacm.acm.org/blogs/blog-cacm/269050-chatgpt-in-computer-science-education?utm_source=dlvr.it&utm_medium=twitter": {"title": [], "abstract": []}}
{"https://m-cacm.acm.org/blogs/blog-cacm/269050-chatgpt-in-computer-science-education/fulltext": {"title": [], "abstract": []}}
{"https://cacm.acm.org/blogs/blog-cacm/269050-chatgpt-in-computer-science-education/fulltext?utm_source=dlvr.it&utm_medium=twitter": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2204.02311": {"title": ["PaLM: Scaling Language Modeling with Pathways"], "abstract": ["Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.\n    "]}}
{"https://arxiv.org/abs/2107.03374": {"title": ["Evaluating Large Language Models Trained on Code"], "abstract": ["We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.\n    "]}}
{"https://arxiv.org/abs/2301.04655": {"title": ["ChatGPT is not all you need. A State of the Art Review of large Generative AI models"], "abstract": ["During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as being a general question and answering system or automatically creating artistic images that are revolutionizing several sectors. Consequently, the implications that these generative models have in the industry and society are enormous, as several job positions may be transformed. For example, Generative AI is capable of transforming effectively and creatively texts to images, like the DALLE-2 model; text to 3D images, like the Dreamfusion model; images to text, like the Flamingo model; texts to video, like the Phenaki model; texts to audio, like the AudioLM model; texts to other texts, like ChatGPT; texts to code, like the Codex model; texts to scientific texts, like the Galactica model or even create algorithms like AlphaTensor. This work consists on an attempt to describe in a concise way the main models are sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.\n    "]}}
{"https://arxiv.org/pdf/2301.07597.pdf": {"title": [], "abstract": []}}
{"https://m-cacm.acm.org/blogs/blog-cacm/269050-chatgpt-in-computer-science-education/fulltext#.Y872z--Lld4.twitter": {"title": [], "abstract": []}}
{"https://news.microsoft.com/ja-jp/2023/01/23/230123-general-availability-of-azure-openai-service-expands-access-to-large-advanced-ai-models-with-added-enterprise-benefits/": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2301.01768": {"title": ["The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation"], "abstract": ["Conversational artificial intelligence (AI) disrupts how humans interact with technology. Recently, OpenAI introduced ChatGPT, a state-of-the-art dialogue model that can converse with its human counterparts with unprecedented capabilities. ChatGPT has witnessed tremendous attention from the media, academia, industry, and the general public, attracting more than a million users within days of its release. However, its explosive adoption for information search and as an automated decision aid underscores the importance to understand its limitations and biases. This paper focuses on one of democratic society's most important decision-making processes: political elections. Prompting ChatGPT with 630 political statements from two leading voting advice applications and the nation-agnostic political compass test in three pre-registered experiments, we uncover ChatGPT's pro-environmental, left-libertarian ideology. For example, ChatGPT would impose taxes on flights, restrict rent increases, and legalize abortion. In the 2021 elections, it would have voted most likely for the Greens both in Germany (B\u00fcndnis 90/Die Gr\u00fcnen) and in the Netherlands (GroenLinks). Our findings are robust when negating the prompts, reversing the order of the statements, varying prompt formality, and across languages (English, German, Dutch, and Spanish). We conclude by discussing the implications of politically biased conversational AI on society.\n    "]}}
{"https://arxiv.org/abs/2301.06627": {"title": ["Dissociating language and thought in large language models"], "abstract": ["Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence--knowledge of linguistic rules and patterns--and functional linguistic competence--understanding and using language in the world. We ground this distinction in human neuroscience, showing that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. In short, LLMs are good models of language but incomplete models of human thought.\n    "]}}
{"https://arxiv.org/abs/2203.02155": {"title": ["Training language models to follow instructions with human feedback"], "abstract": ["Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.\n    "]}}
{"http://arxiv.org/abs/2301.07597": {"title": ["How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection"], "abstract": ["The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at ", ".\n    "]}}
{"https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/#:~:text=Today%2C%20we%20are%20announcing%20the,investments%20in%202019%20and%202021": {"title": [], "abstract": []}}
{"https://arxiv.org/abs/2209.14792": {"title": ["Make-A-Video: Text-to-Video Generation without Text-Video Data"], "abstract": ["We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.\n    "]}}
{"https://cacm.acm.org/blogs/blog-cacm/269050-chatgpt-in-computer-science-education/fulltext": {"title": [], "abstract": []}}
{"http://arxiv.org/abs/2301.08745": {"title": ["Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine"], "abstract": ["This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. We adopt the prompts advised by ChatGPT to trigger its translation ability and find that the candidate prompts generally work well with minor performance differences. By evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags behind significantly on low-resource or distant languages. As for the translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments but exhibits good results on spoken language. Further, we explore an interesting strategy named $\\mathbf{pivot~prompting}$ for distant languages, which asks ChatGPT to translate the source sentence into a high-resource pivot language before into the target language, improving the translation performance noticeably. With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted, becoming comparable to commercial translation products, even for distant languages. Human analysis on Google Translate and ChatGPT suggests that ChatGPT with GPT-3.5 tends to generate more hallucinations and mis-translation errors while that with GPT-4 makes the least errors. In other words, ChatGPT has already become a good translator. Please refer to our Github project for more details: ", "\n    "]}}
{"https://arxiv.org/abs/2301.08745": {"title": ["Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine"], "abstract": ["This report provides a preliminary evaluation of ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness. We adopt the prompts advised by ChatGPT to trigger its translation ability and find that the candidate prompts generally work well with minor performance differences. By evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags behind significantly on low-resource or distant languages. As for the translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments but exhibits good results on spoken language. Further, we explore an interesting strategy named $\\mathbf{pivot~prompting}$ for distant languages, which asks ChatGPT to translate the source sentence into a high-resource pivot language before into the target language, improving the translation performance noticeably. With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted, becoming comparable to commercial translation products, even for distant languages. Human analysis on Google Translate and ChatGPT suggests that ChatGPT with GPT-3.5 tends to generate more hallucinations and mis-translation errors while that with GPT-4 makes the least errors. In other words, ChatGPT has already become a good translator. Please refer to our Github project for more details: ", "\n    "]}}
{"https://www.sciencedirect.com/science/article/abs/pii/S1471595322002517?utm_source=content_newsletter_mailer&utm_medium=email&utm_campaign=newsletter": {"title": [" There was a problem providing the content you requested\n                    "], "abstract": []}}
{"https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu": {"title": [], "abstract": []}}
{"https://journals.sagepub.com/doi/full/10.1177/01634437221147626": {"title": [], "abstract": []}}