
πŸ‘ LLMs

Abstract

This chapter presents a brief overview of some of the popular research papers related to Generative AI and Prompt Engineering.

[1] Language Models are Few-Shot Learners (GPT-3)

Overview: The paper presents the significant advancements in natural language processing (NLP) achieved by scaling up language models: GPT-3, which contains 175 billion parameters, attains remarkable task-agnostic, few-shot performance without fine-tuning or gradient updates.

GPT-3 exhibits strong performance across various NLP tasks, including translation, question answering, and reasoning, in some cases achieving results competitive with fine-tuned approaches.

Nevertheless, GPT-3 still struggles on specific datasets, and the paper raises important societal concerns about its capacity to generate news articles that are difficult to distinguish from human-written ones, emphasizing the need for ethical considerations in its deployment.
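To make the few-shot idea concrete, below is a minimal sketch of an in-context few-shot prompt in Python. The translation task and example pairs echo the illustrative English-French demonstrations used in the GPT-3 paper; the helper function and its layout are purely illustrative and not part of the paper itself.

```python
# Minimal illustration of few-shot (in-context) prompting: the "training"
# signal is a handful of demonstrations placed directly in the prompt, so
# no fine-tuning or gradient updates are involved.

examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("cheese", "fromage"),
]

def build_few_shot_prompt(query: str) -> str:
    """Concatenate a task description, the demonstrations, and the new query."""
    lines = ["Translate English to French."]
    for en, fr in examples:
        lines.append(f"English: {en}\nFrench: {fr}")
    lines.append(f"English: {query}\nFrench:")  # the model completes this line
    return "\n\n".join(lines)

print(build_few_shot_prompt("otter"))
```

The resulting string is sent to the model as-is; the model infers the task from the demonstrations and completes the final line.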

[2] GPT-4 Technical Report (GPT-4)

Overview: This paper presents GPT-4, a multimodal model able to handle both textual and visual inputs while producing textual outputs. GPT-4 achieves human-level performance on several professional and academic benchmarks; for instance, it scores in the top 10% of test takers on a simulated bar exam.

[3] Gemini: A Family of Highly Capable Multimodal Models (Gemini)

Overview: This research paper introduces a series of multimodal models under the Gemini family, comprising Ultra, Pro, and Nano variants. Notably, Gemini Ultra sets new performance standards, advancing the state of the art on 30 out of 32 benchmarks and reportedly becoming the first model to exceed human-expert performance on the MMLU exam benchmark.

The paper underscores the potential of Gemini models in cross-modal reasoning and language understanding while highlighting the importance of responsible deployment.

[4] An In-depth Look at Gemini’s Language Abilities (Gemini)

Overview: This paper examines Google's Gemini models, comparing them with OpenAI's LLMs across various language tasks. The authors offer a transparent, reproducible analysis of the models on 10 datasets, covering tasks such as math problem solving, language translation, and code generation.

The findings indicate that Gemini Pro achieves accuracy slightly below that of GPT-3.5 Turbo on these tasks, and the analysis highlights areas of underperformance such as mathematical reasoning and sensitivity to answer ordering in multiple-choice questions.

However, Gemini exhibits strengths in generation into non-English languages and in handling longer, more complex reasoning chains.

[5] Llama 2: Open Foundation and Fine-Tuned Chat Models (Llama 2)

Overview: In this paper, the authors introduce Llama 2, a collection of pretrained and fine-tuned LLMs ranging in scale from 7 billion to 70 billion parameters.

Their fine-tuned models, known as Llama 2-Chat, are optimized for dialogue applications, showing superior performance on various benchmarks and positioned as open-source alternatives to proprietary models (a minimal loading sketch follows below).
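Because the Llama 2-Chat weights are openly released, they can be run locally. The sketch below loads a chat checkpoint with Hugging Face transformers; the hub repository name, the chat-template usage, and the example message are assumptions of this sketch rather than details from the paper, and access to the weights requires accepting Meta's license on the Hugging Face Hub.

```python
# Sketch: running an open-weight Llama 2-Chat checkpoint with Hugging Face
# transformers. Repository id and prompt formatting are assumptions; the
# weights are license-gated on the Hub. device_map="auto" requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed hub id for the 7B chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what Llama 2-Chat is."}]
# apply_chat_template wraps the dialogue in the tags the chat model expects.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```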

[6] LLM360: Towards Fully Transparent Open-Source LLMs (LLM360)

Overview: The paper introduces LLM360, an initiative aimed at fully open-sourcing LLMs to enhance transparency and collaboration in AI research. It emphasizes the importance of sharing complete training code, data, model checkpoints, and intermediate results with the community, addressing the limitations of prior LLM releases.

As a first step, the authors release two 7B-parameter LLMs, Amber and CrystalCoder, along with their associated resources, and commit to developing more advanced LLMs through this open-source initiative in the future.

[7] The Falcon Series of Open Language Models (Falcon)

Overview: The paper introduces the Falcon series of language models, Falcon-7B, Falcon-40B, and Falcon-180B, which are causal decoder-only models trained on a diverse, high-quality corpus assembled predominantly from web data. Falcon-180B, the largest model, is trained on over 3.5 trillion tokens.

Falcon-180B surpasses models such as PaLM and Chinchilla and approaches the performance of PaLM-2-Large while requiring lower pretraining and inference costs, making it, per the authors, one of the top three language models in the world alongside GPT-4 and PaLM-2-Large.