GLM-130B: An Open Bilingual Pre-Trained Model

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm. It is designed to support inference with the full 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:

  • Bilingual: supports both English and Chinese.
  • Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA, and slightly better than GPT-3 175B (+0.9%) on MMLU.
  • Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
  • Fast Inference: supports fast inference on both SwissArmyTransformer (SAT) and FasterTransformer (up to 2.5X faster) with a single A100 server.
  • Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints.
  • Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.

Demo


In practice, GLM-130B uses two different mask tokens ([MASK] and [gMASK]) for short and long text generation, respectively. It also adopts several recently proposed techniques for the Transformer architecture, including Rotary positional encoding (RoPE), DeepNorm layer normalization, and the Gaussian Error GLU (GeGLU) feed-forward activation. Together these designs yield a large-scale language model with 70 layers, a hidden dimension of 12,288, a maximum sequence length of 2,048, and a bilingual tokenizer of 150,000 tokens based on icetk.
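
To make the architecture terms above concrete, here is a minimal PyTorch sketch of a GeGLU feed-forward block and a DeepNorm-style residual connection. This is an illustration of the general techniques, not the official GLM-130B code; the toy dimensions, the 4x expansion factor, and the alpha value are assumptions chosen for readability (the real hidden size is 12,288 and alpha is a depth-dependent constant from the DeepNet paper).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """GeGLU feed-forward: FFN(x) = (GELU(x W_gate) * (x W_up)) W_down."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.gelu(self.w_gate(x)) * self.w_up(x))

def deepnorm_residual(x: torch.Tensor, sublayer_out: torch.Tensor,
                      alpha: float, norm: nn.LayerNorm) -> torch.Tensor:
    """DeepNorm residual: LayerNorm(alpha * x + sublayer(x))."""
    return norm(alpha * x + sublayer_out)

# Toy usage: small dimensions so the snippet runs anywhere.
hidden = 256                              # GLM-130B itself uses 12,288
ffn = GeGLUFeedForward(hidden, 4 * hidden)  # 4x expansion is an assumption
norm = nn.LayerNorm(hidden)
x = torch.randn(2, 8, hidden)             # (batch, sequence, hidden)
y = deepnorm_residual(x, ffn(x), alpha=2.0, norm=norm)  # alpha=2.0 is illustrative
print(y.shape)  # torch.Size([2, 8, 256])
```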

They use two different mask tokens: one ([MASK]) to fill in a short span of text and another ([gMASK]) to generate a longer continuation. Placing [gMASK] in the prompt signals the model to produce longer outputs, like full sentences or paragraphs, when you want that.
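
As a concrete illustration of the two mask tokens, the smaller GLM checkpoints published on Hugging Face (e.g. THUDM/glm-10b) follow the same [MASK]/[gMASK] convention and can be prompted as in the sketch below, adapted from that model card. Treat it as a sketch only: GLM-130B itself ships its own generation scripts in the official repo rather than this transformers interface.

```python
# Sketch based on the THUDM/glm-10b model card (same mask-token convention,
# but a much smaller model than GLM-130B); requires trust_remote_code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
model = model.half().cuda().eval()

# [MASK]: fill in a short blank inside the text.
# [gMASK] (typically placed at the end of the prompt) would instead ask for a
# long free-form continuation, e.g. "Once upon a time [gMASK]".
prompt = "Tsinghua University is located in [MASK], China."

inputs = tokenizer(prompt, return_tensors="pt")
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=64)
inputs = inputs.to("cuda")
outputs = model.generate(**inputs, max_length=64, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))
```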

Examples

“LibreTranslate is a Free and Open Source Machine Translation API”

LibreTranslate is a Free and Open Source Machine Translation API that uses .NET and C#. It is currently under active development and is meant to help developers build more professional solutions. Translation memory of LibreTranslate is based on:- Microsoft Translator- Microsoft Cognitive Services- Google Translate- Microsoft Cognitive Services- Bing Translator- Translate.com- Translate.org- Translate.org and Google Translate are licensed under MIT license.- Microsoft Cognitive Services and Google Translate are licensed under Apache License, Version 2.0 (Apache License, Version 2.0).- Microsoft Cognitive Services and Bing Translator are licensed under Apache


"Once upon a time "

Once upon a time, in the mid-thirties, the Anaconda Copper Mining Company owned one hundred and fifty-five, or fifty-five, of the poorest looking competitors in Ohio–six death-traps in addition to ten adequately inhabited towns. The mines were situated in the most rugged sections of Ohio. Heavy labor in the fields, or an association among men that is unscrupulous enough to practice sexual morals out of the open, constitutes a death-trap; but a somewhat different figure in the field, or a capacity of improvidence that acquires no organization or direction and then goads its members into insanity


"Translate to French:

Hello : Bonjour
The dog : Le chien
The cheese : "

Translate to French:

Hello : Bonjour
The dog : Le chien
The cheese :
Le fromage The cat : La chatte Translate to German: Hello : Hallo The dog : Der Hund The cheese : Das Fromage The cat : Das Katzenfutter Translate to Spanish: Hello : Hola The dog : El perro The cheese : El queso The cat : La gata Translate to Italian: Hello : Ciao The dog : Il cane The cheese : Il formaggio The cat : La gatta Translate to Dutch: Hello : Hallo The dog : De hond The cheese : De ka
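
For reference, a few-shot prompt like the one above is just a single string handed to the model. Here is a minimal sketch of assembling it; the final string is what you would paste into the demo input box (or pass to whichever inference entry point you use), so everything beyond the string-building is up to you.

```python
# Build the few-shot translation prompt shown above; the trailing blank
# ("The cheese : ") is what the model is asked to complete.
examples = [
    ("Hello", "Bonjour"),
    ("The dog", "Le chien"),
]
query = "The cheese"

prompt = "Translate to French:\n\n"
prompt += "\n".join(f"{en} : {fr}" for en, fr in examples)
prompt += f"\n{query} : "

print(prompt)
# Translate to French:
#
# Hello : Bonjour
# The dog : Le chien
# The cheese :
```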

This demo is a raw language model without instruction fine-tuning (as applied to the FLAN-* series) or RLHF (as applied to ChatGPT); its ability is roughly between OpenAI davinci and text-davinci-001. It is therefore currently worse than ChatGPT and other instruction-fine-tuned models :frowning:

I think you would need to fine-tune this model for it to be useful on any specific task.