In natural language processing (NLP) applications, long prompts pose significant challenges, including slower inference speed, higher computational costs, and a diminished user experience. Furthermore, the limitations imposed by context length restrict model performance and application scope, creating a strong need to reduce prompt length.

In a new paper, 500xCompressor: Generalized Prompt Compression for Large Language Models, a Cambridge University research team proposes the 500xCompressor, a method that condenses long natural language contexts into as few as one special token, achieving compression ratios ranging from 6x to 480x.

The 500xCompressor not only preserves the advantages of previous methods but also adds new features. Like earlier soft prompt techniques, 500xCompressor is generalized and non-selective, capable of compressing unseen texts across various topics for tasks like question answering (QA), showcasing its versatility.

Unlike selective compression methods, 500xCompressor is designed to regenerate the entire original text, ensuring that all tokens from the original are represented in the compressed version. Additionally, these compressed prompts can be used to regenerate original texts or perform QA without the need for fine-tuning the large language model (LLM), thereby maintaining the LLM's original capabilities and enhancing the convenience of using compressed tokens.
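
To make this usage concrete, here is a minimal sketch of how compressed soft tokens could be fed to a frozen LLM, assuming a HuggingFace-style causal language model and a stand-in tensor in place of the trained compressor's output; the model name, shapes, and prompt below are illustrative and do not reflect the paper's actual interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder LLM; substitute the frozen model of interest.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # the target LLM stays frozen; no fine-tuning is performed

# Suppose a long passage has been compressed into 4 soft tokens.
# This random tensor stands in for the compressor's output (illustrative only).
hidden_size = model.config.hidden_size
compressed_embeds = torch.randn(1, 4, hidden_size)

# Embed the question normally and prepend the compressed soft tokens.
question = "Question: Which dataset is used for evaluation?\nAnswer:"
q_ids = tokenizer(question, return_tensors="pt").input_ids
q_embeds = model.get_input_embeddings()(q_ids)
inputs_embeds = torch.cat([compressed_embeds, q_embeds], dim=1)

with torch.no_grad():
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The point of the sketch is simply that the LLM's weights are untouched; only the compact representation prepended to the query carries the compressed context.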

The researchers make significant contributions in three key areas:

  • High Compression Ratio: The study evaluates the compression model using one, four, and sixteen tokens to compress texts of up to 500 tokens, achieving ratios from 6x to 480x. These ratios far exceed the sub-50x ratios reported in previous studies, probing the upper limits of prompt compression (a back-of-the-envelope ratio calculation is sketched after this list).
  • Strict Unseen Evaluation Set: The evaluation texts, drawn from the Arxiv Corpus and ArxivQA dataset, were published after January 2024, representing new, domain-specific content that was not used in training the original LLM or the compression model.
  • Quantitative Analysis of Information Loss: The compressed texts are evaluated in an extractive QA setup, where the answer is a specific span within the context, allowing a precise comparison of 500xCompressor with baseline methods and gold standards. This setup provides a detailed analysis of any information lost during prompt compression (a minimal scoring sketch also follows this list).
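
To see how the reported endpoints can arise, here is a back-of-the-envelope sketch, assuming the compression ratio is simply the original token count divided by the number of compressed tokens; the 96- and 480-token text lengths are illustrative choices, not figures from the paper.

```python
# Illustrative arithmetic only: ratio = original tokens / compressed tokens.
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    return original_tokens / compressed_tokens

for n_compressed in (1, 4, 16):
    for n_original in (96, 480):
        ratio = compression_ratio(n_original, n_compressed)
        print(f"{n_original} tokens -> {n_compressed} tokens: {ratio:.0f}x")

# Under this assumption, ~96 tokens into 16 soft tokens gives ~6x,
# while ~480 tokens into a single soft token gives ~480x.
```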
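
The extractive setup makes information loss directly measurable with standard span-level metrics. Below is a minimal sketch of exact-match and token-level F1 scoring, the usual metrics for this setting; the normalization and example strings are illustrative and not the paper's exact evaluation code.

```python
import re
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace before comparing spans.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("January 2024", "january 2024"))              # True
print(round(token_f1("the Arxiv corpus", "Arxiv Corpus"), 2))   # 0.8
```

Scoring answers generated from compressed prompts against the gold spans, and comparing those scores with answers generated from the full, uncompressed context, gives a direct quantitative picture of how much information the compression discards.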

Experimental results show that 500xCompressor achieves a high compression ratio while retaining most of the functionalities of non-compressed prompts. This finding demonstrates the significant potential for compressing current prompts, encouraging further research into compression techniques and their applications.

The paper 500xCompressor: Generalized Prompt Compression for Large Language Models is on arXiv.

Author: Hecate He | Editor: Chain Zhang