Revision Transformers: Instructing Language Models to Change their Values

Friedrich, Felix; Stammer, Wolfgang; Schramowski, Patrick; Kersting, Kristian

arXiv:2210.10332 (cs)

[Submitted on 19 Oct 2022 (v1), last revised 25 Jul 2023 (this version, v3)]

Abstract: Current transformer language models (LMs) are large-scale models with billions of parameters. They have been shown to achieve high performance on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally or interpersonally. In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM, which inherently but also diffusely encodes world knowledge, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and with the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback, demonstrating strong performance in model revision even with small data. This way, users can easily design a model according to their preferences, paving the way for more transparent AI models.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as: arXiv:2210.10332 [cs.CL] (or arXiv:2210.10332v3 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2210.10332

Submission history

From: Felix Friedrich
[v1] Wed, 19 Oct 2022 07:05:06 UTC (5,171 KB)
[v2] Fri, 21 Oct 2022 09:56:56 UTC (5,171 KB)
[v3] Tue, 25 Jul 2023 13:02:49 UTC (5,179 KB)
