SteeringVector
- class steering_vectors.SteeringVector(layer_activations, layer_type='decoder_block')
A steering vector that can be applied to a model.
- apply(model, layer_config=None, operator=None, multiplier=1.0, min_token_index=0, token_indices=None)
Apply this steering vector to the given model. Tokens to patch can be selected using either min_token_index or token_indices, but not both. If neither is provided, all tokens will be patched.
- Return type:
Generator[None, None, None]
- Parameters:
model – The model to patch
layer_config – A dictionary mapping layer types to layer matching functions. If not provided, this will be inferred automatically.
operator – A function that takes the original activation and the steering vector and returns a modified vector that is added to the original activation.
multiplier – A multiplier to scale the patch activations. Default is 1.0.
min_token_index – The minimum token index to apply the patch to. Default is 0, which patches from the first token onward.
token_indices – Either a list of token indices to apply the patch to, a slice, or a mask tensor. Default is None.
Example
>>> from transformers import AutoModelForCausalLM
>>> from steering_vectors import SteeringVector
>>> model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
>>> steering_vector = SteeringVector(...)
>>> with steering_vector.apply(model):
...     model.forward(...)
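The token-selection rules described above (min_token_index or token_indices, never both, and all tokens when neither is given) can be sketched in plain Python. This is an illustrative sketch only: token_mask is a hypothetical helper, not part of steering_vectors, and a list of booleans stands in for the mask tensor so the snippet runs without any dependencies.

```python
from typing import List, Optional, Sequence, Union


def token_mask(
    seq_len: int,
    min_token_index: Optional[int] = None,
    token_indices: Optional[Union[Sequence[int], Sequence[bool], slice]] = None,
) -> List[bool]:
    """Hypothetical helper: mark which of seq_len positions would be patched.

    None means "not provided" in this sketch.
    """
    if min_token_index is not None and token_indices is not None:
        raise ValueError("Pass min_token_index or token_indices, not both")

    # Neither argument given: every position is selected.
    # Only min_token_index given: positions from that index onward.
    if token_indices is None:
        start = min_token_index or 0
        return [i >= start for i in range(seq_len)]

    # A slice selects the positions it would pick out of range(seq_len).
    if isinstance(token_indices, slice):
        chosen = set(range(seq_len)[token_indices])
        return [i in chosen for i in range(seq_len)]

    items = list(token_indices)

    # A boolean mask is used as-is (a list stands in for a mask tensor).
    if items and all(isinstance(x, bool) for x in items):
        if len(items) != seq_len:
            raise ValueError("Mask length must equal seq_len")
        return items

    # Otherwise treat the input as a list of explicit positions.
    chosen = set(items)
    return [i in chosen for i in range(seq_len)]
```

For instance, token_mask(4, min_token_index=2) selects only the last two positions, and token_mask(4, token_indices=slice(1, 3)) selects the middle two.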
- layer_activations
- layer_type = 'decoder_block'
- patch_activations(model, layer_config=None, operator=None, multiplier=1.0, min_token_index=None, token_indices=None)
Patch the activations of the given model with this steering vector. This will modify the model in-place, and return a handle that can be used to undo the patching. This method does the same thing as apply, but requires manually undoing the patching to restore the model to its original state. For most cases, apply is easier to use. Tokens to patch can be selected using either min_token_index or token_indices, but not both. If neither is provided, all tokens will be patched.
- Return type:
- Parameters:
model – The model to patch
layer_config – A dictionary mapping layer types to layer matching functions. If not provided, this will be inferred automatically.
operator – A function that takes the original activation and the steering vector and returns a modified vector that is added to the original activation.
multiplier – A multiplier to scale the patch activations. Default is 1.0.
min_token_index – The minimum token index to apply the patch to. Default is None.
token_indices – Either a list of token indices to apply the patch to, a slice, or a mask tensor. Default is None.
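How multiplier and operator might combine at a single token position can be sketched as follows. This is a dependency-free illustration, not the library's implementation: patch_position and remove_projection are hypothetical names, plain lists stand in for tensors, and applying the multiplier before the operator is an assumption.

```python
from typing import Callable, List, Optional

Vector = List[float]
Operator = Callable[[Vector, Vector], Vector]


def patch_position(
    original: Vector,
    steering: Vector,
    multiplier: float = 1.0,
    operator: Optional[Operator] = None,
) -> Vector:
    """Hypothetical sketch: patch one token position's activation."""
    # Assumption: the multiplier scales the steering vector first.
    scaled = [multiplier * s for s in steering]
    # Without an operator, the scaled vector itself is the delta.
    delta = operator(original, scaled) if operator is not None else scaled
    # Whatever the operator returns is ADDED to the original activation.
    return [o + d for o, d in zip(original, delta)]


def remove_projection(original: Vector, steering: Vector) -> Vector:
    """Example operator: a delta that cancels the component of the
    original activation lying along the steering direction."""
    dot = sum(o * s for o, s in zip(original, steering))
    norm_sq = sum(s * s for s in steering)
    return [-(dot / norm_sq) * s for s in steering]


# Plain addition: [1, 2] + 2 * [0.5, -1] == [2, 0]
added = patch_position([1.0, 2.0], [0.5, -1.0], multiplier=2.0)

# With the operator: the component of [3, 4] along [1, 0] is removed.
ablated = patch_position([3.0, 4.0], [1.0, 0.0], operator=remove_projection)
```

The point of the sketch is that the operator returns a delta rather than the final activation, which matches the parameter description above.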
Example
>>> from transformers import AutoModelForCausalLM
>>> from steering_vectors import SteeringVector
>>> model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
>>> steering_vector = SteeringVector(...)
>>> handle = steering_vector.patch_activations(model)
>>> model.forward(...)
>>> handle.remove()
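The relationship between apply and patch_activations described above can be illustrated with a toy context manager that patches on entry and removes the patch on exit, even if the body raises. Everything here (ToyModel, ToyHandle, patch, applied) is hypothetical and stands in for the real model, handle, and methods; it is a sketch of the pattern, not the library's source.

```python
from contextlib import contextmanager
from typing import Generator


class ToyModel:
    """Stand-in for a real model; tracks only whether it is patched."""

    def __init__(self) -> None:
        self.patched = False


class ToyHandle:
    """Stand-in for the handle returned by a manual patching call."""

    def __init__(self, model: ToyModel) -> None:
        self._model = model

    def remove(self) -> None:
        # Undo the patch and restore the model's original state.
        self._model.patched = False


def patch(model: ToyModel) -> ToyHandle:
    """Manual style: modify the model in place and return a handle."""
    model.patched = True
    return ToyHandle(model)


@contextmanager
def applied(model: ToyModel) -> Generator[None, None, None]:
    """Context-manager style: patch on entry, always undo on exit."""
    handle = patch(model)
    try:
        yield
    finally:
        handle.remove()


model = ToyModel()
with applied(model):
    inside = model.patched  # patched only inside the block
outside = model.patched  # restored after the block exits
```

With the manual style, an exception raised between the patch call and handle.remove() leaves the model patched unless the caller adds a try/finally, which is one reason the context-manager form is usually the easier one to use.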