BARRIER: Bounded Activation Regions for Robust Information Erasure

Jan Miksa @ GMUM JU
Patryk Krukowski @ GMUM JU

How BARRIER Works

Protection Loss

The protection loss has three components:

L_protect = L_mean + L_residual + L_interval

L_mean: Penalizes shift in mean activation (global drift)
L_residual: Penalizes changes in residual (non-principal) directions
L_interval: Penalizes activations moving outside the safe zone defined by percentiles
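
To make the three terms concrete, here is a minimal NumPy sketch of how such a loss could be assembled. It is not the code in `InTAct/intact.py`: the function name `protection_loss`, the squared-error and hinge forms, and the equal weighting of the terms are all assumptions made for illustration.

```python
import numpy as np

def protection_loss(ref, cur, basis, lo, hi):
    """Toy three-term protection loss (illustrative, not the repo's code).

    ref   : (n, d) activations recorded before unlearning
    cur   : (n, d) activations of the same inputs under current parameters
    basis : (d, k) orthonormal principal directions
    lo, hi: (k,) safe-interval bounds for the projections cur @ basis
    """
    # L_mean: squared shift of the mean activation (global drift).
    l_mean = np.sum((cur.mean(axis=0) - ref.mean(axis=0)) ** 2)

    # L_residual: change in the part of each activation that lies
    # outside the span of the principal directions.
    delta = cur - ref
    resid = delta - (delta @ basis) @ basis.T
    l_residual = np.mean(np.sum(resid ** 2, axis=1))

    # L_interval: hinge penalty on projections that leave [lo, hi].
    z = cur @ basis
    below = np.maximum(lo - z, 0.0)
    above = np.maximum(z - hi, 0.0)
    l_interval = np.mean(np.sum(below ** 2 + above ** 2, axis=1))

    return l_mean + l_residual + l_interval
```

When `cur` equals `ref` and every projection lies inside its interval, all three terms vanish. During unlearning, a value like this would presumably be scaled by `lambda_interval` and added to the base unlearning objective.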

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         BARRIER                             │
├─────────────────────────────────────────────────────────────┤
│  1. setup_protection(model, forget_dl, forward_fn)          │
│     - Find target layers (e.g., "to_q", "to_k", "to_v")     │
│     - Collect activations via forward hooks                 │
│     - Compute SVD and define safe intervals                 │
│     - Snapshot initial parameters                           │
│                                                             │
│  2. compute_protection_loss(model, device)                  │
│     - Compare current params to snapshot                    │
│     - Compute drift in SVD space                            │
│     - Return weighted protection loss                       │
└─────────────────────────────────────────────────────────────┘
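
The "Compute SVD and define safe intervals" step can be sketched as follows, assuming activations have already been gathered into an `(n_samples, d)` matrix. The helper name `fit_safe_region` is hypothetical, and two details are assumptions rather than facts about the implementation: the SVD is taken on uncentered activations, and `lower`/`upper` are treated as quantile fractions in [0, 1].

```python
import numpy as np

def fit_safe_region(acts, reduced_dim=32, lower=0.05, upper=0.95):
    """Fit a low-rank basis and per-direction safe intervals.

    acts: (n_samples, d) activations collected from one target layer.
    Returns (basis, lo, hi) with basis of shape (d, k).
    """
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(acts, full_matrices=False)
    k = min(reduced_dim, vt.shape[0])
    basis = vt[:k].T                 # (d, k), orthonormal columns
    z = acts @ basis                 # coordinates in the reduced space
    lo = np.quantile(z, lower, axis=0)
    hi = np.quantile(z, upper, axis=0)
    return basis, lo, hi
```

With the default fractions, roughly 90% of the collected projections fall inside `[lo, hi]` along each direction.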

Parameters

| Parameter | Default | Description |
|---|---|---|
| `targets` | `["fc"]` | Layer name patterns to protect (e.g., `["to_q", "to_k", "to_v"]`) |
| `lambda_interval` | `10.0` | Weight for protection loss |
| `lower_percentile` | `0.05` | Lower bound of safe activation zone |
| `upper_percentile` | `0.95` | Upper bound of safe activation zone |
| `reduced_dim` | `32` | SVD dimensions (for efficiency) |
| `infinity_scale` | `20.0` | Scale for outer bounds (negative space) |
| `use_actual_bounds` | `False` | Use actual min/max from remain data instead of scaled bounds |
| `normalize_protection` | `True` | Normalize loss by number of layers |
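
For quick reference, the defaults above can be mirrored in a small dataclass. The class name `BarrierConfig` is hypothetical and does not come from the repository, and the range check assumes the percentile values are fractions in [0, 1].

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BarrierConfig:  # hypothetical name, not taken from the repository
    targets: List[str] = field(default_factory=lambda: ["fc"])
    lambda_interval: float = 10.0
    lower_percentile: float = 0.05
    upper_percentile: float = 0.95
    reduced_dim: int = 32
    infinity_scale: float = 20.0
    use_actual_bounds: bool = False
    normalize_protection: bool = True

    def __post_init__(self):
        # Assumption: percentiles are expressed as fractions in [0, 1].
        if not 0.0 <= self.lower_percentile < self.upper_percentile <= 1.0:
            raise ValueError("need 0 <= lower_percentile < upper_percentile <= 1")
        if self.reduced_dim < 1:
            raise ValueError("reduced_dim must be positive")

# Example: protect attention projections with a lighter penalty.
sd_cfg = BarrierConfig(targets=["to_q", "to_k", "to_v"], lambda_interval=1.0)
```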

Installation

Follow the installation instructions in each subfolder.

Pipelines (Recommended)

Each setting has a unified pipeline that orchestrates unlearning → evaluation → wandb logging via a single YAML config. All pipelines support wandb sweeps and are SLURM-ready.

Classification – Class-wise Forgetting

cd Classification
export PYTHONPATH="${PYTHONPATH}:$(cd .. && pwd)"
pip install -r requirements.txt

# Edit configs/pipeline_classwise.yaml (paths, wandb entity, etc.)
python pipeline.py --config configs/pipeline_classwise.yaml

Classification – Random Data Forgetting

cd Classification
python pipeline.py --config configs/pipeline_random.yaml

DDPM – Conditional Diffusion (CIFAR-10)

cd DDPM
export PYTHONPATH="${PYTHONPATH}:$(cd .. && pwd)"
pip install -r requirements.txt

# Edit configs/pipeline.yaml (paths, wandb entity, etc.)
python pipeline.py --config configs/pipeline.yaml

Stable Diffusion – Class Forgetting

cd SD
export PYTHONPATH="${PYTHONPATH}:$(cd .. && pwd)"
conda env create -f environment.yaml && conda activate ldm
# Download SD v1.4 weights (see SD/README.md)

python pipeline.py --config configs/pipeline_class.yaml

Stable Diffusion – NSFW Removal

cd SD
python pipeline.py --config configs/pipeline_nsfw.yaml

wandb Sweeps

Sweeps automate hyperparameter search across pipeline runs. Each sweep config defines which parameters to vary; the pipeline YAML provides defaults for everything else.

Quick start:

# Use the convenience script, which creates the sweep and starts an agent
cd SD
./run_sweep.sh sweep_class

On SLURM – launch one agent per job:

#!/bin/bash
#SBATCH --gres=gpu:1 --mem=48G --time=48:00:00 --array=0-9
source activate ldm
cd /path/to/BARRIER/SD
export PYTHONPATH="${PYTHONPATH}:/path/to/BARRIER"
wandb agent <sweep-id>

Sweep parameter format – dotted keys map to nested YAML fields:

parameters:
  unlearn.lr:               # → cfg["unlearn"]["lr"]
    values: [1e-5, 5e-5]
  intact.lambda_interval:   # → cfg["intact"]["lambda_interval"]
    values: [1.0, 10.0]
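
The dotted-key mapping can be illustrated with a short helper that merges sweep overrides into a nested config. This is a sketch of the idea only: the function name `apply_dotted_overrides` and the `epochs` default are assumptions, and the pipelines may implement the merge differently.

```python
import copy

def apply_dotted_overrides(cfg, overrides):
    """Return a copy of cfg with dotted keys written into nested dicts."""
    out = copy.deepcopy(cfg)
    for dotted, value in overrides.items():
        *parents, leaf = dotted.split(".")
        node = out
        for key in parents:
            # Create intermediate sections that the defaults do not define.
            node = node.setdefault(key, {})
        node[leaf] = value
    return out

# Defaults as they might be loaded from a pipeline YAML (values illustrative).
defaults = {"unlearn": {"lr": 1e-5, "epochs": 5},
            "intact": {"lambda_interval": 10.0}}

# Values as a sweep agent would supply them.
merged = apply_dotted_overrides(
    defaults, {"unlearn.lr": 5e-5, "intact.lambda_interval": 1.0}
)
```

Here `merged["unlearn"]["lr"]` becomes `5e-5`, while `merged["unlearn"]["epochs"]` keeps its default and `defaults` itself is left unmodified.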

Available sweep configs:

| Setting | Config | Key parameters |
|---|---|---|
| Classification class-wise | `Classification/configs/sweep_classwise.yaml` | `unlearn_lr`, `unlearn_epochs`, `lambda_interval`, `base_method` |
| Classification random | `Classification/configs/sweep_random.yaml` | `unlearn_lr`, `unlearn_epochs`, `lambda_interval`, `base_method` |
| DDPM class forgetting | `DDPM/configs/sweep.yaml` | `lr`, `n_iters`, `lambda_interval`, `method` |
| SD class forgetting | `SD/configs/sweep_class.yaml` | `lr`, `epochs`, `lambda_interval`, `base_method`, `targets` |
| SD NSFW removal | `SD/configs/sweep_nsfw.yaml` | `lr`, `epochs`, `lambda_interval`, `targets` |

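Adding parameters: copy any dotted key from the pipeline YAML into the sweep's parameters: block. Use values: for a discrete set of choices (the form grid search needs), min:/max: for a range, and distribution: to control how random or Bayesian search samples that range. The search strategy itself is chosen by the sweep's top-level method: key (grid, random, or bayes). See the wandb sweep docs for details.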
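
As an illustration, the same structure can be written as a Python dict, which is the form `wandb.sweep` accepts. The metric name and the parameter ranges below are placeholders, not values taken from the shipped sweep configs.

```python
# Sweep definition as a plain dict; mirrors the YAML layout.
sweep_config = {
    "method": "bayes",  # one of "grid", "random", "bayes"
    # Bayesian search needs a metric to optimize; this name is a placeholder.
    "metric": {"name": "eval/score", "goal": "maximize"},
    "parameters": {
        # Continuous range sampled on a log scale.
        "unlearn.lr": {
            "distribution": "log_uniform_values",
            "min": 1e-6,
            "max": 1e-4,
        },
        # Discrete set of choices.
        "intact.lambda_interval": {"values": [1.0, 10.0]},
    },
}

# Registering it would look like this (requires wandb and a login):
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="my-project")
```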

Examples:

# Classification class-wise sweep
cd Classification
./run_sweep.sh sweep_classwise

# Classification random sweep
./run_sweep.sh sweep_random

# DDPM sweep
cd DDPM
./run_sweep.sh sweep

# SD class sweep
cd SD
./run_sweep.sh sweep_class

# SD NSFW sweep
./run_sweep.sh sweep_nsfw

SLURM

Wrap any pipeline command in a SLURM script (e.g., for DDPM):

#!/bin/bash
#SBATCH --job-name=barrier-ddpm
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%j.out

source activate ldm  # or your env
cd /path/to/BARRIER/DDPM
export PYTHONPATH="${PYTHONPATH}:/path/to/BARRIER"
python pipeline.py --config configs/pipeline.yaml

For sweep agents on SLURM, replace the last line with wandb agent <sweep-id>.

Direct Script Usage

The original training and evaluation scripts remain available for finer-grained control.

Classification (manual)

cd Classification
export PYTHONPATH="${PYTHONPATH}:/path/to/BARRIER"

# Forget class 0 (airplane)
python intact_experiment.py --unlearn_classes 0 --lambda_interval 100.0

DDPM (manual)

cd DDPM
python train.py --config configs/cifar10_intact.yml

Stable Diffusion (manual)

cd SD

# GA + BARRIER
python train-scripts/intact_unlearn.py \
    --base_method ga --class_to_forget 0 \
    --targets to_q to_k to_v --lambda_interval 1.0 --epochs 5

# Generate + evaluate
python eval-scripts/generate-images.py --model_name "..." --prompts_path prompts/imagenette.csv --save_path evaluation/
python eval-scripts/compute-fid.py --folder_path evaluation/
python eval-scripts/imageclassify.py --prompts_path prompts/imagenette.csv --folder_path evaluation/

Project Structure

BARRIER/
├── InTAct/
│   └── intact.py                          # Core BARRIER implementation (based on InTAct)
├── Classification/
│   ├── pipeline.py                        # Unified pipeline (classwise + random)
│   ├── configs/
│   │   ├── pipeline_classwise.yaml
│   │   ├── pipeline_random.yaml
│   │   ├── sweep_classwise.yaml
│   │   └── sweep_random.yaml
│   ├── intact_experiment.py               # Standalone InTAct demo
│   ├── main_forget.py                     # Baseline unlearning methods
│   └── ...
├── DDPM/
│   ├── pipeline.py                        # Unified pipeline
│   ├── configs/
│   │   ├── pipeline.yaml
│   │   └── sweep.yaml
│   ├── train.py                           # Original training entry
│   ├── runners/diffusion.py               # InTAct integration
│   └── ...
├── SD/
│   ├── pipeline.py                        # Unified pipeline (class + NSFW)
│   ├── configs/
│   │   ├── pipeline_class.yaml
│   │   ├── pipeline_nsfw.yaml
│   │   ├── sweep_class.yaml
│   │   └── sweep_nsfw.yaml
│   ├── train-scripts/intact_unlearn.py    # SD InTAct (GA, RL, NSFW, ESD)
│   ├── eval-scripts/                      # FID, classify, NudeNet
│   └── ...
└── README.md

License

MIT License
