Production vector stores treat every embedding as an opaque numerical artifact. That assumption opens a quiet class of steganographic exfiltration attacks — and one of them, small-angle orthogonal rotation, defeats every off-the-shelf distribution-based detector we tested.
Retrieval-augmented generation converts sensitive content into high-dimensional embeddings and stores them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not appear to provide native or default controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation.
We show this opens a class of steganographic exfiltration attacks: an attacker with write access to the ingestion pipeline can hide payload data inside embeddings using simple post-embedding perturbations — noise, rotation, scaling, offset, fragmentation, and combinations thereof — while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users.
We evaluate these techniques across a synthetic-PII corpus on text-embedding-3-large, four locally hosted open embedding models, a cross-corpus replication on BEIR NFCorpus and a Quora subset (over 26,000 chunks combined), seven vector-store configurations, an adaptive-attacker variant, and a paraphrased-query retrieval benchmark.
Distribution-shifting perturbations are often caught by simple anomaly detectors; small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested. The constructive defense, VectorPin, cryptographically pins each embedding to its source content and producing model — any post-embedding modification breaks signature verification.
The attack surface comprises six techniques. Five perturb a clean embedding in place; one swaps embedding models per fragment. Only the structural exception, orthogonal rotation, is invisible to the distribution-based detectors trained on a held-out clean baseline.
Noise injection. Add v′ = v + 𝒩(0, σ²I). Capacity scales with σ; so does distributional shift.
Orthogonal rotation. v′ = R·v for R ∈ O(d). Pairwise inner products and per-dimension marginals are invariant.
Per-dimension scaling. v′ᵢ = sᵢ · vᵢ. Direction is preserved, norms shift. Detectable via norm statistics.
Constant offset. v′ = v + o. The shared offset moves the empirical centroid by o: immediately obvious.
Model fragmentation. Split the source across models {M₁…Mₖ}. No in-place perturbation; no statistical signal.
Composition. noise → rotation → scaling → offset. Detector AUC tracks the most-detectable component.
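A minimal numerical sketch of these families (illustrative NumPy code, not the paper's implementation; dimensions and parameters are toy values) shows why the rotation family is the outlier: a composition of disjoint Givens rotations leaves the Gram matrix of pairwise inner products untouched, while a shared offset moves the empirical centroid by a directly measurable amount.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 200
V = rng.normal(size=(n, d))  # stand-in for a batch of clean embeddings

def givens_rotation(d, theta, pairs):
    """Compose disjoint Givens rotations into one orthogonal matrix R."""
    R = np.eye(d)
    for i, j in pairs:
        G = np.eye(d)
        G[i, i] = G[j, j] = np.cos(theta)
        G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
        R = G @ R
    return R

pairs = [(2 * k, 2 * k + 1) for k in range(d // 2)]
R = givens_rotation(d, theta=0.05, pairs=pairs)

rotated = V @ R.T                                 # v' = R v, row-wise
offsetted = V + 0.5                               # v' = v + o, shared offset
noisy = V + rng.normal(scale=0.1, size=V.shape)   # v' = v + N(0, sigma^2 I)

# Rotation preserves every pairwise inner product <u, v> ...
gram_shift = np.abs(V @ V.T - rotated @ rotated.T).max()
# ... while a shared offset moves the empirical centroid by o.
centroid_shift = np.linalg.norm(offsetted.mean(axis=0) - V.mean(axis=0))

print(f"Gram-matrix shift under rotation: {gram_shift:.2e}")
print(f"Centroid shift under offset:      {centroid_shift:.2f}")
```

The Gram-matrix shift under rotation is pure floating-point noise, while the offset shifts the centroid by ‖o‖ = 0.5·√d, which any centroid statistic flags at once.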
Below: a batch of 80 vectors, shown in 2D for clarity (the real attack lives in ℝ³⁰⁷² via composed Givens rotations). A statistical detector reacts differently to each perturbation family: the pairwise-similarity heatmap stays clean under rotation because R preserves every inner product ⟨u,v⟩, while under noise, offset, or scaling it lights up.
What you are seeing. Rotation is an orthogonal map: ⟨Ru, Rv⟩ = ⟨u, v⟩ for every pair. Per-dimension marginals are also preserved when R is sampled with full rotational symmetry. That is why the trained Isolation Forest + One-Class SVM stay at chance AUC — and why a statistical defense alone cannot close this gap. The paper's answer: cryptographic provenance, not anomaly detection.
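The chance-AUC result can be reproduced in miniature with a crude per-dimension z-score detector standing in for the paper's Isolation Forest and One-Class SVM (an illustrative sketch; the detector, data, and parameters are toy stand-ins, and the clean data is rotationally symmetric by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 32, 500

# "Train" a per-dimension z-score detector on a held-out clean baseline.
baseline = rng.normal(size=(n, d))
mu, sd = baseline.mean(axis=0), baseline.std(axis=0)

def score(X):
    """Max absolute z-score per vector: a crude distributional detector."""
    return np.abs((X - mu) / sd).max(axis=1)

def auc(pos, neg):
    """Rank AUC: probability a perturbed vector outscores a clean one."""
    return (pos[:, None] > neg[None, :]).mean()

# Small-angle rotation composed from disjoint Givens planes.
theta = 0.05
R = np.eye(d)
for k in range(d // 2):
    i, j = 2 * k, 2 * k + 1
    G = np.eye(d)
    G[i, i] = G[j, j] = np.cos(theta)
    G[i, j], G[j, i] = -np.sin(theta), np.sin(theta)
    R = G @ R

clean = rng.normal(size=(n, d))
auc_rotation = auc(score(clean @ R.T), score(clean))
auc_offset = auc(score(clean + 1.0), score(clean))

print(f"AUC vs rotation: {auc_rotation:.2f}")  # near 0.5: chance
print(f"AUC vs offset:   {auc_offset:.2f}")    # well above chance
```

The rotated batch is statistically indistinguishable from clean data to this detector, while even a modest shared offset pushes AUC well above chance.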
Neither Isolation Forest nor One-Class SVM exceeds AUC 0.53 across the entire (θ, |Π|) sweep on small-angle orthogonal rotation.
The σ-noise channel saturates the detector immediately past a useful capacity threshold. The stealth window is narrow.
Closed-form per-vector capacity ceiling for the disjoint-Givens keyed-pair encoder: ⌊d/2⌋ disjoint rotation planes. At d = 3072, that is 1,536 channel slots per embedding.
BEIR NFCorpus (medical) + a Quora subset. Rotation still indistinguishable; the result is not a quirk of one corpus.
Qdrant-default per-dimension int8 quantization preserved cosine to three decimal places across every technique tested.
No surveyed vector store ships native vector-content integrity, ingestion anomaly detection, or provenance attestation by default.
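The capacity ceiling follows from the encoder structure: each of the ⌊d/2⌋ disjoint dimension pairs is one rotation plane, and each plane carries one symbol in the sign of a small keyed angle. A toy sketch of such a keyed-pair encoder (assumptions: pure-Python stand-in, not the paper's code; the decoder holds the shared key and can recompute the clean embedding, e.g. by re-embedding the retrieved chunk):

```python
import math
import random

def keyed_pairs(d, key):
    """Key-seeded permutation of dimensions into d//2 disjoint planes."""
    idx = list(range(d))
    random.Random(key).shuffle(idx)
    return [(idx[2 * k], idx[2 * k + 1]) for k in range(d // 2)]

def encode(v, bits, key, theta=0.05):
    """Rotate each keyed plane by +theta (bit 1) or -theta (bit 0)."""
    v = list(v)
    for (i, j), b in zip(keyed_pairs(len(v), key), bits):
        a = theta if b else -theta
        vi, vj = v[i], v[j]
        v[i] = math.cos(a) * vi - math.sin(a) * vj
        v[j] = math.sin(a) * vi + math.cos(a) * vj
    return v

def decode(v_clean, v_stego, key):
    """Recover each bit from the rotation direction in its keyed plane."""
    bits = []
    for i, j in keyed_pairs(len(v_clean), key):
        # Sign of the 2D cross product gives the rotation direction.
        cross = v_clean[i] * v_stego[j] - v_clean[j] * v_stego[i]
        bits.append(1 if cross > 0 else 0)
    return bits

rnd = random.Random(42)
clean = [rnd.gauss(0, 1) for _ in range(8)]  # d = 8 -> 4 channel slots
payload = [1, 0, 1, 1]
stego = encode(clean, payload, key="k")
assert decode(clean, stego, key="k") == payload
```

Because the composed per-plane rotations form one orthogonal map, the stego vector keeps the clean vector's norm and pairwise geometry, which is exactly what keeps it below the distributional detectors' radar.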
VectorPin is a minimal cryptographic provenance protocol: one signature, one hash family, a fixed canonical byte form for floating-point arrays. Reference implementations exist in Python and Rust, locked together by cross-language test vectors that guarantee bit-for-bit compatibility. Verification distinguishes signature forgery, vector tampering, source mismatch, and model substitution as distinct outcomes.
The vec_hash commits to the model's actual output bytes. Any post-embedding modification — every technique on this page — changes vec_hash and triggers VECTOR_TAMPERED on verification.
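A minimal sketch of the verification flow, with a keyed HMAC standing in for VectorPin's real asymmetric signature, a float32 packing standing in for its canonical byte form, and field and outcome names other than vec_hash and VECTOR_TAMPERED invented for illustration:

```python
import hashlib
import hmac
import struct

SIGNING_KEY = b"demo-key"  # stand-in: a real deployment signs asymmetrically

def canonical_bytes(vec):
    """Assumed canonical byte form: little-endian IEEE-754 float32 array."""
    return struct.pack(f"<{len(vec)}f", *vec)

def pin(vec, src_text, model_id):
    """Bind the vector to its source content and producing model."""
    vec_hash = hashlib.sha256(canonical_bytes(vec)).hexdigest()
    src_hash = hashlib.sha256(src_text.encode()).hexdigest()
    msg = f"{vec_hash}|{src_hash}|{model_id}".encode()
    return {"vec_hash": vec_hash, "src_hash": src_hash, "model_id": model_id,
            "sig": hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()}

def verify(vec, src_text, model_id, rec):
    """Distinguish the four failure modes as distinct outcomes."""
    msg = f"{rec['vec_hash']}|{rec['src_hash']}|{rec['model_id']}".encode()
    expected = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, rec["sig"]):
        return "SIGNATURE_INVALID"
    if hashlib.sha256(canonical_bytes(vec)).hexdigest() != rec["vec_hash"]:
        return "VECTOR_TAMPERED"
    if hashlib.sha256(src_text.encode()).hexdigest() != rec["src_hash"]:
        return "SOURCE_MISMATCH"
    if model_id != rec["model_id"]:
        return "MODEL_SUBSTITUTED"
    return "OK"

vec = [0.1, -0.2, 0.3]
rec = pin(vec, "patient note", "text-embedding-3-large")
ok = verify(vec, "patient note", "text-embedding-3-large", rec)
tampered = verify([v + 1e-6 for v in vec], "patient note",
                  "text-embedding-3-large", rec)
print(ok, tampered)
```

Even a perturbation of 10⁻⁶ per component changes the canonical bytes, so vec_hash no longer matches and verification reports tampering rather than silently accepting the modified vector.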
Limits, made explicit: VectorPin does not defend against an attacker who holds the private signing key, modifies source documents before embedding, or uses a legitimate signing key to attest a malicious vector at ingestion time. Key custody and upstream input validation remain the operator's job.