It's Just Vectors
Context Beats Magnitude: Why Embedding Sentences Outperforms Raw Numbers in Anomaly Detection
7 min read
54.20. That’s what the transaction amount field contains. Embed that string and the model learns something about the number 54.20: that it sits near 54.19 and 54.21, that it’s a plausible price for groceries or a restaurant meal. It doesn’t learn anything about where the money went or when. This post uses a deliberately reductive fraud-shaped example to make a narrower point: formatting structured fields as a sentence gives the embedder more context than a raw scalar ever can.
Part 3 showed you the geometry of your embeddings: the clusters, the outliers, the distance between categories. Part 4 changes the domain and sharpens the question. Instead of classifying transactions into categories, we’re asking: does this transaction look like the known-normal examples, or does it sit far away from them? The embed detect command answers that by comparing each transaction’s embedding against a centroid of known-normal transactions. This is not a production fraud detector; rather, it’s a small, controlled example showing that embedding quality depends heavily on how you represent the input.
What you’re building #
embed detect takes a set of labeled transactions and a threshold, embeds each one, and compares every embedding against the centroid of transactions labeled Normal. Anything below the similarity threshold gets flagged.
The --raw flag controls what gets embedded. Without it, each transaction becomes a descriptive sentence before embedding. With it, only the dollar amount is embedded. The gap in detection quality between those two modes is what this post is about.
./embed detect --provider openai --model text-embedding-3-small
./embed detect --provider openai --model text-embedding-3-small --raw
Synthetic string mode (ollama / embeddinggemma):
Normal centroid built from 4 transactions. Threshold: 0.80
Label Category Similarity
----------------------------------------------
Normal_1 (Normal ) sim: 0.8420 ✓ normal
Normal_2 (Normal ) sim: 0.8505 ✓ normal
Normal_3 (Normal ) sim: 0.8626 ✓ normal
Normal_4 (Normal ) sim: 0.8307 ✓ normal
Anomaly (Fraud_Candidate ) sim: 0.6593 ✗ FLAGGED
Raw amount mode (scores vary by model; these are representative):
Normal_1 (Normal ) sim: 0.8801 ✓ normal
Normal_2 (Normal ) sim: 0.8743 ✓ normal
Normal_3 (Normal ) sim: 0.8812 ✓ normal
Normal_4 (Normal ) sim: 0.8798 ✓ normal
Anomaly (Fraud_Candidate ) sim: 0.8634 ✓ normal ← not flagged
The anomaly ($4,999 at a high-end jewelry store at 3:45 AM) gets flagged in synthetic mode and missed in raw mode. On this toy dataset, the similarity gap that makes separation possible (0.84 vs 0.66) collapses to noise when you strip the transaction down to its amount.
Why raw embedding fails #
The model has seen 54.20 in training. It likely encodes it as a price-like token and places it near nearby prices. What it doesn’t have is the frame: what kind of transaction is 54.20? Grocery? Gas? A jewelry store at 3 AM?
4999.00 is more unusual as a number, but not dramatically so. It’s a price for a laptop, a flight, furniture. In raw form, the jump from 54.20 to 4999.00 is mostly a magnitude difference. Embeddings are not a reliable numeric reasoning system, and price alone is ambiguous across domains.
The synthetic string puts the amount back in context:
"Transaction: 45.00 USD at Whole Foods Market at 11:00 AM"
"Transaction: 4999.00 USD at High-End Jewelry Store at 03:45 AM"
Now the embedder has more contextual signals available: merchant, amount, and time appear together in one consistent string. That gives it a better chance of placing the obvious outlier away from the normal examples. The distance between these two strings is no longer driven only by numeric magnitude.
FormatTransaction #
The function that does the framing:
type Transaction struct {
    Label    string  `json:"label"`
    Category string  `json:"category"`
    Amount   float64 `json:"amount"`
    Merchant string  `json:"merchant"`
    Time     string  `json:"time"`
}

func FormatTransaction(t Transaction) string {
    return fmt.Sprintf("Transaction: %.2f USD at %s at %s",
        t.Amount, t.Merchant, t.Time)
}
Field order matters, but not because there’s a simple rule that earlier tokens always dominate later ones. The safer claim is that formatting affects the embedding, so you should keep the format consistent and lead with the fields carrying the most useful signal. Here, amount and merchant do most of the descriptive work; time modifies the interpretation.
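If you want to see how much ordering matters for your model, a hypothetical variant makes the experiment cheap. FormatTransactionTimeFirst is not part of the repo; embed the same fixtures through both formatters and compare the resulting similarity tables.

func FormatTransactionTimeFirst(t Transaction) string {
    // Hypothetical reordering for experimentation only: time leads,
    // amount and merchant follow. Same fields, same consistency rule.
    return fmt.Sprintf("Transaction at %s: %.2f USD at %s",
        t.Time, t.Amount, t.Merchant)
}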
The raw alternative, for comparison:
func FormatRaw(t Transaction) string {
    return fmt.Sprintf("%.2f", t.Amount)
}
Both functions return a string that gets passed to the embedder. The embed detect command calls one or the other depending on whether --raw is set. The detection logic after that is identical.
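Because both formatters share the signature func(Transaction) string, the switch can be as small as picking a function value. Here is a sketch of what that wiring might look like; rawFlag stands in for however the CLI actually parses --raw:

// Select the formatter once, then build the batch of strings to embed.
format := FormatTransaction
if rawFlag {
    format = FormatRaw
}

texts := make([]string, len(inputs))
for i, t := range inputs {
    texts[i] = format(t)
}
// texts goes to the embedding provider in a single batch call.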
Building the Normal centroid #
The centroid calculation is the same pattern as Part 2, applied to a new label. Group embeddings by category, compute the centroid for Normal, and use it as the reference point for everything else.
var normalVecs [][]float32
for i, item := range inputs {
    if item.Category == "Normal" {
        normalVecs = append(normalVecs, embeddings[i])
    }
}

centroid, err := math.CalculateCentroid(normalVecs)
if err != nil {
    return fmt.Errorf("centroid: %w", err)
}
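If you skipped the earlier parts, here is a minimal sketch of what the two helpers might look like, assuming the centroid is an element-wise mean. The repo keeps these in its own math package; the sketch shows them as standalone functions using the standard library’s errors and math.

// CalculateCentroid returns the element-wise mean of the input vectors.
func CalculateCentroid(vecs [][]float32) ([]float32, error) {
    if len(vecs) == 0 {
        return nil, errors.New("no vectors")
    }
    centroid := make([]float32, len(vecs[0]))
    for _, v := range vecs {
        if len(v) != len(centroid) {
            return nil, errors.New("dimension mismatch")
        }
        for i, x := range v {
            centroid[i] += x
        }
    }
    for i := range centroid {
        centroid[i] /= float32(len(vecs))
    }
    return centroid, nil
}

// CosineSimilarity returns dot(a, b) / (|a| * |b|).
func CosineSimilarity(a, b []float32) (float32, error) {
    if len(a) != len(b) {
        return 0, errors.New("dimension mismatch")
    }
    var dot, na, nb float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        na += float64(a[i]) * float64(a[i])
        nb += float64(b[i]) * float64(b[i])
    }
    if na == 0 || nb == 0 {
        return 0, errors.New("zero vector")
    }
    return float32(dot / (math.Sqrt(na) * math.Sqrt(nb))), nil
}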
Then compare every transaction (including the normals) against it:
const threshold = float32(0.80)

for i, item := range inputs {
    sim, err := math.CosineSimilarity(embeddings[i], centroid)
    if err != nil {
        return err
    }
    flag := "✓ normal"
    if sim < threshold {
        flag = "✗ FLAGGED"
    }
    fmt.Printf("%-14s (%-16s) sim: %.4f %s\n",
        item.Label, item.Category, sim, flag)
}
On this toy dataset, 0.80 is a useful cutoff because the outlier is extreme and the examples are well separated. You should not expect that number to transfer across datasets, string formats, or embedding models. Part 5 replaces it with z-scores, which adapt to the actual spread of your data.
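To preview the idea without Part 5’s machinery: compute the mean and standard deviation of the normals’ similarities, then flag anything sitting more than k deviations below the mean. This is a sketch only; zScoreFlag is a hypothetical name, not Part 5’s actual code, and it leans on the standard library’s math.Sqrt.

// zScoreFlag reports whether sim falls more than k standard deviations
// below the mean of the reference similarities.
func zScoreFlag(refSims []float32, sim float32, k float64) bool {
    var mean float64
    for _, s := range refSims {
        mean += float64(s)
    }
    mean /= float64(len(refSims))

    var variance float64
    for _, s := range refSims {
        d := float64(s) - mean
        variance += d * d
    }
    std := math.Sqrt(variance / float64(len(refSims)))
    if std == 0 {
        return false // no spread to measure against
    }
    return (float64(sim)-mean)/std < -k
}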
The inputs #
inputs.json for this part uses the fraud detection domain. Categories are Normal (training data for the centroid), Fraud_Candidate (the anomalies we want to catch), and optionally edge cases:
[
  { "label": "Normal_1", "category": "Normal", "amount": 45.00, "merchant": "Whole Foods Market", "time": "11:00 AM" },
  { "label": "Normal_2", "category": "Normal", "amount": 32.50, "merchant": "Shell Gas Station", "time": "08:15 AM" },
  { "label": "Normal_3", "category": "Normal", "amount": 8.75, "merchant": "Starbucks", "time": "07:30 AM" },
  { "label": "Normal_4", "category": "Normal", "amount": 120.00, "merchant": "Target", "time": "02:15 PM" },
  { "label": "Anomaly", "category": "Fraud_Candidate", "amount": 4999.00, "merchant": "High-End Jewelry Store", "time": "03:45 AM" }
]
The normal transactions span a realistic range of amounts and merchants. Target at $120 is deliberately included: it’s a larger amount but a plausible merchant at a plausible time. The anomaly is unusual on multiple dimensions simultaneously: amount, merchant type, and time of day. Embedding it as a sentence means the model gets all three signals to work with.
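Deserializing that file is plain encoding/json against the Transaction struct from earlier; the json tags map each key to its field. A sketch, assuming inputs.json sits in the working directory (the actual file I/O lives in the start/ scaffolding):

// Read and parse the fixture file into a slice of transactions.
data, err := os.ReadFile("inputs.json")
if err != nil {
    return fmt.Errorf("read inputs: %w", err)
}
var inputs []Transaction
if err := json.Unmarshal(data, &inputs); err != nil {
    return fmt.Errorf("parse inputs: %w", err)
}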
Testing the math first #
Before touching the API, verify against hand-crafted vectors. The unit test in math/math_test.go checks FormatTransaction output and the centroid + threshold logic against vectors you control:
normalA := []float32{0.9, 0.1, 0.8, 0.2}
normalB := []float32{0.85, 0.15, 0.75, 0.25}
outlier := []float32{0.1, 0.9, 0.2, 0.8}

centroid, _ := math.CalculateCentroid([][]float32{normalA, normalB})
simA, _ := math.CosineSimilarity(normalA, centroid) // → ~0.9994
simB, _ := math.CosineSimilarity(normalB, centroid) // → ~0.9994
simO, _ := math.CosineSimilarity(outlier, centroid) // → ~0.3650
The outlier should score around 0.36 against a centroid whose high/low pattern is the reverse of its own, while the normals score near 0.999. If it does, the detection pipeline is correct. The API-level check is narrower: on this small fixture set, the formatted strings should separate more cleanly than the raw amounts.
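Wrapped as an actual test, the core assertion might look like this. Another sketch: the test name is hypothetical, and the repo’s math_test.go may check different values.

func TestOutlierScoresBelowThreshold(t *testing.T) {
    normalA := []float32{0.9, 0.1, 0.8, 0.2}
    normalB := []float32{0.85, 0.15, 0.75, 0.25}
    outlier := []float32{0.1, 0.9, 0.2, 0.8}

    centroid, err := math.CalculateCentroid([][]float32{normalA, normalB})
    if err != nil {
        t.Fatal(err)
    }
    simO, err := math.CosineSimilarity(outlier, centroid)
    if err != nil {
        t.Fatal(err)
    }
    // The outlier must land below the 0.80 cutoff used by embed detect.
    if simO >= 0.80 {
        t.Errorf("outlier sim = %.4f, want below the 0.80 threshold", simO)
    }
}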
Getting started #
cd tutorial/part4/
complete/ has the full implementation. start/ has FormatTransaction and FormatRaw stubbed out; the command wiring, struct definitions, and file I/O are intact.
# OpenAI
export OPENAI_API_KEY="sk-your-key-here"
cd start/
go mod tidy
go build -o embed .
./embed detect --provider openai --model text-embedding-3-small
# or Ollama (no API key required)
./embed detect --provider ollama --model embeddinggemma
Run the synthetic version first. Then run --raw and look at the Anomaly row. The flag that flips from ✗ FLAGGED to ✓ normal is the cost of losing context.
What’s next #
0.80 works here because the example is deliberately extreme. Real data has transactions that are unusual on one dimension but not another — a $4,999 jewelry purchase at noon is less suspicious than the same amount at 3:45 AM. Part 5 replaces the fixed threshold with z-scores, so the cutoff adapts to the actual spread of your data instead of a number you picked by hand.
The tutorial repository and all code: rikdc/semantic-search-experiments