Two posts ago I wrote that I’d moved authorized security work to GLM-5.2 because the frontier kept refusing it. Then the independent receipts came in and the model held up. The fair objection landed almost immediately: “great, so you just shipped your client’s recon to a Chinese company?”
It’s a real concern. It’s also a routing choice, not a property of the model. Here is how I actually run an open model for scoped engagements, so the answer to that question is no.
It’s the host, not the weights
The most useful thing I learned digging into this: the refusals and the data risk are two different layers, and people conflate them.
The GLM-5.2 weights carry very little security-topic refusal behaviour. What baked-in sensitivity exists is mostly Chinese political content, and it’s gated on language and persona rather than hard-blocked. On exploit analysis, vulnerability work, and pentest reasoning, the model engages. Semgrep ran it on real vulnerability detection with a plain harness and no prompt gymnastics, and it beat Claude Code outright. No refusal story there at all.
The blocking people hit is a moderation layer the hosting API adds on top of the weights. The cleanest evidence: one test scored GLM-4.7 Flash at 95.2% on a sensitive-question set run locally, but only 79.8% through an aggregator’s default routing - same weights, fifteen points gone, and 76% of the failures came back as blank responses rather than the model declining. That’s server-side interception, not the model. Swap the host and keep the weights, and the behaviour changes - which is also why an aggregator’s default route isn’t good enough on its own.
So the highest-impact lever isn’t prompt engineering. It’s where you run it.
Beijing is a routing choice, not a feature
Z.ai’s own API and SiliconFlow are both Beijing-headquartered, which means they fall under China’s National Intelligence Law.
— China's National Intelligence Law, Article 7Any organization or citizen shall support, assist, and cooperate with national intelligence efforts.
No exception, no disclosure requirement, no right to refuse on the record. For a security engagement, that’s disqualifying. Target IP ranges, scan output, discovered vulnerabilities, client PII - anything you send through those endpoints should be treated as potentially accessible to a foreign state. That’s incompatible with essentially every pentest engagement agreement I’ve signed, and a non-starter for anything touching government, defence, healthcare, or critical infrastructure. And this isn’t me overreading a statute: the US Department of Homeland Security’s own data-security advisory states plainly that PRC firms are “required to secretly share data with the PRC government… even if that request is illegal under the jurisdiction in which these firms operate.”
But the weights are MIT-licensed. The model has no idea who’s hosting it. Beijing is in the path only if you put it there, and the default z-ai/glm-5.2 route on an aggregator can quietly do exactly that. The fix is to choose a Western host serving the raw weights.
The setup I’d actually run
Ranked, for a professional handling client data:
- Best self-serve: Fireworks direct. Day-zero GLM-5.2, FP8, zero data retention by default (prompts live in memory, never logged to disk), US infrastructure, ISO 27001 / SOC 2 / HIPAA, OpenAI-compatible. No moderation layer on open models. About $1.40 / $4.40 per million tokens in/out. One gotcha: use standard chat completions, not the stored-response API.
- Best for named-client or regulated work: Baseten dedicated. Single-tenant cluster, nobody else’s traffic, no shared moderation, fastest serving I found (about 278 tokens/sec on Blackwell), US, SOC 2 / HIPAA. Custom pricing, sales contact.
- Most flexible: OpenRouter, hardened. It’s what I use, and the trick is to stop it ever routing to Beijing. Pin Western providers and refuse data collection on every request:
"provider": {
"ignore": ["z-ai", "siliconflow"],
"only": ["fireworks", "deepinfra", "together"],
"data_collection": "deny"
}
And never enable the prompt-logging discount - opting in grants an irrevocable commercial-use right over your inputs and outputs. Default no-logging plus zero-data-retention is the correct posture.
- Budget: DeepInfra. About $1.20 / $4.20 per million on FP4, memory-only retention, US data centres. The FP4 quantization costs you some precision on long multi-step analysis, so weigh it against the work.
- Highest assurance: self-host on rented GPUs. RunPod or Lambda, 4x H200 or 5x A100, vLLM with INT4 weights, in a region you choose. Roughly $6.50 to $20 an hour. No provider sees anything. The only route that satisfies an air-gap clause or a defence-sector NDA - and the literal version of “nobody can ban it, and nobody can read it.”
None of this is a license to point a model at something you don’t have written permission to test. The setup here is for scoped, signed-off engagements - the case where the model should help and the frontier wrongly refuses. The data-control measures protect your client; the authorization protects everyone. Get the engagement letter first. The tooling is the easy part.
Reduce refusals honestly
Here’s the counterintuitive part, and it’s the whole reason the open model matters. On a safety-tuned frontier model, telling it you’re authorized can make things worse, not better:
— Defensive Refusal Bias, Scale AI security research, 2026Explicit authorization, where the user directly instructs the model that they have authority to complete the target task, increases refusal rates, suggesting models interpret justifications as adversarial rather than exculpatory.
The same study clocked frontier models refusing defensive security tasks at 2.72 times the rate of neutral ones, with pure blue-team work like system hardening refused 43.8% of the time. You can’t reason your way past that, because on those models the reasoning is the trigger. Red Hat’s red team hit the wall head-on: the frontier refused 100% of their authorized infrastructure-attack prompts, and they had to reach for an open model to run the test at all.
A model served from raw open weights doesn’t carry that reflex, so the technique that works is the honest one:
- Pick a raw-weights host, then state the context. Engagement authorization, scope, defensive purpose, at the system level. On a model that isn’t second-guessing your motives, that just orients the work - accurate professional framing, not a jailbreak. It holds up empirically: TrustedSec benchmarked six self-hosted models against a deliberately vulnerable target across 4,800 runs and saw pass rates north of 85%, landing on the same conclusion Semgrep did - the harness matters more than the model.
- Skip the abliterated fine-tunes. Red Hat needed an abliterated model for full offensive infra testing, but for most scoped work you won’t go that far, and you shouldn’t want to. Ablation strips the refusal direction out of the weights at a measured cost in reasoning and added hallucination. For exploit code and analysis, where precision is the whole point, that trade is backwards. You want the model sharp, not lobotomized into compliance.
What this isn’t
- GLM-5.2 is not the frontier. It trails Opus 4.8 on the hardest reasoning, and the security win was one benchmark on one dataset. It’s a capable workhorse, not a miracle, and the harness around it does more work than the model swap.
- “No documented moderation layer” is not a guarantee of none. Western hosts don’t advertise an added filter on open models, and behaviour matches that, but absence of a published policy isn’t a signed contract. Verify against your own requirements.
- Self-hosting isn’t free. A 744B-parameter mixture-of-experts model is real hardware and real ops. The unrevocable, unreadable property is what you’re paying for.
The open model gave defenders a tool the frontier decided we couldn’t be trusted with. Running it properly means you don’t trade one bad deal for another - you keep the capability, lose the refusals, and your client’s data never leaves the building, let alone the country.


