arxiv:2606.01494

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Published on May 31 · Submitted by Vincent Koc on Jun 3

Abstract

VirusTotal, static heuristic analysis, and SkillSpector rarely flag the same agent skills, and their disagreement is structured by attack surface, so agent-skill security requires layered governance rather than single-scanner allow/block decisions.

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill versions. Each row pairs redacted SKILL.md content and sanitized bundled files where present with a final ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Rather than estimating malicious-skill prevalence, we study scanner disagreement. The three scanners rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface. SkillSpector, which raises semantic agentic-risk advisories rather than malware-reputation signals, is positive for 19,209 of 25,504 suspicious rows (75.3%) but only 14 of 206 malicious rows (6.8%). The malicious-verdict region shows the inverse profile: 150 of 206 malicious rows (72.8%) are VirusTotal-positive, consistent with bundled-code malware evidence. These results show that agent-skill security requires layered governance, not single-scanner allow/block decisions. The corpus is released as a sanitized silver-standard dataset: labels are the registry's automated verdicts, not human-annotated ground truth, and the release represents an early, versioned snapshot intended to support the community while a human-annotated subset is developed. Further research is encouraged, including models tailored for skill-security triage.
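The overlap statistics in the abstract can be illustrated with a minimal sketch. It assumes that "overlap on combined positives" means the Jaccard ratio of two scanners' flagged sets, which the abstract does not state explicitly. The scanner keys, skill IDs, and function names are hypothetical toy values, not the dataset's schema or results.

```python
from itertools import combinations

def pairwise_overlap(flags):
    """Jaccard overlap |A & B| / |A | B| for every pair of scanners.

    Assumption: this is one reading of "overlap on combined positives".
    """
    result = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        result[(a, b)] = len(flags[a] & flags[b]) / len(union) if union else 0.0
    return result

def single_scanner_share(flags):
    """Fraction of flagged skills that exactly one scanner flagged."""
    flagged = set().union(*flags.values())
    if not flagged:
        return 0.0
    solo = sum(1 for skill in flagged
               if sum(skill in hits for hits in flags.values()) == 1)
    return solo / len(flagged)

# Toy data only: hypothetical skill IDs, not rows from the dataset.
toy_flags = {
    "virustotal": {"s1", "s2", "s3"},
    "static": {"s3", "s4"},
    "skillspector": {"s3", "s5", "s6", "s7"},
}

for pair, value in pairwise_overlap(toy_flags).items():
    print(pair, round(value, 3))
print("single-scanner share:", round(single_scanner_share(toy_flags), 3))
```

In the toy data only s3 is flagged by all three scanners, so six of the seven flagged skills come from a single scanner. This is the same shape of disagreement the abstract reports, at a much smaller scale.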

Why Agent Skills Are a Different Security Problem

Most security tooling starts with a familiar question: does this artifact contain malware? That question matters for agent skills too, but it underdetermines the risk.

An agent skill can be a Markdown instruction file, a Python script, a workflow definition, references, or a bundle that combines all of these. When an agent loads a skill, it may gain new ways to invoke tools, access context, issue subtasks, install dependencies, or interact with external services.

That surface introduces failure modes that classic malware scanners are not designed to catch:

  • skills that request authority far beyond what their stated purpose requires,
  • instructions designed to steer or hijack an agent's behavior when processed,
  • code paths that can leak data passed through context,
  • workflows with dangerous side effects despite a benign description,
  • hardcoded credentials, insecure TLS settings, dynamic execution, or destructive shell patterns.

Some of these look like normal software-security findings. Others are specific to agentic systems, where a document can become operational instruction and a workflow can change what an autonomous assistant is allowed to do. ClawHub Security Signals is designed to expose that boundary.
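The last bullet above names the kind of pattern a static heuristic pass can look for. The following is a minimal sketch of such a pass. The regular expressions and rule names are illustrative assumptions, not the rules used by ClawScan or any scanner in the dataset.

```python
import re

# Hypothetical patterns for illustration only; real scanners use far richer rules.
HEURISTICS = {
    "hardcoded_credential": re.compile(
        r"\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
    "insecure_tls": re.compile(r"verify\s*=\s*False|--insecure\b"),
    "dynamic_execution": re.compile(r"\b(eval|exec)\s*\("),
    "destructive_shell": re.compile(r"\brm\s+-(?:rf|fr)\b"),
}

def scan_text(text):
    """Return the sorted names of every heuristic that matches the text."""
    return sorted(name for name, pattern in HEURISTICS.items()
                  if pattern.search(text))

# Dummy bundled-file content; the credential is a made-up placeholder.
sample = '''
API_KEY = "abcd1234efgh5678"
requests.get(url, verify=False)
os.system("rm -rf " + target)
'''

print(scan_text(sample))
# ['destructive_shell', 'hardcoded_credential', 'insecure_tls']
```

Pattern matching of this kind covers only the conventional software-security findings. The agent-specific items in the list, such as over-broad authority or instructions that steer an agent, do not reduce to a fixed string pattern.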

What's in the Dataset

The dataset covers 67,453 latest public ClawHub skill versions across four deterministic splits: train (47,262), validation (10,076), test (6,747), and eval_holdout (3,368). The eval_holdout split is reserved for model evaluation and should not be used for training.
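The split sizes can be checked with a few lines of arithmetic. The counts are taken from the paragraph above; the helper name `training_splits` is hypothetical and only encodes the stated rule that eval_holdout stays out of training.

```python
# Split sizes as stated in the text above.
SPLITS = {
    "train": 47_262,
    "validation": 10_076,
    "test": 6_747,
    "eval_holdout": 3_368,
}
TOTAL = 67_453

# The four splits should account for every row in the dataset.
assert sum(SPLITS.values()) == TOTAL

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / TOTAL, 1) for name, count in SPLITS.items()}
print(shares)

def training_splits(splits):
    """Hypothetical helper: splits allowed for training or tuning."""
    return [name for name in splits if name != "eval_holdout"]

print(training_splits(SPLITS))
```

The shares come out to roughly 70/15/10/5.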

Each row includes redacted SKILL.md content, sanitized bundled files where present, the final ClawScan verdict, and summarized scanner evidence. During preparation, 387 secret-like values were redacted from exported bundle content. A TruffleHog verified-secret pass found 0 verified secrets after validation.

ClawScan assigns each skill version a registry verdict:

  • clean: 41,743 rows (61.9%)
  • suspicious: 25,504 rows (37.8%)
  • malicious: 206 rows (0.3%)

A suspicious verdict means the skill warrants review before trust is extended. It is not a confirmed-harmful label. A malicious verdict is still a silver-standard registry verdict, not human-verified ground truth at this stage.
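One way to act on these verdicts without treating any single signal as an allow/block decision is a small triage function. This is a sketch of a hypothetical policy. The field names, action labels, and rule ordering are assumptions for illustration, not part of ClawScan or the dataset schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillSignals:
    """Hypothetical per-skill record; field names are not the dataset schema."""
    verdict: str        # "clean", "suspicious", or "malicious"
    virustotal: bool    # any VirusTotal-positive evidence
    static: bool        # any static heuristic finding
    skillspector: bool  # any SkillSpector advisory

def triage(signals):
    """Map layered signals to an action under an assumed, illustrative policy."""
    # Malware-style evidence: hold the skill until a human confirms,
    # since even a malicious verdict is silver-standard.
    if signals.verdict == "malicious" or signals.virustotal:
        return "block_pending_review"
    # Suspicious verdicts and advisories warrant review before trust is extended.
    if signals.verdict == "suspicious" or signals.static or signals.skillspector:
        return "review_before_trust"
    return "allow"

print(triage(SkillSignals("suspicious", False, False, True)))  # review_before_trust
print(triage(SkillSignals("clean", False, False, False)))      # allow
```

The ordering reflects the labels as described above: neither a suspicious nor a malicious verdict is treated as final, and both route to human review at different urgency.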
