网安资讯详情 - SecLens 情报雷达

网安资讯,一网打尽。汇集权威漏洞通告与行业要闻,结合分组浏览、智能过滤、RSS订阅 和 Webhook 推送,多通道拓展您的安全情报视野。

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

来源: arxiv_cs_cr · 发布时间 2026-06-01 22:23 (UTC+08:00) · 抓取时间 2026-06-02 19:10 (UTC+08:00)

原文链接

摘要

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe behavior. We introduce SeClaw, a framework that combines specification-driven security task synthesis with execution-based security evaluation for Autonomous agents. Spec-driven security task synthesis enables scalable and controllable construction of security tasks from structured risk specifications, while SeClaw docker provides a standardized testbed for evaluating agent behavior under diverse safety-risk scenarios. The benchmark covers risks arising from resources, user tasks, environments, and intrinsic agent behaviors, and supports trajectory-aware assessment of unsafe actions beyond final responses. By bridging systematic task synthesis and reproducible security evaluation, SeClaw provides a practical foundation for measuring, diagnosing, and comparing security failures in autonomous LLM agents. The code is available at https://github.com/seclaw-eval/seclaw-eval.

正文

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe behavior. We introduce SeClaw, a framework that combines specification-driven security task synthesis with execution-based security evaluation for Autonomous agents. Spec-driven security task synthesis enables scalable and controllable construction of security tasks from structured risk specifications, while SeClaw docker provides a standardized testbed for evaluating agent behavior under diverse safety-risk scenarios. The benchmark covers risks arising from resources, user tasks, environments, and intrinsic agent behaviors, and supports trajectory-aware assessment of unsafe actions beyond final responses. By bridging systematic task synthesis and reproducible security evaluation, SeClaw provides a practical foundation for measuring, diagnosing, and comparing security failures in autonomous LLM agents. The code is available at https://github.com/seclaw-eval/seclaw-eval. Authors: Hao Cheng, Changtao Miao, Tianle Song, Yin Wu, He Liu, Erjia Xiao, Junchi Chen, Xiaoyu Shi, Yichi Wang, Jing Yang, Taowen Wang, Jinhao Duan, Mengshu Sun, Peiyan Dong, Xuan Shen, Yang Cao, Renjing Xu, Kaidi Xu, Jindong Gu, Bo Zhang, Jize Zhang, Chenhao Lin, Philip Torr, Chao Shen Categories: cs.CR, cs.AI PDF: https://arxiv.org/pdf/2606.02302v1

标签

扩展字段

{
  "arxiv_id": "2606.02302v1",
  "authors": [
    "Hao Cheng",
    "Changtao Miao",
    "Tianle Song",
    "Yin Wu",
    "He Liu",
    "Erjia Xiao",
    "Junchi Chen",
    "Xiaoyu Shi",
    "Yichi Wang",
    "Jing Yang",
    "Taowen Wang",
    "Jinhao Duan",
    "Mengshu Sun",
    "Peiyan Dong",
    "Xuan Shen",
    "Yang Cao",
    "Renjing Xu",
    "Kaidi Xu",
    "Jindong Gu",
    "Bo Zhang",
    "Jize Zhang",
    "Chenhao Lin",
    "Philip Torr",
    "Chao Shen"
  ],
  "categories": [
    "cs.CR",
    "cs.AI"
  ],
  "comment": null,
  "doi": null,
  "entry_id": "https://arxiv.org/abs/2606.02302v1",
  "pdf_url": "https://arxiv.org/pdf/2606.02302v1",
  "primary_category": "cs.CR",
  "search_query": "cat:cs.CR",
  "updated_at": "2026-06-01T14:23:42+00:00"
}