Intelligent Bidirectional Privacy Anonymization Strategy for Peer Review in STM Journals
ZUO Shuangyan¹, ZHANG Xin², CHEN Liwen³, GAO Wuqiang⁴,⁵,*
1. Editorial Office of Chinese Journal of Infection Control, Xiangya Medical Academic Promotion Center, Xiangya Hospital Central South University, 410008, Changsha, China
2. Society of China University Journals, 100083, Beijing, China
3. Editorial Office of the Journal of Central South University (Medical Sciences), Central South University Press, 410078, Changsha, China
4. Center for Healthcare-associated Infection Control, Xiangya Hospital, Central South University, 410008, Changsha, China
5. Information Center, Xiangya Hospital, Central South University, 410008, Changsha, China
Abstract: In an era where journals are under mounting pressure to implement double-blind peer review while handling rapidly increasing submission volumes, editorial offices still depend largely on manual redaction or coarse “document inspector” tools to remove identifying details from manuscripts and reviewer reports. These practices are labor intensive, difficult to standardize across editors, and often act as blunt instruments that disrupt the review process by removing useful layout metadata together with sensitive information. They also provide limited protection against implicit semantic leakage. To mitigate these risks and reconcile the tension between robust privacy protection and editorial efficiency, this study proposes an intelligent bidirectional privacy anonymization strategy that integrates rule-based algorithms with large language models (LLMs) and implements it as a scalable browser/server application aligned with editorial workflows. Grounded in an analysis of typical submission materials, the system formalizes three design dimensions: supported file formats, sensitive information categories and high-risk document locations. It supports mainstream word-processing formats, targets core identifiers for authors and reviewers, and concentrates on predefined high-risk locations. On this foundation, we construct a two-layer hybrid engine. A rule-based layer, implemented against the Office Open XML schema, uses regular expressions and structural cues to deterministically locate and neutralize well-structured fields such as author lists, affiliations and email addresses while explicitly protecting in-text citations and reference lists as spans that must not be altered. An LLM-based layer is then invoked through structured prompts that encode editorial heuristics and send only minimal, context-tagged text segments to the model. This layer identifies and masks residual identity cues that escape rule-based detection, the "long tail" of semantic leakage.
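The rule-based layer's idea of masking well-structured identifiers while leaving citations untouched can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_identifiers`, the placeholder labels `[EMAIL]` and `[AUTHOR]`, the two regular expressions, and the choice to work on plain text rather than on Office Open XML nodes are all assumptions made for illustration.

```python
import re

# Assumed patterns for illustration only; a production rule base would be broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Parenthetical author-year citations such as "(Zhang et al., 2020)".
CITATION_RE = re.compile(r"\([^()]*\b\d{4}[a-z]?\)")


def mask_identifiers(text, author_names):
    """Mask emails and known author names, skipping protected citation spans."""
    protected = [m.span() for m in CITATION_RE.finditer(text)]

    def is_protected(start, end):
        # True when [start, end) overlaps any protected citation span.
        return any(start < p_end and end > p_start for p_start, p_end in protected)

    patterns = [(EMAIL_RE, "[EMAIL]")]
    patterns += [(re.compile(re.escape(name)), "[AUTHOR]") for name in author_names]

    hits = []
    for pattern, label in patterns:
        for m in pattern.finditer(text):
            if not is_protected(m.start(), m.end()):
                hits.append((m.start(), m.end(), label))

    # Keep only hits that do not overlap an earlier accepted hit.
    accepted = []
    for start, end, label in sorted(hits):
        if not accepted or start >= accepted[-1][1]:
            accepted.append((start, end, label))

    # Replace from right to left so earlier offsets stay valid.
    for start, end, label in reversed(accepted):
        text = text[:start] + label + text[end:]
    return text


sample = (
    "Correspondence: Zhang (zhang@example.org). "
    "As shown previously (Zhang et al., 2020), the effect holds."
)
masked = mask_identifiers(sample, ["Zhang"])
print(masked)
```

In this sketch the author name is masked in the correspondence line but survives inside the citation, which is the behavior the abstract describes for protected spans. Residual cues that no pattern matches would be left for the LLM-based layer.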
For PDF files, whose internal structure is less amenable to safe in-place editing, the system adopts a non-destructive “sensitive-information warning” mode in which extracted text is screened and suspected identifiers are flagged for manual verification rather than being automatically rewritten. The hybrid approach is extended symmetrically to reviewer reports. For DOC/DOCX files, the system parses comment and revision nodes and replaces user names and contact details with neutral labels such as "journal editor" or "reviewer A" while preserving the original review content; for PDF reports, suspected identity fields are similarly highlighted for anonymization or human confirmation. The anonymization engine is exposed through a web interface and standardized application programming interfaces, enabling on-demand use by editors and integration with editorial management systems at key workflow stages. An internal evaluation of real manuscripts and reviewer reports by experienced editors indicates that the rule-plus-LLM strategy more reliably removes explicit identifiers and reduces implicit identity cues in high-risk locations than manual or rule-only approaches, without altering in-text citations or reference lists, and substantially shortens the preparation time for double-blind review. Comparison of manual, rule-only, LLM-only and hybrid schemes suggests that the proposed engine achieves a favourable balance of precision, coverage, consistency and operational cost. Overall, this study demonstrates the feasibility of deploying a rules-plus-LLM hybrid engine for intelligent bidirectional anonymization in journal peer review and offers a practical, scalable pathway for journals seeking to strengthen privacy protection and editorial efficiency, while building a fairer and more trustworthy peer-review ecosystem.
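The reviewer-report step for DOCX files, replacing commenter names with neutral labels while keeping the comment text, can be sketched as follows. This is a hedged illustration rather than the system's code: the function name `anonymize_authors` and the lettering scheme are assumptions, and the sketch operates on an XML string, whereas a real DOCX file is a ZIP archive whose comments live in `word/comments.xml` and whose tracked revisions (`w:ins`, `w:del`) live in `word/document.xml`.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
AUTHOR = f"{{{W_NS}}}author"
INITIALS = f"{{{W_NS}}}initials"


def anonymize_authors(xml_text, prefix="reviewer"):
    """Replace every w:author value with a neutral label and drop w:initials.

    Returns the rewritten XML and the name-to-label mapping. Labels use
    letters A, B, C, ...; this sketch assumes at most 26 distinct names.
    """
    ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output
    root = ET.fromstring(xml_text)
    labels = {}
    for element in root.iter():
        name = element.get(AUTHOR)
        if name is None:
            continue
        if name not in labels:
            labels[name] = f"{prefix} {chr(ord('A') + len(labels))}"
        element.set(AUTHOR, labels[name])
        # Initials can also reveal identity, so remove them when present.
        element.attrib.pop(INITIALS, None)
    return ET.tostring(root, encoding="unicode"), labels


comments_xml = (
    f'<w:comments xmlns:w="{W_NS}">'
    '<w:comment w:id="0" w:author="Li Ming" w:initials="LM">'
    "<w:p><w:r><w:t>Please clarify the sample size.</w:t></w:r></w:p>"
    "</w:comment>"
    '<w:comment w:id="1" w:author="Wang Fang">'
    "<w:p><w:r><w:t>The discussion is too long.</w:t></w:r></w:p>"
    "</w:comment>"
    '<w:comment w:id="2" w:author="Li Ming">'
    "<w:p><w:r><w:t>Check reference formatting.</w:t></w:r></w:p>"
    "</w:comment>"
    "</w:comments>"
)
anonymized_xml, mapping = anonymize_authors(comments_xml)
print(mapping)
```

The same reviewer receives the same label across all comments, so the thread of a review stays readable after anonymization. Passing `prefix="journal editor"` would cover editor comments in the same way.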
Published: 19 March 2026
Corresponding author: Wuqiang GAO
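| Anonymization scheme | Accuracy | Coverage | Processing efficiency | Consistency | Risks | Implementation cost |
| --- | --- | --- | --- | --- | --- | --- |
| Manual anonymization | Depends on editor experience; omissions are likely | Depends on human attention; hard to cover every detail | Manuscript-by-manuscript manual work; time-consuming and hard to batch | Standards vary between editors, so results differ | High risk of missed or mistaken deletions through human oversight | High labor input |
| Rule-based anonymization | Exact matching of predefined patterns gives high precision but may miss unconventional cases | Limited to the rule base; unknown formats are hard to handle | Executed automatically; a single manuscript is processed in seconds | Results are stable and consistent but inflexible | Rule defects cause missed anonymization or wrong replacements; hard to respond to new risks | Rules must be written and maintained; moderate development cost |
| LLM anonymization | Deep semantic understanding can identify implicit cues, but model hallucination makes accuracy uncontrollable | Adapts well across languages, but is limited by the context window, struggles with long documents, and cannot directly parse the underlying document structure | Constrained by network latency and inference compute; full-text scanning is slow and large batches are processed inefficiently | Generation is stochastic, so the same text may yield different outputs across runs; lacks standardized quality control | Prone to over-anonymization or to mistaken and missed deletions; reliance on external APIs carries data-leakage and compliance risks | Low labor input, but model-call costs apply; private deployment faces compute and operations barriers |
| Hybrid rule + LLM anonymization | Comprehensive semantic understanding identifies concealed information; highest accuracy, but places some demands on the LLM's capability | Broadest coverage; can distinguish cases that need no processing, such as literature citations in the body text | Automated batch processing, efficient overall; full-text processing time falls between rule-based and LLM | Based on unified rules and a unified model; output is more stable and controllable | Lowest risk of omission or mishandling; AIGC ethics and regulatory requirements need attention | Modules are more complex and LLM calls carry some monetary cost, but labor cost can be greatly reduced |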