JouleBeat · DeepGraph

Autonomous research, conclusions you can check.

Describe the research direction you care about, and the engine runs the following for free: a field evidence map, novelty-checked ideas, an experiment plan reviewers can't fault, and an early signal. Pay to get submission-ready verified evidence and a paper in the format your target venue requires.

We don't want your data, your code, or your draft: we build the evidence ourselves. Negative results are reported as-is, and every number traces back to one real computation.

Submit your research direction → · See the live system →

Why

Plenty of tools write research with AI, but nobody trusts the results. Where did this number come from? Is it significant? Is there data leakage? Did the model just make it up? That is the one problem we solve: making autonomously produced conclusions hold up, and making them checkable.

How to start

You give one sentence; the engine gives you four things

A direction takes thirty seconds to understand and two minutes to fill in. Plain language is fine: the system echoes back its understanding for you to confirm before anything runs.

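```yaml
# Research direction config (example)
direction: "Diffusion models for few-shot medical image segmentation, focusing on cross-center generalization"
# Everything below is optional
keywords: [medical imaging, diffusion, few-shot]
constraints:
  compute: "single GPU at most"
  data: "public datasets only"
goal: experiment_plan  # idea_only | experiment_plan | signal | verified_evidence
contact: "group-chat nickname or email"
```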
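Here is a rough sketch of the checks the echo-back step implies. It is written against the config after YAML parsing, so it takes a plain Python dict. Several parts are assumptions: the function name, the default goal, and the tier mapping. We read `verified_evidence` as the only paid goal based on the free/paid boundary in the FAQ, and the engine's real parser may differ.

```python
# Hypothetical validator: field names follow the example config; the goal-to-tier
# mapping is inferred from the free/paid boundary described in the FAQ.
VALID_GOALS = ("idea_only", "experiment_plan", "signal", "verified_evidence")
FREE_GOALS = {"idea_only", "experiment_plan", "signal"}  # advisory tier, no GPU
KNOWN_FIELDS = {"direction", "keywords", "constraints", "goal", "contact"}


def check_direction(config: dict) -> dict:
    """Validate a parsed direction config and return a normalized copy."""
    direction = config.get("direction")
    if not isinstance(direction, str) or not direction.strip():
        raise ValueError("'direction' is the only required field")
    unknown = set(config) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Assumed default when 'goal' is omitted; the real default is not documented.
    goal = config.get("goal", "idea_only")
    if goal not in VALID_GOALS:
        raise ValueError(f"goal must be one of {VALID_GOALS}, got {goal!r}")
    return {
        "direction": direction.strip(),
        "keywords": list(config.get("keywords", [])),
        "constraints": dict(config.get("constraints", {})),
        "goal": goal,
        "tier": "free" if goal in FREE_GOALS else "paid",
    }


if __name__ == "__main__":
    cfg = check_direction({
        "direction": "Diffusion models for few-shot medical image segmentation",
        "constraints": {"data": "public datasets only"},
        "goal": "verified_evidence",
    })
    print(cfg["goal"], cfg["tier"])  # verified_evidence paid
```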

What you give

One sentence describing your research direction. Optional: keywords, compute or data constraints, and how far you want to go. We don't need your data, code, or drafts.

What you get, free

A field evidence map (who's doing what, where the gaps are), novelty-checked ideas, an experiment plan scrutinized to reviewer standards, and a CPU pilot signal.

What you get, paid

Multi-seed real experiments, compute-matched baselines, an evidence pack in which every number is traceable, and a manuscript in your target venue's template. Negative results are reported as-is.

Paid tier · verified evidence

From topic to manuscript: one checkable loop

The front end (evidence map, novelty-checked ideas, experiment plan, early signal) is free. Going all the way to multi-seed real experiments, compute-matched baselines, signed evidence packs, and venue-formatted manuscripts is the paid evidence tier. The six items below are all the heavy lifting it covers.

Picks its own problems

Scans literature signals, finds research gaps, and checks novelty.

Designs and actually runs experiments

Launches experiments automatically and iterates on the code. It uses hash-verified real data and labels real versus synthetic data explicitly; smoke-test and synthetic runs never count as done.
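Below is a minimal sketch of hash-pinned data. It assumes each dataset file comes with an expected SHA-256 digest recorded in the experiment plan. `verify_dataset`, the `kind` labels, and the `counts_as_done` flag are hypothetical names for the real-versus-synthetic rule above.

```python
import hashlib
from pathlib import Path

DATA_KINDS = ("real", "synthetic", "smoke")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large datasets never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path: Path, expected_sha256: str, kind: str) -> dict:
    """Check the bytes against a pinned hash and record whether the run can count as done."""
    if kind not in DATA_KINDS:
        raise ValueError(f"unknown data kind: {kind!r}")
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")
    return {
        "path": str(path),
        "sha256": actual,
        "kind": kind,
        # Only hash-verified real data may mark an experiment as complete.
        "counts_as_done": kind == "real",
    }
```

In this sketch a mismatch stops the run outright. Silently continuing on altered data would defeat the purpose of pinning the hash.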

Gets the rigor right

Multiple seeds, bootstrap confidence intervals, effect sizes, significance tests, data-leakage audits, and compute-matched baselines.
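To make two of these concrete, here is a sketch of a percentile bootstrap confidence interval and a paired effect size (Cohen's d_z), both computed over per-seed differences. It uses only the standard library. The function names are ours, and the per-seed scores in the demo are illustrative numbers, not results.

```python
import random
import statistics


def bootstrap_ci(diffs, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences (method - baseline)."""
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    n = len(diffs)
    means = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[round(alpha / 2 * n_resamples)]
    hi = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def cohens_dz(diffs):
    """Paired effect size: mean difference divided by the SD of the differences."""
    return statistics.fmean(diffs) / statistics.stdev(diffs)


if __name__ == "__main__":
    method = [0.812, 0.826, 0.804, 0.819, 0.831]    # illustrative per-seed scores
    baseline = [0.795, 0.801, 0.790, 0.806, 0.808]
    diffs = [m - b for m, b in zip(method, baseline)]
    print("95% CI:", bootstrap_ci(diffs), "d_z:", round(cohens_dz(diffs), 2))
```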

The trust spine

Traceable manuscripts

Every number traces back to a verified computation; a number with no source is blocked. Refuted conclusions can't be written up as positive ones, and placeholders, mismatched IDs, and copy-paste contamination are swept out automatically.
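One way to picture the "no source, blocked" rule is to scan the text for numbers and look each one up in a registry of verified computations. The registry format and the function names are assumptions of this sketch, and so is the decision to gate only decimal numbers. A real gate would also need to handle rounding and number formats.

```python
import re

# Simplifying assumption: every decimal number in the text is a result claim.
# Integers (seed counts, table numbers, years) are ignored by this sketch.
DECIMAL = re.compile(r"(?<![\w.])-?\d+\.\d+")


def untraced_numbers(text, registry):
    """Return decimals in `text` that no verified computation in `registry` produced."""
    return [m.group() for m in DECIMAL.finditer(text) if m.group() not in registry]


def provenance_gate(text, registry):
    """Block the text outright if any number lacks a verified source."""
    missing = untraced_numbers(text, registry)
    if missing:
        raise ValueError(f"blocked, no verified source for: {missing}")
    return text


if __name__ == "__main__":
    # Hypothetical registry: formatted value -> ID of the computation that produced it.
    registry = {"0.842": "run-17/eval", "0.831": "run-17/ci", "0.853": "run-17/ci"}
    print(provenance_gate("Accuracy reached 0.842 (95% CI 0.831 to 0.853).", registry))
```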

Submission-ready

Automatic typesetting for 6 conference and journal templates, plus format-compliance checks.

Live dashboard + human review

A bilingual dashboard shows the evidence and gate results for every step, and you sign off at the key checkpoints.

Next · pushing autonomy further

More autonomy, but always checkable first

Independent cross-review

A model from a different family acts as an adversarial reviewer, so the author is never the reviewer. This closes the "self-endorsement" loophole.

Trust provenance chain

Every conclusion is stamped with who produced it, who reviewed it, and a content fingerprint, so it is traceable end to end and tamper-evident.
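Here is a sketch of a stamped, hash-chained record. It assumes conclusions are plain strings and that each model is described by a hypothetical `{"name", "family"}` dict. It also enforces the cross-review rule that producer and reviewer must come from different model families. Note that a hash chain is tamper-evident only if the latest hash is kept, or signed, somewhere an editor cannot rewrite it.

```python
import hashlib
import json


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _record_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return _sha256(json.dumps(body, sort_keys=True, ensure_ascii=False))


def stamp(conclusion: str, producer: dict, reviewer: dict, prev_hash: str = "") -> dict:
    """Stamp a conclusion with producer, reviewer, content fingerprint, and chain link."""
    if producer["family"] == reviewer["family"]:
        raise ValueError("reviewer must come from a different model family")
    body = {
        "conclusion": conclusion,
        "producer": producer["name"],
        "reviewer": reviewer["name"],
        "content_sha256": _sha256(conclusion),
        "prev": prev_hash,
    }
    return {**body, "hash": _record_hash(body)}


def verify_chain(records: list) -> bool:
    """Recompute every fingerprint and link; any edit to an earlier record breaks the chain."""
    prev = ""
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if body.get("prev") != prev or _sha256(body["conclusion"]) != body["content_sha256"]:
            return False
        if _record_hash(body) != rec.get("hash"):
            return False
        prev = rec["hash"]
    return True
```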

Verification engine, opened up

Turn "upload results → get a traceable trust verdict" into a standalone tool/API with its own product entry point.

Broader fields, less manual work

Expand from ML to more disciplines and gradually remove manual checkpoints as the trust machinery matures, pushing autonomy higher. But always checkable first, automated second.

FAQ

What researchers ask

How many seeds does a multi-seed experiment need, and how should it be reported?

Single-seed results don't survive review. Common practice is to use at least 3 to 5 random seeds, to report the mean with a confidence interval rather than the best run, and to use paired tests for paired comparisons. The engine runs multiple seeds by default and keeps every per-seed result in the evidence pack, so nothing is cherry-picked.
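Here is a sketch of seed-paired reporting. Both methods run on the same seeds, differences are taken seed by seed, and every per-seed value stays in the report. The interval is a Student-t interval over seeds, one common choice kept short for illustration. The next answer explains why bootstrap or permutation methods can be safer with few seeds. The names and the small t table are ours.

```python
import statistics

# Two-sided 95% Student-t critical values by degrees of freedom (n_seeds - 1).
T_975 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def paired_seed_report(method: dict, baseline: dict) -> dict:
    """Compare two methods seed by seed; every per-seed difference stays in the report."""
    if set(method) != set(baseline):
        # A seed missing from one side is exactly how cherry-picking slips in.
        raise ValueError("both methods must be run on exactly the same seeds")
    seeds = sorted(method)
    diffs = [method[s] - baseline[s] for s in seeds]
    df = len(diffs) - 1
    if df not in T_975:
        raise ValueError("this sketch tabulates t values for 3 to 10 seeds only")
    mean = statistics.fmean(diffs)
    half = T_975[df] * statistics.stdev(diffs) / len(diffs) ** 0.5
    return {
        "seeds": seeds,
        "per_seed_diff": dict(zip(seeds, diffs)),
        "mean_diff": mean,
        "ci95": (mean - half, mean + half),
    }
```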

How do I choose a significance test? Is a p-value alone enough?

With small samples and unknown distributions, bootstrap confidence intervals and permutation tests are safer than textbook t-tests. Multiple comparisons need correction, and effect sizes should be reported alongside p-values. The engine computes to this standard: if p is large (p = 1, say), it plainly reports "not significant", and it never turns "nearly significant" into a conclusion.
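Here is an exact paired permutation (sign-flip) test with a Holm-Bonferroni correction, sketched with the standard library only. Enumerating all 2^n sign patterns gives an exact result but is only practical for small n, which suits per-seed comparisons. The function names are hypothetical.

```python
import itertools
import statistics


def sign_flip_p(diffs):
    """Exact two-sided paired permutation test: try every sign flip of the differences."""
    observed = abs(statistics.fmean(diffs))
    n_extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):  # 2**n patterns
        total += 1
        stat = abs(statistics.fmean([s * d for s, d in zip(signs, diffs)]))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            n_extreme += 1
    return n_extreme / total


def holm(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: which hypotheses stay rejected after correction."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break  # once one fails, every larger p-value fails too
        reject[i] = True
    return reject
```

One consequence is worth knowing. With 5 seeds there are only 32 sign patterns, so the smallest achievable two-sided p-value is 2/32 = 0.0625. Under this test, five seeds alone cannot reach p < 0.05, however consistent the effect.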

How do you check for data leakage?

The most common leaks are train/test overlap, splitting along the wrong dimension (the same patient or site appearing on both sides), and tuning on the test set. The engine fixes the split in the experiment plan, checks for overlap, and keeps a reproducible description of the split in the experiment record so anyone can re-examine it.
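Here is a sketch of a group-level split that can be reproduced from a one-line description (salt plus test fraction), followed by the overlap check. `patient_id`, the salt string, and the function names are illustrative assumptions. Hashing the group ID is one way to fix a split, not necessarily the engine's.

```python
import hashlib


def assign_split(group_id, test_fraction=0.2, salt="split-v1"):
    """Deterministic group-level split: the same group always lands on the same side."""
    h = hashlib.sha256(f"{salt}:{group_id}".encode("utf-8")).digest()
    u = int.from_bytes(h[:8], "big") / 2**64  # roughly uniform value in [0, 1]
    return "test" if u < test_fraction else "train"


def check_no_group_overlap(train, test, group_key="patient_id"):
    """Raise if any group (patient, site, ...) appears on both sides of the split."""
    shared = {r[group_key] for r in train} & {r[group_key] for r in test}
    if shared:
        raise ValueError(f"leakage: {group_key} on both sides: {sorted(shared)}")


def split_rows(rows, group_key="patient_id", **kwargs):
    """Split rows by group, then verify that no group leaked across the split."""
    train, test = [], []
    for r in rows:
        (test if assign_split(r[group_key], **kwargs) == "test" else train).append(r)
    check_no_group_overlap(train, test, group_key)
    return train, test
```

Because assignment depends only on the salt, the fraction, and the group ID, anyone who knows those values can rebuild the same split.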

What is a compute-matched baseline, and why does it matter?

Many "beats the baseline" results come only from giving the new method more training time or a bigger tuning budget. Compute-matched means giving the baseline the same compute budget before comparing. Reviewers ask for this more and more often, and the engine does it by default in the paid evidence tier.
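Here is a minimal sketch of the matching check. It assumes budgets are tracked as GPU-hours and as the number of tuning trials. The `Budget` fields and the 5% tolerance are hypothetical, and a real setup might count FLOPs or wall-clock time instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    gpu_hours: float     # training compute actually consumed
    tuning_trials: int   # hyperparameter configurations evaluated


def is_compute_matched(new: Budget, baseline: Budget, tolerance: float = 0.05) -> bool:
    """True if the baseline got at least the new method's budget (within `tolerance`)."""
    hours_ok = baseline.gpu_hours >= new.gpu_hours * (1 - tolerance)
    trials_ok = baseline.tuning_trials >= new.tuning_trials
    return hours_ok and trials_ok
```

In this sketch, a failing check means the baseline should be re-run with more budget before the gap is reported at all.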

Can AI run experimental validation for me? Can I trust the result?

Running is one thing; trusting is another. Our approach takes the "trust" part out of the AI's hands. Steps with a deterministic standard, such as number provenance, significance, seed reproducibility, and leakage checks, are enforced by hard-coded rule gates, and the AI only explores. A number without a verified source is blocked and never reaches the conclusions.

What if the result is negative?

It gets reported as-is. The paid deliverable is a verified evidence pack, not a guaranteed positive finding. A negative result has evidence value too, because at the very least it saves you a dead end. When the evidence is insufficient, the system refuses to write it up as a paper. That behaviour is by design.
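Here is a sketch of how a deterministic gate can cap the wording of a claim. It assumes the evidence arrives as a 95% interval for the difference (method minus baseline, higher is better) plus a seed count. The labels, the function name, and the 3-seed minimum are assumptions; the minimum is taken from the "at least 3 to 5 seeds" guidance in the FAQ.

```python
def allowed_claim(ci95, n_seeds, refuted=False, min_seeds=3):
    """Return the strongest wording the write-up may use for one comparison.

    ci95 is a 95% interval for (method - baseline), where higher is better.
    """
    if refuted:
        return "refuted"                 # never rewritten as a positive finding
    if n_seeds < min_seeds:
        return "insufficient evidence"   # no manuscript is produced
    lo, hi = ci95
    if lo > 0:
        return "improvement"
    if hi < 0:
        return "degradation"
    return "not significant"             # there is no "nearly significant" label
```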

Do I need to upload data, code, or a draft?

No, and we don't accept them. You submit only a research direction; the engine builds the evidence itself on public datasets. Your unpublished work is never touched, and the resulting evidence doesn't depend on your private material, so anyone can re-examine it.

Where is the line between free and paid?

The line is real compute. The advisory tier uses no GPU and is free: evidence map, novelty-checked ideas, experiment plan, and CPU pilot signal. The evidence tier runs real experiments and is paid: multi-seed runs, compute-matched baselines, traceable evidence packs, and venue-formatted manuscripts. We charge only for outputs that have deterministic verification standards.