ASIC_|fy

    ▄▄▄    ▄▄▄    ▄▄▄    ▄▄▄    ▄▄▄    ▄▄▄    ▄▄▄    ▄▄▄
   █████  █████  █████  █████  █████  █████  █████  █████
   █████  █████  █████  █████  █████  █████  █████  █████
   ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀  ▀▀▀▀▀


The compiler for AI silicon. PyTorch model in. Hardware-ready specification out.

Live playground → · Docs · Roadmap

What it does

   ┌──────────────┐    ┌─────────────────┐    ┌───────────────────┐
   │              │    │                 │    │  Synthesizable    │
   │  PyTorch     │───▶│   ASICify       │───▶│  Verilog          │
   │  checkpoint  │    │   compiler      │    │  + Cocotb test    │
   │              │    │                 │    │  + Area / cost    │
   └──────────────┘    └─────────────────┘    │  + FPGA bitstream │
                                              └───────────────────┘

ASICify takes a trained neural network and emits everything you need to put it on silicon: aggressively compressed weights, synthesizable Verilog with those weights hardwired as ROM constants, area / throughput / cost / energy estimates across eleven hardware targets, an FPGA reference implementation, and a verified Cocotb testbench.

It aims to be the horizontal compiler underneath every AI chip company that exists today, and the ones that haven't started yet.

Why

Custom AI silicon costs $5–30M per tape-out and takes 6–18 months. The fabrication isn't the bottleneck — it's the model-to-hardware translation. Every chip company and edge-AI deployer currently does that translation by hand, with expensive specialist engineers, the same way they did it ten years ago.

Cadence and Synopsys built EDA for general-purpose chips. ASICify is built for one thing: turning a trained inference network into a fixed-function accelerator.

What you can do today

1. Play with the live estimator (zero install)

Open the playground in your browser. Drag the sparsity slider, switch between INT4 and ternary, swap targets between TSMC 28nm and Lattice ECP5. Watch silicon area, cost per chip, and throughput recompute in real time. Every number comes from a published cost model: no fake gauges, no mock data.

To run the playground from a local clone instead:

pnpm --filter @asicify/web dev
open http://localhost:3001/playground

2. Compile to RTL via the CLI (from source)

There is no PyPI package yet. The CLI runs from a clone:

git clone https://github.com/claynicholson/asicify
cd asicify/apps/worker
uv sync
uv run asicify compile gpt2 \
    --quantization int4 \
    --sparsity 2:4 \
    --target tsmc28,ecp5

Sample output:

✓ Parsed model graph         (124M params, 12 layers)
✓ Quantized to INT4          (perplexity 24.3 to 25.1)
✓ Applied 2:4 sparsity       (50% zeros)
✓ Generated RTL              (top.v + 47 modules)
✓ Estimated tsmc28           (8.2 mm², $4.10 @ 100K)
✓ Estimated ecp5             (LFE5UM5G-85, 78% LUT util)

Output: ./build/gpt2-int4-2_4/

The output is a zip containing top.v, the per-layer modules, the hardwired weights, a Cocotb testbench, a bit-exact Python reference, a Makefile, and synthesis scripts for Yosys, nextpnr, and Vivado. Unzip it, then run make sim or make synth-yosys.
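As a rough illustration of the "Applied 2:4 sparsity (50% zeros)" stage in the sample output, here is a minimal sketch of 2:4 pruning: in every group of four weights, two are kept and two are zeroed. Choosing the two largest magnitudes is an assumption made for this example; the README does not say which selection rule ASICify uses, and `prune_2_of_4` is a hypothetical name.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Illustrative magnitude-based rule only (an assumption, not
    necessarily the rule the ASICify pipeline applies).
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
sparse = prune_2_of_4(row)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Exactly half of the entries end up zero, which matches the 50% figure reported for the 2:4 pattern.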

3. Run the full local stack

The API and worker are implemented but not yet deployed publicly. To run them on your own machine, see Quickstart → REST API. You get the project lifecycle, WebSocket progress streaming, and presigned artifact downloads.

What's wired today

Capability	Status
Live client-side estimator	✓ Real math, real numbers
Markdown documentation site	✓ Auto-rendered from `/docs`
Landing, playground, pricing, blog, about	✓ Functional
FastAPI backend (auth + CRUD + queue + WS)	✓ Endpoints wired
Postgres schema + Alembic migrations	✓ Initial migration shipped
Worker pipeline (parse → quantize → … → validate)	✓ Stage orchestration
Hardware estimator (server-side)	✓ Cell library data for 11 targets
RTL generator + 14 Jinja2 templates	✓ Top + linear + attention + layernorm + KV cache + testbench + synthesis scripts
Multi-precision multiplier strategies	✓ binary / ternary / int4 CSD / int8 Booth / fp16 LUT
Real `torch.fx` model parsing	◐ Synthesized graph stub today; real parsing is next
Quantization weight-tensor work	◐ Config tracked; bit-packing is next
WebGPU in-browser inference comparison	○ Roadmap
PDF report generation	○ Roadmap
Modal deployment	○ Roadmap
Stripe billing	○ Roadmap

The MVP ships a complete spine. Filling in the model-loading kernels and real validation does not require API or pipeline changes.

Repository layout

asicify/
├── apps/
│   ├── web/             Next.js 15 frontend
│   │                    Landing · live playground · markdown docs · blog · about · pricing
│   │
│   ├── api/             FastAPI backend
│   │                    Clerk JWT auth · project CRUD · Redis job queue · WebSocket progress
│   │
│   └── worker/          Python worker
│                        ├── pipeline/   parse · quantize · sparsity · decompose · validate
│                        ├── rtl/        Jinja2 → Verilog package
│                        └── estimator/  area · throughput · cost · per-target cell library
│
├── packages/
│   └── shared/          TypeScript types (mirrored as Pydantic + Python dataclasses)
│
├── infra/               docker-compose: Postgres · Redis · MinIO (R2 stand-in)
│
└── docs/                User docs (rendered at /docs/*)
    └── internals/       Contributor docs (rendered at /docs/internals/*)

Quick start

Just the web app (fastest)

No databases, no Python, no GPU. The live estimator runs in your browser.

git clone https://github.com/claynicholson/asicify
cd asicify
pnpm install
pnpm --filter @asicify/web dev
# → http://localhost:3001

Full local stack

Adds the API, worker, Postgres, Redis, and an S3-compatible object store.

# 1. Install everything
pnpm install
cd apps/api    && uv sync && cd ../..
cd apps/worker && uv sync && cd ../..

# 2. Start infrastructure
docker compose -f infra/docker-compose.yml up -d

# 3. Run database migrations
cd apps/api && uv run alembic upgrade head && cd ../..

# 4. Boot all three apps
pnpm dev

You'll get:

Service	URL
Web	http://localhost:3001
API	http://localhost:8000
API docs	http://localhost:8000/docs
MinIO console	http://localhost:9001

Hardware targets

Eleven targets across three categories:

Category	Target
ASIC	SkyWater 130 · GF22FDX · TSMC 28nm · TSMC 16nm · TSMC 7nm
FPGA	Lattice ECP5 · Lattice CrossLink-NX · Xilinx Artix-7 · Xilinx Kria K26
Shuttle	TinyTapeout (sky130) · Efabless chipIgnite (sky130)

Each target ships with cell-library data: ROM bit area, SRAM bit area, INT8 multiplier area, max clock frequency, energy per MAC, wafer cost, defect density. All numbers cite published academic surveys or foundry data sheets and carry ±20–40% confidence bands.

See docs/methodology.md for the full cost-model derivation.
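To show how cell-library fields such as wafer cost and defect density can feed a per-die cost, here is a minimal sketch using the textbook dies-per-wafer approximation and a Poisson yield model. It is not taken from docs/methodology.md, which remains the authoritative derivation; the function names are hypothetical and the wafer cost and defect density below are placeholder inputs, not ASICify cell-library data.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: gross dies minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return max(0, math.floor(gross - edge_loss))


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(die_area_mm2, wafer_cost_usd, defects_per_cm2,
                      wafer_diameter_mm=300.0):
    """Raw silicon cost only: no packaging, test, or mask/NRE amortization."""
    good = (dies_per_wafer(die_area_mm2, wafer_diameter_mm)
            * poisson_yield(die_area_mm2, defects_per_cm2))
    if good <= 0:
        raise ValueError("die does not fit on the wafer")
    return wafer_cost_usd / good


# Placeholder inputs for illustration only (assumptions, not published data).
area_mm2 = 8.2
print(dies_per_wafer(area_mm2))
print(round(poisson_yield(area_mm2, defects_per_cm2=0.1), 4))
print(round(cost_per_good_die(area_mm2, wafer_cost_usd=3000.0,
                              defects_per_cm2=0.1), 2))
```

Because yield falls exponentially with die area while dies per wafer falls roughly linearly, cost per good die grows faster than area, which is one reason aggressive compression pays off on silicon.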

Compression methods

Five quantization modes, four sparsity patterns, three decompositions — fully composable.

Quantization     FP16 ─ INT8 ─ INT4 ─ Ternary ─ Binary
                                          1.6 bit/weight ─┐
                                                          │ Sub-1-bit
Sparsity         none ─ 2:4 ─ 4:8 ─ block 16×16 ─ unstructured
                                                          │
Decomposition    none ─ Monarch ─ Butterfly ─ Low-rank    │
                                                          ▼
                                                  Effective bits/weight
                                                  drops below 1
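The "effective bits/weight drops below 1" claim in the diagram can be checked with simple accounting: bits per stored weight, times the fraction of weights that survive sparsity, times the parameter ratio left after decomposition. The sketch below assumes zeros cost nothing to store (plausible when weights are hardwired, but an assumption here) and uses hypothetical helper names.

```python
import math

# Bits needed per stored weight for each quantization mode.
# Ternary carries log2(3) ~= 1.585 bits, shown as 1.6 in the diagram.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "int8": 8.0,
    "int4": 4.0,
    "ternary": math.log2(3),
    "binary": 1.0,
}


def effective_bits(quant, density=1.0, param_ratio=1.0):
    """Bits per *original* weight after sparsity and decomposition.

    density:     fraction of weights kept (0.5 for 2:4 sparsity).
    param_ratio: parameters after decomposition / parameters before.
    Assumes pruned weights need no index metadata.
    """
    return BITS_PER_WEIGHT[quant] * density * param_ratio


def low_rank_ratio(n, rank):
    """An n x n matrix factored as (n x r)(r x n) keeps 2nr of n^2 params."""
    return (2 * n * rank) / (n * n)


print(round(effective_bits("ternary", density=0.5), 3))  # 0.792
print(effective_bits("int4", density=0.5,
                     param_ratio=low_rank_ratio(1024, 64)))  # 0.25
```

Ternary plus 2:4 sparsity alone already lands under one bit per original weight; stacking a decomposition on top pushes it lower still.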

Each compression method maps to a specific multiplier strategy in the generated RTL:

Quantization	Multiplier strategy	Approx. cost per MAC
Binary	XNOR + popcount	~1 LUT
Ternary	Sign-flip mux + zero-out	~3 LUTs
INT4	CSD shift-add network	≤ 1 adder
INT8	Booth multiplier	~10 LUTs
FP16	Per-multiplier ROM-LUT	one small ROM
Weights become localparam constants in weights.vh. The synthesis tool folds them directly into the multiplier inputs — for binary and ternary this collapses entirely into XOR/AND networks, with no real multipliers on die.
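For the binary case, a minimal Python model of the XNOR + popcount trick shows why no multiplier is needed. It assumes the usual encoding where bit 1 means +1 and bit 0 means -1; the README does not state the encoding ASICify uses, and the function names are hypothetical.

```python
def pack_signs(values):
    """Pack +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("values must be +1 or -1")
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors packed as bits.

    XNOR marks positions where the signs agree (product +1); the
    remaining positions contribute -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask
    return 2 * bin(agree).count("1") - n


activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
result = xnor_popcount_dot(pack_signs(activations), pack_signs(weights), 4)
print(result)  # 0
```

When the weight word is a compile-time constant, each XNOR reduces to either a wire or an inverter on the activation bit, leaving only the popcount tree as real logic.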

Architecture

┌────────────────────────────────────────────────────────────┐
│                    User (Browser)                          │
└───────────────────────────┬────────────────────────────────┘
                            │ HTTPS / WSS
┌───────────────────────────▼────────────────────────────────┐
│              Next.js Frontend (Vercel)                     │
│  • App Router  • Server Components  • Live estimator       │
└───────────────────────────┬────────────────────────────────┘
                            │ REST / WebSocket
┌───────────────────────────▼────────────────────────────────┐
│             FastAPI Backend (Fly.io)                       │
│  • Auth (Clerk JWT)  • Project CRUD  • Job orchestration   │
└──────┬──────────────┬──────────────────┬───────────────────┘
       │              │                  │
       ▼              ▼                  ▼
┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐
│ PostgreSQL  │ │   Redis      │ │   Modal Labs            │
│ (Neon)      │ │  (Upstash)   │ │  (GPU worker pool)      │
└─────────────┘ └──────────────┘ └──────────┬──────────────┘
                                            │
                                            ▼
                                  ┌─────────────────────┐
                                  │   Cloudflare R2     │
                                  │   Artifacts         │
                                  └─────────────────────┘

The compiler core (apps/worker/worker/{pipeline,rtl,estimator}/) runs without the API, the database, Redis, or any auth. The CLI calls it directly. The hosted product layers convenience on top.

See docs/architecture.md for component responsibilities and docs/internals/data-flow.md for end-to-end traces.

Tech stack

Layer	Choice
Frontend	Next.js 15 · React 19 · TypeScript strict · Tailwind v4 · Recharts
Backend	FastAPI · Pydantic v2 · SQLAlchemy 2.0 async · Alembic
Worker	PyTorch · Transformers · Jinja2 · structlog
Database	PostgreSQL (Neon)
Queue	Redis lists + pub/sub (Upstash)
Storage	Cloudflare R2 (S3-compatible; MinIO locally)
Auth	Clerk
Compute	Modal Labs
Build	Turborepo · pnpm · uv
Verification	Cocotb · Verilator
Synthesis	Yosys · nextpnr · Vivado

Differentiators

Multi-target backend. One tool, every target. The same source model emits SkyWater 130 RTL, TSMC 28nm RTL, ECP5 bitstreams, TinyTapeout shuttles. Compare them in one dashboard.

Open-source core. MIT-licensed. The compression pipeline and RTL generator are on GitHub. No NDAs, no per-tape-out licensing. The hosted product is convenience and compute.

Hardware-software co-design. Sub-1-bit effective representation via ternary + sparsity + decomposition. Monarch matrix factorization built into synthesis. Hardware-aware fine-tuning that targets your specific deployment.

Inverse design. Specify a target chip area or BOM cost; ASICify searches the model architecture space for the best model that fits. Hardware-aware NAS with real cost models.

Design space exploration. Pull a slider in the playground, watch the chip change. Cached estimates update in under a millisecond.

Documentation

For users:

docs/quickstart.md — Compile your first model
docs/architecture.md — System overview
docs/methodology.md — Cost model derivation
docs/rtl-generation.md — Verilog templates and multiplier strategies
docs/roadmap.md — Phase plan

For contributors:

docs/codebase.md — Codebase tour. Start here.
docs/internals/web.md — Frontend
docs/internals/api.md — Backend
docs/internals/worker.md — Pipeline + RTL gen + estimator
docs/internals/data-flow.md — End-to-end traces
docs/internals/extending.md — Recipes: add a target, a precision, a primitive
docs/internals/conventions.md — Code style
docs/internals/glossary.md — ML, silicon, EDA terminology

Status

Pre-1.0. Spine first: model in → compressed model out → quality validation → RTL out → cost estimate → playground.

We're shipping every week. Watch docs/roadmap.md for the phase plan and CHANGELOG.md for what landed.

Contributing

PRs welcome. The high-leverage areas:

Adding hardware targets (cell library data with citations)
New compression methods (FP4, FP8 E4M3, MXFP formats)
New layer kinds (Mamba blocks, MoE routers, diffusion primitives)
Refining cost-model parameters with foundry data sheets

See CONTRIBUTING.md and docs/internals/extending.md for recipes.

License

MIT. The hosted version at asicify.com layers convenience and compute over this open-source core. Premium hardware targets (TSMC leading-edge nodes, Samsung) require commercial agreements with the foundries; the open core supports SkyWater 130, GF22FDX, ECP5, Artix-7 directly.

Acknowledgements

Tri Dao and the HazyResearch team for Monarch matrices
Matt Venn for TinyTapeout
SkyWater and Efabless for the open PDK movement
The Yosys + nextpnr team for the open synthesis flow
Cocotb for making hardware verification feel like Python

ASIC_|fy · Built for the AI silicon era · MIT licensed
