STACCOVERFLOW // 2026 [ EDITION 2 ] SOLANA STRESS TEST

> deploying neural network to blockchain_

BREAK
SOLANA

EDITION 2: AN LLM THAT EATS 25% OF THE NETWORK

We deployed a full GPT neural network as a Solana program. Every matrix multiplication, every attention head, every token — computed entirely on-chain. One sentence consumes 25% of Solana's total compute capacity for a full minute.

// STATUS: PENDING MAINNET DEPLOYMENT

HELP US BREAK MAINNET

The LLM is built and verified on localnet. Deploying to mainnet requires ~69 SOL for program deployment fees, account rent, and transaction costs. Buy the token below to fund the launch.

// MAINNET LAUNCH FUND [LIVE]

VERIFY ON-CHAIN →
8.0%
CURRENT: 5.54 SOL
GOAL: 69 SOL

89VB5UmvopuCFmp5Mf8YPX28fGvvqn79afCgouQuPyhY

// FUND THE EXPERIMENT

Buy the Token, Deploy the LLM

This token funds the mainnet launch — program deployment, account creation, and transaction fees. Largely unrelated to the research itself, but it's the final piece needed to go live and stress-test Solana for real.

CA: CLWeikxiw8pC9JEtZt14fqDzYfXF7uVwLuvnJPkrE7av

Paste the CA in Axiom's search bar after signing up

BUY ON AXIOM →

// ON-CHAIN OUTPUT — VERIFIED ON SOLANA LOCALNET

PROMPT:

"Once upon a time"

OUTPUT:

"Once upon a time..."

TOKENS: 11, 612, 373, 257, 1310 | TXS: 1,699 | TIME: 11.5 min | MATCHES HUGGINGFACE REFERENCE

THE DAMAGE

25%

OF SOLANA'S COMPUTE

consumed per sentence

~1.3B

COMPUTE UNITS

per sentence generated

1,699

TRANSACTIONS

for 9 tokens of output

~148

BLOCKS CONSUMED

~59 seconds of chain

// ARCHITECTURE

HOW WE DID IT

A full GPT-Neo transformer with 8 attention layers, 16 heads, and a 50,257-token vocabulary — deployed as a Solana BPF program. Every forward pass happens on-chain. No oracles. No off-chain compute. Pure blockchain inference.

MODEL         TinyStories-1M (GPT-Neo)
WEIGHTS       10.78 MB across 2 accounts
PRECISION     F16 embed + F32 layers + INT8 lm_head
LAYERS        8 transformer blocks, 16 attention heads
VOCAB         50,257 tokens (full GPT-2)
INSTRUCTIONS  15 per position per layer
CU LIMIT      1.4M per transaction
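The 10.78 MB figure can be sanity-checked from the table. A back-of-the-envelope sketch, assuming TinyStories-1M's published dimensions (hidden size 64, 4x-wide FFN of 256) and ignoring biases and layer norms:

```python
# Back-of-the-envelope weight budget for the hybrid-precision layout above.
# Assumes TinyStories-1M dimensions: hidden size 64, 4x-wide FFN (256),
# 8 layers, 50,257-token vocab. Biases and layer norms are ignored.
VOCAB, HIDDEN, FFN, LAYERS = 50_257, 64, 256, 8

embed_f16 = VOCAB * HIDDEN * 2            # token embeddings, 2 bytes each
lm_head_i8 = VOCAB * HIDDEN * 1           # output projection, 1 byte each
per_layer_f32 = (4 * HIDDEN * HIDDEN      # Q, K, V, O projections
                 + 2 * HIDDEN * FFN) * 4  # FFN up + down, 4 bytes each
layers_f32 = LAYERS * per_layer_f32

total = embed_f16 + lm_head_i8 + layers_f32
print(f"{total / 2**20:.2f} MiB")  # ~10.70 MiB, within a percent of 10.78 MB
```

The embeddings dominate: a 50,257-token vocab on a 64-dim model means the f16 embedding table and INT8 lm_head together account for roughly 85% of the on-chain bytes.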

FIG.01 — NEURAL NETWORK ON CHAIN

// NETWORK IMPACT ANALYSIS

BLOCK SPACE


FIG.02 — BLOCK SPACE CONSUMPTION

Solana caps each block at 48 million compute units, with a block landing every 400 ms. Each token of our LLM output requires ~148.5 million CU, more than three entire blocks' worth of compute. But the real constraint is the per-account write lock: at most 12M CU can write to any single account per block.

CRITICAL FINDING

One user generating a 9-token sentence would consume approximately 25% of Solana's entire network compute capacity for one full minute. Multiple concurrent users running inference would create significant block space contention.

Block CU limit          48,000,000
Write-lock limit/block  12,000,000
CU per token            ~148,500,000
Blocks per token        ~15 (write-lock limited)
Wall time per token     ~6 seconds (theoretical)
Cost per token          ~0.001 SOL (~$0.17)
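The derived rows above follow from the first three; a few lines of arithmetic using the stated limits:

```python
import math

BLOCK_CU = 48_000_000        # network-wide CU budget per block
WRITELOCK_CU = 12_000_000    # max CU writing one account per block
CU_PER_TOKEN = 148_500_000   # measured cost of one generated token
BLOCK_TIME_S = 0.4

# Raw compute: how many whole blocks' worth of CU one token burns.
blocks_of_compute = CU_PER_TOKEN / BLOCK_CU
# Serialization floor: the txs all write the same state account, so at
# most 12M CU of the token's work can land in any one block.
min_blocks = math.ceil(CU_PER_TOKEN / WRITELOCK_CU)

print(f"{blocks_of_compute:.1f} blocks of compute")  # -> 3.1 blocks of compute
print(f"{min_blocks} block-times minimum, ~{min_blocks * BLOCK_TIME_S:.1f}s")
```

The idealized floor works out to 13 block-times (~5.2 s); the table's ~15 blocks / ~6 s per token presumably absorbs scheduling and transaction-overhead slack on top of that floor.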
PREVIOUSLY

EDITION 1

In 2023, I discovered that Clockwork's scheduling software allowed recursive transactions — a transaction that spawns another transaction in the same slot. With enough SOL to pay the fees, this created an infinite loop that overwhelmed validators.

The Clockwork team shrugged it off. A few days later, the entire Solana network went down when Clockwork came back online. They eventually shut down in October 2023, citing "limited commercial upside."

"I figured out that you could do recursive transactions. A transaction that calls another transaction in the same slot. If you have enough money to pay the Pied Piper, that's terrible for blockchains."

— staccoverflow, Darknet Diaries EP 152

FIG.03 — RECURSIVE TRANSACTION EXPLOIT

2023      Clockwork recursive transaction exploit discovered
2023      Solana network outage when Clockwork restarted
OCT 2023  Clockwork shuts down permanently
2026      Edition 2: On-chain LLM inference

// OPEN INVITATION

YOUR
TURN

The program is built and verified on localnet. The weights are on-chain. The math checks out. Now we want to see what happens when multiple users run inference simultaneously on mainnet. How does the network handle it? What breaks first — the scheduler, the write locks, or the validators?

This isn't about breaking things for the sake of it. It's about understanding the real limits of on-chain computation. Every blockchain claims infinite scalability. Let's test that.

SCENARIO 1

Single User Inference

One wallet runs a full sentence generation. Observe how 191 transactions per token interact with the block scheduler and priority fee market.

SCENARIO 2

Concurrent Users

Multiple wallets generate simultaneously. Each user has independent state accounts, but they all compete for block space. The write-lock contention becomes the bottleneck.

SCENARIO 3

The Nash Equilibrium

What's the game-theoretic equilibrium? If inference costs ~0.001 SOL/token but consumes 3 blocks of compute, how does the priority fee market respond? At what point does it become economically irrational to continue?

// FOR THE TECHNICALLY CURIOUS

UNDER THE HOOD

The biggest challenge wasn't deploying the model — it was fitting each operation within Solana's 1.4 million compute unit limit per transaction. A single 64x64 matrix multiplication in f32 costs ~700,000 CU in BPF bytecode. Each transformer layer requires Q, K, V, and O projections plus a 4x-wide FFN — that's 8 matrix multiplications per layer, per position.

We split each layer into 15 separate instructions: LN1, Q_PROJ, K_PROJ, ATTN, V_O_PROJ, LN2, and 4 FFN chunks (UP_A, UP_B, DOWN_A, DOWN_B) plus GELU activation. For a 4-token prompt through 8 layers, that's 480 transactions just for the prefill phase.
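The transaction counts above fall out of simple arithmetic on the stated figures (~700K CU per 64x64 f32 matmul, the 1.4M CU per-transaction cap, and 15 instructions per position per layer):

```python
MATMUL_CU = 700_000      # one 64x64 f32 matmul in BPF, per the text
TX_CU_LIMIT = 1_400_000  # Solana's per-transaction compute ceiling

# At most two such matmuls fit in one transaction, with nothing to spare,
# which is why each layer has to be carved into smaller instructions.
print(TX_CU_LIMIT // MATMUL_CU)  # -> 2

INSTR_PER_LAYER = 15
LAYERS = 8
PROMPT_TOKENS = 4
# Prefill: every prompt position runs every instruction of every layer.
print(PROMPT_TOKENS * LAYERS * INSTR_PER_LAYER)  # -> 480
```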

// Per-token instruction breakdown

EMBED        1 tx    // token embedding lookup
LAYER x8     128 tx  // 16 instructions per layer
OUTPUT_LN    1 tx    // final layer norm
COPY_HIDDEN  1 tx    // prepare for argmax
ARGMAX       64 tx   // 4 workers x 16 sub-chunks
MERGE        1 tx    // find best token

TOTAL: ~191 transactions per generated token
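The ARGMAX/MERGE stages are a plain two-phase reduction over the 50,257 logits. A hypothetical sketch (function name and worker layout invented for illustration; on-chain, each partial result is its own transaction):

```python
import random

VOCAB = 50_257
WORKERS = 64  # 4 workers x 16 sub-chunks in the on-chain layout

def chunked_argmax(logits):
    """Two-phase argmax: per-chunk maxima, then a single MERGE step."""
    chunk = -(-len(logits) // WORKERS)  # ceil division
    partials = []
    for w in range(WORKERS):
        lo, hi = w * chunk, min((w + 1) * chunk, len(logits))
        if lo >= hi:
            continue
        best = max(range(lo, hi), key=lambda i: logits[i])
        partials.append((logits[best], best))  # one ARGMAX tx's result
    return max(partials)[1]                    # the MERGE tx

random.seed(1)
logits = [random.random() for _ in range(VOCAB)]
logits[373] = 10.0  # plant a clear winner
print(chunked_argmax(logits))  # -> 373
```

Each worker only needs to read its own slice of the logit buffer, which keeps every ARGMAX transaction well under the CU limit regardless of vocab size.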

The quantization story is equally wild. INT8 quantization was too lossy for a 64-dimension model — cosine similarity between INT8 and f32 hidden states degraded to 0.287 by layer 8. We ended up with a hybrid approach: f16 embeddings, f32 transformer weights, and INT8 only for the output projection (lm_head). The model produces output that matches the HuggingFace reference token-for-token.
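The cosine-similarity check is easy to reproduce for a single tensor. A minimal sketch of a symmetric per-tensor INT8 round-trip on a 64-dim vector (in the real model these small per-layer errors compound across all 8 layers, which is what drove similarity down to 0.287 by layer 8):

```python
import math, random

def quantize_int8(xs):
    """Symmetric per-tensor INT8: scale so the largest |value| maps to 127."""
    scale = max(abs(v) for v in xs) / 127
    return [round(v / scale) for v in xs], scale

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

random.seed(0)
hidden = [random.gauss(0.0, 1.0) for _ in range(64)]  # one 64-dim hidden state
q, scale = quantize_int8(hidden)
roundtrip = [v * scale for v in q]

# A single round-trip barely moves the vector; the damage comes from
# requantizing after every layer's nonlinear transform.
print(f"{cosine(hidden, roundtrip):.4f}")
```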

MULTI-USER

PER-USER STATE

141 KB

Each wallet gets its own state account with KV cache for context. Fully independent — your inference doesn't touch anyone else's.

RENT DEPOSIT

~1.01 SOL

Refundable when you close your session. The deposit covers the state account (hidden states plus the KV cache across all 8 layers) and the worker accounts.

CONTEXT LENGTH

32 TOKENS

Per-layer KV cache supports up to 32 positions. The model remembers your full conversation context for multi-turn generation.
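The 141 KB figure is consistent with a simple size estimate, assuming 64-dim f32 keys and values (the exact account layout isn't shown here, so treat this as a plausibility check, not the real struct):

```python
POSITIONS = 32   # context length
LAYERS = 8
HIDDEN = 64      # model dimension; K and V are each HIDDEN wide
F32 = 4          # bytes per element

kv_cache = POSITIONS * LAYERS * 2 * HIDDEN * F32  # K and V per layer
print(f"KV cache: {kv_cache / 1024:.0f} KB")       # -> KV cache: 128 KB

# The remaining ~13 KB of the 141 KB state plausibly holds hidden-state
# scratch buffers, token history, and bookkeeping fields.
```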

> READY TO STRESS TEST?_

THE CODE IS OPEN

Everything — the Solana program, the model conversion scripts, the generation client, the multi-user architecture — is available for anyone to deploy, test, and push to mainnet.

// MAINNET LAUNCH FUND [LIVE]

VERIFY ON-CHAIN →
8.0%
CURRENT: 5.54 SOL
GOAL: 69 SOL

89VB5UmvopuCFmp5Mf8YPX28fGvvqn79afCgouQuPyhY

// DISCLAIMER

Honest disclaimer: I have no idea if this actually works on mainnet. It probably shouldn't. We're talking about running a neural network on a blockchain — that's not what blockchains are for. If it works, incredible. If it doesn't, thank you for participating in the experiment. This is not financial advice. The token exists solely to fund deployment costs. No promises, no guarantees, no refunds. Just vibes, math, and an unreasonable number of transactions.