On-Chain Data in Production: What No One Tells You
Blockchain data isn't clean, reliable, or easy. RPC rate limits, chain reorgs, BigInt bugs, and indexing tradeoffs — hard lessons from shipping real DeFi products.
There's this fantasy that on-chain data is inherently trustworthy. Immutable ledger. Transparent state. Just read it and you're done.
I believed it too. Then I shipped a DeFi dashboard to production and spent three weeks figuring out why our token balances were wrong, our event history had gaps, and our database contained transactions from blocks that no longer existed.
On-chain data is raw, hostile, and full of edge cases that will break your application in ways you won't notice until a user files a bug report. This post covers everything I learned the hard way.
The Illusion of Reliable Data
Here's the first thing nobody tells you: the blockchain doesn't give you data. It gives you state transitions. There's no SELECT * FROM transfers WHERE user = '0x...'. There are logs, receipts, storage slots, and call traces — all encoded in formats that require context to decode.
A Transfer event log gives you from, to, and value. It doesn't tell you the token symbol. It doesn't tell you the decimals. It doesn't tell you if this is a legitimate transfer or a fee-on-transfer token skimming 3% off the top. It doesn't tell you if this block will still exist in 30 seconds.
The "immutable" part is true — once finalized. But finalization isn't instant. And the data you get back from an RPC node isn't necessarily from a finalized block. Most developers query latest and treat it as truth. That's a bug, not a feature.
Then there's the encoding. Everything is hex. Addresses are mixed-case checksummed (or not). Token amounts are integers multiplied by 10^decimals. A USDC transfer of $100 looks like 100000000 on-chain because USDC has 6 decimals, not 18. I've seen production code that assumed 18 decimals for every ERC-20 token. The resulting balances were off by a factor of 10^12.
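To make the decimals trap concrete, here's a minimal sketch of the scaling rule (the helper name is mine; in practice you'd read decimals() once from the token contract rather than hardcoding anything):

```typescript
// Convert a whole-token amount to on-chain base units using the token's
// actual decimals. BigInt exponentiation avoids any float precision loss.
function toBaseUnits(wholeAmount: bigint, decimals: number): bigint {
  return wholeAmount * 10n ** BigInt(decimals);
}

toBaseUnits(100n, 6);  // USDC: 100000000n
toBaseUnits(100n, 18); // 18-decimal token: 100000000000000000000n
```

The two results differ by exactly 10^12 — the same factor the broken production code above was off by.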
RPC Rate Limits Will Ruin Your Weekend
Every production Web3 app talks to an RPC endpoint. And every RPC endpoint has rate limits that are far more aggressive than you expect.
Here are the numbers that matter:
- Alchemy Free: ~30M compute units/month, 40 requests/minute. That sounds generous until you realize a single eth_getLogs call over a wide block range can eat hundreds of CUs. You'll burn through your monthly quota in a day of indexing.
- Infura Free: 100K requests/day, roughly 1.15 req/sec. Try paginating through 500K blocks of event logs at that rate.
- QuickNode Free: Similar to Infura — 100K requests/day.
The paid tiers help, but they don't eliminate the problem. Even at $200/month on Alchemy's Growth plan, a heavy indexing job will hit throughput limits. And when you hit them, you don't get a graceful degradation. You get 429 errors, sometimes with unhelpful messages, sometimes with no retry-after header.
The solution is a combination of fallback providers, retry logic, and being very deliberate about which calls you make. Here's what a robust RPC setup looks like with viem:
```typescript
import { createPublicClient, fallback, http } from "viem";
import { mainnet } from "viem/chains";

const client = createPublicClient({
  chain: mainnet,
  transport: fallback(
    [
      http("https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY", {
        retryCount: 3,
        retryDelay: 1500,
        timeout: 15_000,
      }),
      http("https://mainnet.infura.io/v3/YOUR_KEY", {
        retryCount: 3,
        retryDelay: 1500,
        timeout: 15_000,
      }),
      http("https://rpc.ankr.com/eth", {
        retryCount: 2,
        retryDelay: 2000,
        timeout: 20_000,
      }),
    ],
    { rank: true }
  ),
});
```

The rank: true option is critical. It tells viem to measure latency and success rate for each transport and automatically prefer the fastest, most reliable one. If Alchemy starts rate-limiting you, viem shifts traffic to Infura. If Infura goes down, it falls back to Ankr.
But there's a subtlety: viem's default retry logic uses exponential backoff, which is usually what you want. However, as of early 2025, there's a known issue where retryCount doesn't properly retry RPC-level errors (like 429s) when batch mode is enabled. If you're batching requests, test your retry behavior explicitly. Don't trust that it works.
Reorgs: The Bug You Won't See Coming
A chain reorganization happens when the network temporarily disagrees on which block is canonical. Node A sees block 1000 with transactions [A, B, C]. Node B sees a different block 1000 with transactions [A, D]. Eventually the network converges, and one version wins.
On proof-of-work chains, this was common — 1-3 block reorgs happened multiple times per day. Post-merge Ethereum is better. A successful reorg attack now requires coordination of close to 50% of validators. But "better" isn't "impossible." There was a notable 7-block reorg on the Beacon Chain in May 2022, caused by nodes running inconsistent implementations of the proposer boost fork-choice update.
And it doesn't matter how rare reorgs are on Ethereum mainnet. If you're building on L2s or sidechains — Polygon, Arbitrum, Optimism — reorgs are more frequent. Polygon historically had reorgs of 10+ blocks.
Here's the practical problem: you indexed block 18,000,000. You wrote events to your database. Then block 18,000,000 got reorged. Now your database has events from a block that doesn't exist on the canonical chain. Those events might reference transactions that never happened. Your users see phantom transfers.
The fix depends on your architecture:
Option 1: Confirmation delay. Don't index data until N blocks of confirmations have passed. For Ethereum mainnet, 64 blocks (two epochs) gives you finality guarantees. For L2s, check the specific chain's finality model. This is simple but adds latency — roughly 13 minutes on Ethereum.
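A minimal sketch of the confirmation-delay approach (the helper and constant are mine, following the 64-block figure above; viem's getBlock also accepts blockTag: "finalized" if you'd rather ask the node directly):

```typescript
// Two epochs on post-merge Ethereum mainnet; adjust per chain.
const CONFIRMATION_DEPTH = 64n;

// Highest block number considered safe to index, given the current head.
function safeIndexingTarget(latestBlock: bigint): bigint {
  const target = latestBlock - CONFIRMATION_DEPTH;
  return target > 0n ? target : 0n;
}

// The indexer loop then only processes blocks <= safeIndexingTarget(latest).
```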
Option 2: Reorg detection and rollback. Index aggressively but track block hashes. On each new block, verify that the parent hash matches the previous block you indexed. If it doesn't, you've detected a reorg: delete everything from the orphaned blocks and re-index the canonical chain.
```typescript
import type { PublicClient } from "viem";

interface IndexedBlock {
  number: bigint;
  hash: `0x${string}`;
  parentHash: `0x${string}`;
}

async function detectReorg(
  client: PublicClient,
  lastIndexed: IndexedBlock
): Promise<{ reorged: boolean; depth: number }> {
  const currentBlock = await client.getBlock({
    blockNumber: lastIndexed.number,
  });
  if (currentBlock.hash === lastIndexed.hash) {
    return { reorged: false, depth: 0 };
  }

  // Walk backwards to find where the chain diverged
  let depth = 1;
  let checkNumber = lastIndexed.number - 1n;
  while (checkNumber > 0n && depth < 128) {
    const onChain = await client.getBlock({ blockNumber: checkNumber });
    const inDb = await getIndexedBlock(checkNumber); // your DB lookup
    if (onChain.hash === inDb?.hash) {
      return { reorged: true, depth };
    }
    depth++;
    checkNumber--;
  }
  return { reorged: true, depth };
}
```

This isn't hypothetical. I ran a production system where we indexed events at the chain tip without reorg detection. For three weeks it worked fine. Then a 2-block reorg on Polygon caused a duplicate NFT mint event in our database. The frontend showed a user owning a token they didn't own. That one took two days to debug because nobody was looking for reorgs as the root cause.
The Indexing Problem: Pick Your Pain
You have three real options for getting structured on-chain data into your application.
Direct RPC Calls
Just call getLogs, getBlock, getTransaction directly. This works for small-scale reads — checking a user's balance, fetching recent events for a single contract. It does not work for historical indexing or complex queries across contracts.
The problem is combinatorial. Want all Uniswap V3 swaps in the last 30 days? That's ~200K blocks. At Alchemy's 2K block range limit per getLogs call, that's 100 paginated requests minimum. Each one counts against your rate limit. And if any call fails, you need retry logic, cursor tracking, and a way to resume from where you left off.
The Graph (Subgraphs)
The Graph was the OG solution. Define a schema, write mappings in AssemblyScript, deploy, and query with GraphQL. The Hosted Service was deprecated — everything is now on the decentralized Graph Network, which means you pay with GRT tokens for queries.
The good: standardized, well-documented, large ecosystem of existing subgraphs you can fork.
The bad: AssemblyScript is painful. Debugging is limited. Deployment takes minutes to hours. If your subgraph has a bug, you redeploy and wait for it to re-sync from scratch. The decentralized network adds latency and sometimes indexers lag behind the chain tip.
I've used The Graph for read-heavy dashboards where data freshness of 30-60 seconds is acceptable. It works well there. I would not use it for anything requiring real-time data or complex business logic in the mappings.
Custom Indexers (Ponder, Envio)
This is where the ecosystem has matured significantly. Ponder and Envio let you write indexing logic in TypeScript (not AssemblyScript), run locally during development, and deploy as standalone services.
Ponder gives you maximum control. You define event handlers in TypeScript, it manages the indexing pipeline, and you get a SQL database as output. The tradeoff: you own the infrastructure. Scaling, monitoring, reorg handling — it's on you.
Envio optimizes for sync speed. Their benchmarks show significantly faster initial sync times compared to The Graph. They handle reorgs natively and support HyperSync, a specialized protocol for faster data fetching. The tradeoff: you're buying into their infrastructure and API.
My recommendation: if you're building a production DeFi app and you have engineering capacity, use Ponder. If you need the fastest possible sync and don't want to manage infrastructure, evaluate Envio. If you need a quick prototype or want community-maintained subgraphs, The Graph is still fine.
getLogs Is More Dangerous Than It Looks
The eth_getLogs RPC method is deceptively simple. Give it a block range and some filters, get back matching event logs. Here's what actually happens in production:
Block range limits vary by provider. Alchemy caps at 2K blocks (unlimited logs) or unlimited blocks (max 10K logs). Infura has different limits. QuickNode has different limits. A public RPC might cap at 1K blocks. Your code must handle all of these.
Response size limits exist. Even within the block range, if a popular contract emits thousands of events per block, your response can exceed the provider's payload limit (150MB on Alchemy). The call doesn't return partial results. It fails.
Empty ranges are not free. Even if there are zero matching logs, the provider still scans the block range. This counts against your compute units.
Here's a pagination utility that handles these constraints:
```typescript
import type { PublicClient, Log, AbiEvent } from "viem";

async function fetchLogsInChunks<T extends AbiEvent>(
  client: PublicClient,
  params: {
    address: `0x${string}`;
    event: T;
    fromBlock: bigint;
    toBlock: bigint;
    maxBlockRange?: bigint;
  }
): Promise<Log<bigint, number, false, T, true>[]> {
  const { address, event, fromBlock, toBlock, maxBlockRange = 2000n } = params;
  const allLogs: Log<bigint, number, false, T, true>[] = [];
  let currentFrom = fromBlock;

  while (currentFrom <= toBlock) {
    const currentTo =
      currentFrom + maxBlockRange - 1n > toBlock
        ? toBlock
        : currentFrom + maxBlockRange - 1n;
    try {
      const logs = await client.getLogs({
        address,
        event,
        fromBlock: currentFrom,
        toBlock: currentTo,
      });
      allLogs.push(...logs);
      currentFrom = currentTo + 1n;
    } catch (error) {
      // If the range is too large (too many results), split it in half
      if (isRangeTooLargeError(error) && currentTo > currentFrom) {
        const mid = currentFrom + (currentTo - currentFrom) / 2n;
        const firstHalf = await fetchLogsInChunks(client, {
          address,
          event,
          fromBlock: currentFrom,
          toBlock: mid,
          maxBlockRange,
        });
        const secondHalf = await fetchLogsInChunks(client, {
          address,
          event,
          fromBlock: mid + 1n,
          toBlock: currentTo,
          maxBlockRange,
        });
        allLogs.push(...firstHalf, ...secondHalf);
        currentFrom = currentTo + 1n;
      } else {
        throw error;
      }
    }
  }
  return allLogs;
}

function isRangeTooLargeError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return (
    message.includes("Log response size exceeded") ||
    message.includes("query returned more than") ||
    message.includes("exceed maximum block range")
  );
}
```

The key insight is the binary split on failure. If a 2K block range returns too many logs, split it into two 1K ranges. If 1K is still too much, split again. This adapts automatically to high-activity contracts without requiring you to know the event density in advance.
BigInt Will Humble You
JavaScript's Number type is a 64-bit float. It can represent integers up to 2^53 - 1 — about 9 quadrillion. That sounds like a lot until you realize that a token amount of 1 ETH in wei is 1000000000000000000 — a number with 18 zeros. That's 10^18, well beyond Number.MAX_SAFE_INTEGER.
If you accidentally coerce a BigInt to a Number anywhere in your pipeline — JSON.parse, a database driver, a logging library — you get silent precision loss. The number looks roughly correct but the last few digits are wrong. You won't catch this in testing because your test amounts are small.
Here's the bug I shipped to production:
```typescript
// THE BUG: Looks harmless, isn't
function formatTokenAmount(amount: bigint, decimals: number): string {
  return (Number(amount) / Math.pow(10, decimals)).toFixed(4);
}

// For small amounts this works fine:
formatTokenAmount(1000000n, 6); // "1.0000" -- correct

// For large amounts it breaks silently:
formatTokenAmount(123456789012345678n, 18);
// Returns "0.1235" -- looks plausible, but the precision is already gone:
// Number(123456789012345678n) === 123456789012345680
// The last two digits got rounded by IEEE 754
```

The fix: never convert to Number before dividing. Use viem's built-in utilities, which operate on strings and BigInts:
```typescript
import { formatUnits, parseUnits } from "viem";

// Correct: operates on BigInt, returns string
function formatTokenAmount(
  amount: bigint,
  decimals: number,
  displayDecimals: number = 4
): string {
  const formatted = formatUnits(amount, decimals);
  // formatUnits returns the full precision string like "0.123456789012345678"
  // Truncate (don't round) to desired display precision
  const [whole, fraction = ""] = formatted.split(".");
  const truncated = fraction.slice(0, displayDecimals).padEnd(displayDecimals, "0");
  return `${whole}.${truncated}`;
}

// Also critical: use parseUnits for user input, never parseFloat
function parseTokenInput(input: string, decimals: number): bigint {
  // parseUnits handles the string-to-BigInt conversion correctly
  return parseUnits(input, decimals);
}
```

Notice I truncate instead of rounding. This is deliberate. In financial contexts, it's safer to show "1.0000 ETH" when the real value is "1.00009999..." (slightly understating) than to round up and show "1.0001" when the real value is "1.00005001..." (overstating). Users make decisions based on displayed amounts, and a display that never shows more than the user actually has is the conservative choice.
Another trap: JSON.stringify doesn't know how to serialize BigInt. It throws. Every single response from your API that includes token amounts needs a serialization strategy. I use string conversion at the API boundary:
```typescript
// API response serializer
function serializeForApi(data: Record<string, unknown>): string {
  return JSON.stringify(data, (_, value) =>
    typeof value === "bigint" ? value.toString() : value
  );
}
```

Caching Strategy: What, How Long, and When to Invalidate
Not all on-chain data has the same freshness requirements. Here's the hierarchy I use:
Cache forever (immutable):
- Transaction receipts (once mined, they don't change)
- Finalized block data (block hash, timestamp, transaction list)
- Contract bytecode
- Historical event logs from finalized blocks
Cache for minutes to hours:
- Token metadata (name, symbol, decimals) — technically immutable for most tokens, but proxy upgrades can change the implementation
- ENS resolutions — 5 minute TTL works well
- Token prices — depends on your accuracy requirements, 30 seconds to 5 minutes
Cache for seconds or not at all:
- Current block number
- Account balances and nonce
- Pending transaction status
- Unfinalized event logs (the reorg problem again)
The implementation doesn't need to be complex. A two-tier cache with in-memory LRU and Redis covers most cases:
```typescript
import { LRUCache } from "lru-cache";

const memoryCache = new LRUCache<string, unknown>({
  max: 10_000,
  ttl: 1000 * 60, // 1 minute default
});

type CacheTier = "immutable" | "short" | "volatile";

const TTL_MAP: Record<CacheTier, number> = {
  immutable: 1000 * 60 * 60 * 24, // 24 hours in memory, permanent in Redis
  short: 1000 * 60 * 5, // 5 minutes
  volatile: 1000 * 15, // 15 seconds
};

async function cachedRpcCall<T>(
  key: string,
  tier: CacheTier,
  fetcher: () => Promise<T>
): Promise<T> {
  // Check memory first
  const cached = memoryCache.get(key) as T | undefined;
  if (cached !== undefined) return cached;

  // Then Redis (if you have it)
  // const redisCached = await redis.get(key);
  // if (redisCached) { ... }

  const result = await fetcher();
  memoryCache.set(key, result, { ttl: TTL_MAP[tier] });
  return result;
}

// Usage:
const receipt = await cachedRpcCall(
  `receipt:${txHash}`,
  "immutable",
  () => client.getTransactionReceipt({ hash: txHash })
);
```

The counterintuitive lesson: the biggest performance win isn't caching RPC responses. It's avoiding RPC calls entirely. Every time you're about to call getBlock, ask yourself: do I actually need data from the chain right now, or can I derive it from data I already have? Can I listen for events via WebSocket instead of polling? Can I batch multiple reads into a single multicall?
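To illustrate the multicall point, here's a hedged sketch: collapse N balanceOf reads into a single request via viem's client.multicall (the helper and the inline ABI fragment are mine; under the hood viem aggregates the reads through the Multicall3 contract):

```typescript
// Minimal balanceOf ABI fragment, declared as const for type inference.
const balanceOfAbi = [
  {
    name: "balanceOf",
    type: "function",
    stateMutability: "view",
    inputs: [{ name: "account", type: "address" }],
    outputs: [{ name: "balance", type: "uint256" }],
  },
] as const;

// One multicall request per holder: N reads, a single RPC round trip.
function buildBalanceCalls(token: `0x${string}`, holders: `0x${string}`[]) {
  return holders.map((account) => ({
    address: token,
    abi: balanceOfAbi,
    functionName: "balanceOf" as const,
    args: [account] as const,
  }));
}

// Usage (assumes a viem PublicClient named `client`):
// const results = await client.multicall({
//   contracts: buildBalanceCalls(tokenAddress, holders),
// });
```

For a dashboard reading 50 balances, that's one HTTP request instead of 50 counted against your rate limit.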
TypeScript and Contract ABIs: The Right Way
Viem's type system, powered by ABIType, provides full end-to-end type inference from your contract ABI to your TypeScript code. But only if you set it up correctly.
The wrong way:
```typescript
// No type inference — args is unknown[], return is unknown
const result = await client.readContract({
  address: "0x...",
  abi: JSON.parse(abiString), // parsed at runtime = no type info
  functionName: "balanceOf",
  args: ["0x..."],
});
```

The right way:
```typescript
// Define ABI as const for full type inference
const erc20Abi = [
  {
    name: "balanceOf",
    type: "function",
    stateMutability: "view",
    inputs: [{ name: "account", type: "address" }],
    outputs: [{ name: "balance", type: "uint256" }],
  },
  {
    name: "transfer",
    type: "function",
    stateMutability: "nonpayable",
    inputs: [
      { name: "to", type: "address" },
      { name: "amount", type: "uint256" },
    ],
    outputs: [{ name: "success", type: "bool" }],
  },
] as const;

// Now TypeScript knows:
// - functionName autocompletes to "balanceOf" | "transfer"
// - args for balanceOf is [account: `0x${string}`]
// - return type for balanceOf is bigint
const balance = await client.readContract({
  address: "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",
  abi: erc20Abi,
  functionName: "balanceOf",
  args: ["0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"],
});
// typeof balance = bigint -- fully typed
```

The as const assertion is what makes it work. Without it, TypeScript widens the ABI type to { name: string, type: string, ... }[] and all the inference machinery collapses. This is the single most common mistake I see in Web3 TypeScript codebases.
For larger projects, use @wagmi/cli to generate typed contract bindings directly from your Foundry or Hardhat project. It reads your compiled ABIs and produces TypeScript files with as const assertions already applied. No manual ABI copying, no type drift.
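As a sketch of that setup (the output path and project directory are placeholders for your own repo layout):

```typescript
// wagmi.config.ts
import { defineConfig } from "@wagmi/cli";
import { foundry } from "@wagmi/cli/plugins";

export default defineConfig({
  // Generated file containing typed, `as const` ABIs
  out: "src/generated.ts",
  plugins: [
    foundry({
      project: "./contracts", // path to your Foundry project (placeholder)
    }),
  ],
});
```

Running npx wagmi generate then regenerates the bindings whenever your contracts change, so the ABIs in your frontend can't drift from the compiled artifacts.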
The Uncomfortable Truth
Blockchain data is a distributed system problem masquerading as a database problem. The moment you treat it like "just another API," you start accumulating bugs that are invisible in development and intermittent in production.
The tooling has gotten dramatically better. Viem is a massive improvement over ethers.js for type safety and developer experience. Ponder and Envio have made custom indexing accessible. But the fundamental challenges — reorgs, rate limits, encoding, finality — are protocol-level. No library abstracts them away.
Build with the assumption that your RPC will lie to you, your blocks will reorganize, your numbers will overflow, and your cache will serve stale data. Then handle each case explicitly.
That's what production-grade on-chain data looks like.