The Challenges of Building an Indexer for Bitcoin Ordinals

Bitcoin Ordinals became one of the hottest narratives in Web3 over the past year, with over 65M inscriptions to date, and played a major role in revitalizing Bitcoin culture. But getting Ordinals data from the Bitcoin blockchain isn’t easy. Here are some of the challenges we encountered while building our own ordinals indexer, Ordhook, and the lessons we learned along the way.

Type: Deep dive

Topic(s): Bitcoin, Engineering, Product

Published: April 25, 2024

Author(s): Ludo Galabru

Hiro is founded on several company values, one of which is “we make bold bets.” In the spirit of that bet, we've developed a blockchain, created a new programming language, and our developer tools are years ahead of those found on other chains in many respects, despite being such a lean team. Hiro is full of exceptional talent, constantly inspiring one another.

Back in February 2023, when Ordinal Theory began gaining traction, we decided to explore the protocol and work on improving this space from a developer tooling point of view.

In this article, we'd like to delve into the specifics of Hiro’s indexer <code-rich-text>ordhook<code-rich-text>, explain its purpose, and how it differs from <code-rich-text>ord<code-rich-text>.

A Primer on How Bitcoin Works

Let's start with a recap of how Bitcoin operates. The Bitcoin blockchain is built on the Unspent Transaction Output (UTXO) model, as opposed to the account model. In the UTXO model, each transaction output remains unspent until it is consumed as an input of a subsequent transaction. In other words, a user’s total balance is the sum of the satoshis in all of the unspent outputs held by a particular address. Account-model chains, by contrast, typically update balances in a “merkle-ized” key-value store.
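To make the UTXO model concrete, here is a minimal sketch in Python. The `Output` type, the `apply_transaction` helper, and the toy addresses are illustrative assumptions, not part of Bitcoin or of any Hiro tool; a real node also validates scripts, signatures, and amounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    txid: str
    index: int
    address: str
    value: int  # amount in satoshis


def apply_transaction(utxo_set, spent, created):
    """Remove the spent outpoints from the UTXO set and add the new outputs.

    utxo_set: dict mapping (txid, index) -> Output
    spent:    list of (txid, index) outpoints consumed as inputs
    created:  list of Output objects produced by the transaction
    """
    for outpoint in spent:
        del utxo_set[outpoint]  # raises KeyError on a double spend
    for out in created:
        utxo_set[(out.txid, out.index)] = out


def balance(utxo_set, address):
    """A balance is just the sum of the unspent outputs held by an address."""
    return sum(o.value for o in utxo_set.values() if o.address == address)


utxos = {}
# A coinbase-like transaction creates 50 satoshis for alice (toy amounts).
apply_transaction(utxos, [], [Output("tx0", 0, "alice", 50)])
# alice spends that output: 20 to bob, 30 back to herself as change.
apply_transaction(
    utxos,
    [("tx0", 0)],
    [Output("tx1", 0, "bob", 20), Output("tx1", 1, "alice", 30)],
)
```

After the second transaction, `balance(utxos, "alice")` is 30 and `balance(utxos, "bob")` is 20: no account record was updated, the old output was simply replaced by two new ones.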

In a UTXO model, every transaction is treated as a node, with UTXOs serving as links that form a transaction chain. This chain is what <code-rich-text>ord<code-rich-text> and <code-rich-text>ordhook<code-rich-text> essentially track and analyze—the path each satoshi takes from its creation to its current location.


This flow of satoshis begins when a miner mines a new block. As per Bitcoin’s consensus rules, the first transaction in the new block (the coinbase transaction) creates a quantity of new satoshis. That quantity is determined by the Bitcoin protocol and halves every 210,000 blocks, or roughly every four years.

With the Ordinals protocol, each minted satoshi gets a unique ID—what we refer to as ordinal numbers.
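As a rough illustration of how ordinal numbers follow from the block subsidy, the sketch below computes the subsidy at a given height and the ordinal number of the first satoshi minted in that block. The function names are ours, and the sketch assumes every block mints its full subsidy, which is how the numbering is usually described.

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings


def subsidy(height):
    """Satoshis newly minted by the coinbase of the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return (50 * COIN) >> halvings


def first_ordinal(height):
    """Ordinal number of the first satoshi minted in the block at `height`."""
    full_epochs, remainder = divmod(height, HALVING_INTERVAL)
    total = 0
    for epoch in range(full_epochs):
        # Every block of a completed epoch minted the same subsidy.
        total += HALVING_INTERVAL * subsidy(epoch * HALVING_INTERVAL)
    # Blocks already mined in the current epoch.
    total += remainder * subsidy(height)
    return total
```

For example, `first_ordinal(1)` is 5,000,000,000, because block 0 minted the first 50 BTC worth of satoshis, numbered 0 through 4,999,999,999.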

When a transaction follows a specific format (specified by the Inscription protocol), these satoshi IDs end up being inscribed and associated with some content (text, an image, mp4, sound, HTML page, BRC-20 operation, etc.). Satoshis can be inscribed any number of times throughout their history.

Building With Ord

Our initial experience with ordinals in February 2023 revealed the significant disk write stress the <code-rich-text>ord<code-rich-text> indexing causes. <code-rich-text>ord<code-rich-text> is the canonical implementation of the Ordinal Theory, and it traces satoshis (bundled in ranges) as they move through transactions, noting how they fragment and recombine.

In short, <code-rich-text>ord<code-rich-text> keeps track of the satoshi range fragmentation. We refer to this approach as "forward carrying": every time there's a new block, for each transaction in the block, <code-rich-text>ord<code-rich-text> prunes the spent UTXOs and updates the new ones with the updated ranges.
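A simplified sketch of forward carrying for a single transaction is shown below. It assumes the first-in-first-out rule of Ordinal Theory, where the satoshis of the inputs, taken in order, fill the outputs in order, and whatever is left over is the fee. The `assign_ranges` function and the half-open `(start, end)` range representation are our own illustration, not the data structures used by `ord`.

```python
from collections import deque


def assign_ranges(input_ranges, output_values):
    """Distribute input satoshi ranges over outputs, first in, first out.

    input_ranges:  list of half-open (start, end) ordinal ranges, in input order
    output_values: list of output amounts in satoshis, in output order
    Returns (ranges_per_output, fee_ranges).
    """
    pending = deque(input_ranges)
    per_output = []
    for value in output_values:
        assigned = []
        remaining = value
        while remaining > 0:
            # IndexError here would mean the outputs exceed the inputs.
            start, end = pending.popleft()
            size = end - start
            if size <= remaining:
                assigned.append((start, end))
                remaining -= size
            else:
                # Split the range: the head goes to this output,
                # the tail stays at the front of the queue.
                assigned.append((start, start + remaining))
                pending.appendleft((start + remaining, end))
                remaining = 0
        per_output.append(assigned)
    return per_output, list(pending)


outputs, fees = assign_ranges([(0, 10), (100, 105)], [7, 6])
# outputs == [[(0, 7)], [(7, 10), (100, 103)]]
# fees    == [(103, 105)]
```

The example shows both effects mentioned above: the first input range fragments across two outputs, and the second output recombines pieces of two different ranges. A forward-carrying indexer has to write this bookkeeping for every transaction in every block, which is where the disk write stress comes from.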

After spending more time investigating the codebase and attempting to optimize things, we discovered that this approach was incompatible with the way we build our indexers.

We are a DevTool-focused company, and we’re not only indexing data: we are also helping developers build and maintain their own indexes, meaning we want re-orgs to be cascaded to downstream systems.

Bitcoin can fork, and it does so regularly. When forks occur, we want to easily recompute the new chain state. As a data indexer observing the chain, it's necessary to:

Undo the state computed when receiving blocks from the non-canonical fork.
Apply the new block state to the database.
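The two steps above can be sketched as a small planning function: given the previous tip and the new tip, walk back to the common ancestor, then list the blocks to undo and the blocks to apply. The `parents` map and `reorg_plan` name are hypothetical, and a production indexer would stop walking once it reaches a bounded depth instead of materializing whole chains.

```python
def reorg_plan(parents, old_tip, new_tip):
    """Return (rollback, apply) lists of block hashes for a chain switch.

    parents: dict mapping block hash -> parent hash (None for the root)
    rollback is ordered newest first, apply is ordered oldest first.
    """
    def ancestry(tip):
        chain = []
        while tip is not None:
            chain.append(tip)
            tip = parents[tip]
        return chain  # tip first, root last

    new_chain = ancestry(new_tip)
    on_new_chain = set(new_chain)

    rollback = []
    fork_point = None
    for block_hash in ancestry(old_tip):
        if block_hash in on_new_chain:
            fork_point = block_hash
            break
        rollback.append(block_hash)
    if fork_point is None:
        raise ValueError("tips do not share a common ancestor")

    apply = []
    for block_hash in new_chain:
        if block_hash == fork_point:
            break
        apply.append(block_hash)
    apply.reverse()
    return rollback, apply


# Toy chain: a -> b1 (old tip), and a -> b2 -> c2 (new tip).
parents = {"root": None, "a": "root", "b1": "a", "b2": "a", "c2": "b2"}
plan = reorg_plan(parents, "b1", "c2")
# plan == (["b1"], ["b2", "c2"])
```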

With the forward-carrying approach, being able to roll back and apply these changes would require keeping a snapshot of the last six global states. That isn’t efficient, and the rollback/apply logic is cumbersome.

Another limitation we encountered with ordinals was provability, which we believe is crucial in Web3. Ordinal Theory is a very expensive protocol to compute, and the cost of proving that an inscription sits on a given satoshi (identified by its ordinal number) is high. With the satoshi range carrying method (implemented by <code-rich-text>ord<code-rich-text>), it's practically impossible: the amount of data to track is simply too large.

Pioneering a Different Approach With Ordhook

At Hiro, we specialize in developer tooling, and we believe we can help developers produce consistent blockchain state indexes, without the constant headache of re-implementing fork detection and reconciliation every time the blockchain experiences a reorg. To that end, we created Chainhook, a tool that we describe as a “transaction indexing engine.” Chainhook is open-source and available in three formats:

embeddable in your Indexer as an SDK (Rust, JavaScript)
sidecar running with your indexer
available on our cloud platform

In all three cases, the mechanism is similar. A specific indexer registers predicates (which specify what information you are indexing) and receives notifications (in-process data or HTTP payloads, depending on the chosen method) with rollback/apply instructions, including the data to process.
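To give a feel for the consumer side, here is a sketch of an indexer that receives a notification containing rollback and apply instructions and keeps a small index consistent. The payload shape used here (`rollback` and `apply` lists of blocks, each with a `hash` and a list of `inscription_ids`) is a simplified assumption for illustration and should not be read as Chainhook's actual schema; refer to the Chainhook documentation for the real format.

```python
def handle_notification(index, payload):
    """Update `index` (inscription id -> block hash) from one notification.

    Blocks listed under "rollback" are undone first, then blocks listed
    under "apply" are processed, so a reorg never leaves stale entries.
    """
    for block in payload.get("rollback", []):
        for inscription_id in block["inscription_ids"]:
            index.pop(inscription_id, None)
    for block in payload.get("apply", []):
        for inscription_id in block["inscription_ids"]:
            index[inscription_id] = block["hash"]


index = {}
# First notification: a new block with two inscriptions.
handle_notification(index, {
    "apply": [{"hash": "b1", "inscription_ids": ["i0", "i1"]}],
})
# Second notification: b1 was orphaned and replaced by b2.
handle_notification(index, {
    "rollback": [{"hash": "b1", "inscription_ids": ["i0", "i1"]}],
    "apply": [{"hash": "b2", "inscription_ids": ["i1"]}],
})
# index == {"i1": "b2"}
```

The point of the design is that the indexer itself contains no fork detection: it only reacts to the instructions it receives.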

Ordinal Theory seemed like a perfect fit for battle-testing Chainhook and using it within our own product, the Ordinals API.

Tracking Sats Through Backward Traversals

With Chainhook as a starting place, we began working on a new approach to indexing ordinals. The first step of the process was understanding the Ordinal Theory specification and building an intuition for the flow of satoshis from transaction to transaction. The second step was to understand how this flow could be implemented in a re-org resistant fashion, without having to snapshot the state every time a new block is mined.


Compared to <code-rich-text>ord<code-rich-text>'s forward-carrying model, <code-rich-text>ordhook<code-rich-text> adopts an opposite approach, which we like to call "backward traversals." When an inscription is detected, we look for the transaction parent, recursively, until we find the coinbase that minted the inscribed satoshi.

Every single inscribed satoshi is then tracked, and its location is updated whenever it is transferred.

Keep in mind that in this process, we are tracking the transfers of billions of satoshis. Some satoshis have literally changed hands 100,000 times before being inscribed, so we need to traverse these enormous chains of transfers where simple off-by-one errors can lead to very inconsistent results.
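The idea of a backward traversal can be sketched as follows. Starting from an output and an offset inside it, the code walks back through parent transactions until it reaches a coinbase, then derives the ordinal number from the block height. Everything here is an illustrative assumption: the `Tx` type, the in-memory `txs` and `blocks` dictionaries, and the toy amounts. It is not how `ordhook` is implemented internally, and it relies on the first-in-first-out transfer rule and on fees being appended to the coinbase in transaction order. The loop is iterative because parent chains can be very long.

```python
from dataclasses import dataclass

COIN = 100_000_000
HALVING_INTERVAL = 210_000


def subsidy(height):
    halvings = height // HALVING_INTERVAL
    return 0 if halvings >= 64 else (50 * COIN) >> halvings


def first_ordinal(height):
    epochs, remainder = divmod(height, HALVING_INTERVAL)
    mined = sum(HALVING_INTERVAL * subsidy(e * HALVING_INTERVAL) for e in range(epochs))
    return mined + remainder * subsidy(height)


@dataclass
class Tx:
    height: int    # height of the block containing the transaction
    inputs: list   # [(prev_txid, prev_vout)], empty for a coinbase
    outputs: list  # output values in satoshis


def ordinal_of(txs, blocks, txid, vout, offset):
    """Ordinal number of the satoshi at `offset` inside output `vout`."""
    tx = txs[txid]
    # Position of the satoshi in the transaction's stream of satoshis.
    pos = sum(tx.outputs[:vout]) + offset
    while True:
        if not tx.inputs:  # coinbase: subsidy first, then the block's fees
            reward = subsidy(tx.height)
            if pos < reward:
                return first_ordinal(tx.height) + pos
            pos -= reward
            for other_id in blocks[tx.height][1:]:
                other = txs[other_id]
                spent = sum(txs[p].outputs[v] for p, v in other.inputs)
                fee = spent - sum(other.outputs)
                if pos < fee:
                    # Fee satoshis sit after all outputs of that transaction.
                    tx, pos = other, sum(other.outputs) + pos
                    break
                pos -= fee
            else:
                raise ValueError("offset beyond coinbase inputs")
            continue
        for prev_txid, prev_vout in tx.inputs:
            value = txs[prev_txid].outputs[prev_vout]
            if pos < value:
                # Step into the parent transaction that funded this input.
                tx = txs[prev_txid]
                pos += sum(tx.outputs[:prev_vout])
                break
            pos -= value
        else:
            raise ValueError("offset beyond transaction inputs")


# Toy chain: tx1 (block 2) spends the block 1 coinbase and pays a 1,000 sat fee.
txs = {
    "cb1": Tx(1, [], [5_000_000_000]),
    "cb2": Tx(2, [], [5_000_001_000]),
    "tx1": Tx(2, [("cb1", 0)], [3_000_000_000, 1_999_999_000]),
}
blocks = {1: ["cb1"], 2: ["cb2", "tx1"]}
```

With this data, `ordinal_of(txs, blocks, "tx1", 1, 10)` returns 8,000,000,010: the satoshi sits 3,000,000,010 positions into the block 1 coinbase, whose first ordinal is 5,000,000,000. Every subtraction in the loop is a place where an off-by-one error would silently yield a different satoshi.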

Suffice it to say that reproducing the unspecified logic of Ordinal Theory was complicated. The only information that <code-rich-text>ord<code-rich-text> can provide is the final ordinal number, without explaining the calculation details.

Ordhook Performance

Maintaining an alternative indexer isn't easy, especially in a protocol where code is law and new features are added at a rapid pace. <code-rich-text>ord<code-rich-text> is a moving target and keeping up with it is challenging.

The protocol is frequently updated with new features and significant refactoring, which can sometimes introduce consensus-breaking bugs. Building an indexer for an evolving protocol is like building a plane while it's in flight.

Despite those challenges, in the past year, we've made remarkable progress on ordhook’s performance. In the first version of our backward traversal, indexing a block with 50 inscriptions could take up to 5 minutes, because we would sequentially query bitcoind through RPC to retrieve every single parent up to the root coinbase transaction, for each inscription detected, one after the other.

With time, our level of sophistication increased.

Our latest version takes 2-3 seconds to index a block with 2,000 inscriptions. It leverages multiple levels of caching, thread pools, and a lot of pre-computing.
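As one generic example of what a caching layer can look like, the sketch below wraps a slow transaction lookup in an in-memory LRU cache so that parents shared by many traversals are only fetched once. The `CachedTxSource` class and the `fetch` callable are hypothetical stand-ins for a real backend such as an RPC client; this illustrates the general technique rather than ordhook's actual cache design.

```python
from functools import lru_cache


class CachedTxSource:
    """Memoize transaction lookups in front of a slow backend."""

    def __init__(self, fetch, maxsize=100_000):
        self.backend_calls = 0

        def counted_fetch(txid):
            # Only reached on a cache miss.
            self.backend_calls += 1
            return fetch(txid)

        # Least recently used entries are evicted beyond `maxsize`.
        self.get = lru_cache(maxsize=maxsize)(counted_fetch)


# Stand-in backend: a real one would query a node or a local store.
source = CachedTxSource(lambda txid: {"txid": txid})
source.get("a")
source.get("a")  # served from the cache
source.get("b")
# source.backend_calls == 2
```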

A lot of work doesn’t have to be sequential. In other words, we don’t have to wait until block N is fully processed before doing some of the work on blocks N+1 and N+2, which goes a long way toward speeding up the indexing process.
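A minimal sketch of this kind of pipelining, using Python's standard thread pool: the expensive, order-independent step runs concurrently for several blocks, while results are still applied strictly in block order. The `precompute` and `index_blocks` functions and the block dictionaries are illustrative assumptions, not ordhook code.

```python
from concurrent.futures import ThreadPoolExecutor


def precompute(block):
    """Order-independent work for one block (stand-in for traversals)."""
    return {
        "height": block["height"],
        "inscription_count": len(block["inscriptions"]),
    }


def index_blocks(blocks, apply_result, max_workers=4):
    """Precompute blocks concurrently, then apply results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `blocks`, even if a
        # later block finishes its precompute step before an earlier one.
        for result in pool.map(precompute, blocks):
            apply_result(result)


applied = []
index_blocks(
    [
        {"height": 100, "inscriptions": ["i0", "i1"]},
        {"height": 101, "inscriptions": []},
        {"height": 102, "inscriptions": ["i2"]},
    ],
    applied.append,
)
# applied heights: [100, 101, 102]
```

Keeping the apply step sequential preserves the ordering guarantees an index needs, while the read-heavy part scales with the number of workers.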

Further, as the stress of our approach is on reads, and reads are generally easier to scale than writes, we have many options to improve performance—and we’ll need to keep improving performance because the number of inscriptions is continuously increasing.

More Indexers = Better Resilience

From the perspective of the entire ecosystem, there are significant advantages to having two fundamentally different implementations. Despite their radically different approaches, <code-rich-text>ord<code-rich-text> and <code-rich-text>ordhook<code-rich-text> are capable of producing the same state, and this is beneficial for the ordinals community.

This diversity in implementation strategies provides a form of redundancy and adaptive capacity within the system, which can potentially lead to greater resilience and flexibility in the face of unexpected changes or challenges.

Start Indexing Ordinals

If you’d like to get started with an ordinals indexer, we encourage you to try Ordhook. We designed and optimized it specifically for developers. We know that devs want quick feedback loops, and Ordhook is engineered to help you get started in just a few minutes.

Ordhook automatically retrieves and loads historical snapshots of ordinals data from the Hiro Archive, so you can start interacting with data swiftly.

To get started, check out the Ordhook documentation, and if you want to get in touch with us, reach out on the #chainhook channel in the Hiro Developer Tools section on Discord.
