Bitcoin Basics: Blockchain, Hashing & Mining….oh my!

We continue to dust off the archives with this piece from Daz Bea, first published on Medium in July 2021. Some minor edits have been made to aid in clarity…enjoy.

Perhaps you are new to bitcoin, or perhaps you have been down the rabbit hole for a while but never really understood exactly what goes on behind the scenes, on the blockchain. I hope to unveil the mystery behind blockchain, hashing, bitcoin mining and distributed ledgers all in one simple article. Strap yourself in, we have got a lot to get through.

SHA256 Hashing

The bitcoin network uses the SHA256 algorithm to conduct hashing functions. Confused already? Don’t worry, this can be very complex, and way above my pay-grade, but I’ll explain the basics as best I can.

Think of hashing as creating a digital fingerprint of a set of data. Data goes into the algorithm (the input), the algorithm works its magic and spits out a series of alphanumeric characters (the output). What is really cool about hashing is that it doesn’t matter how much data you run through the algorithm, the output is always the same length, 64 hexadecimal characters for the SHA256 algorithm.
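
If you would like to try this yourself, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` and the choice of UTF-8 text encoding are my assumptions, so the digests it prints are not guaranteed to match the ones in the figures, which come from a different tool.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA256 digest of `text` as 64 lowercase hexadecimal characters."""
    # Assumption: the text is encoded as UTF-8 before hashing.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different sizes all produce a 64-character output.
for sample in ["", "Daz", "a much longer input " * 1000]:
    digest = sha256_hex(sample)
    print(len(sample), "characters in ->", len(digest), "characters out:", digest)
```

However long the input is, the digest that comes back is always 64 hexadecimal characters.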

Figure 1 shows the SHA256 hash function (we will refer to this as hashing from here) of the name “Daz”. “Daz” is the input and 3445748836aa04e53fd5de81c41c4a370f0cf52004659abf87920abc0da1bbaf is the output…
Simple, right?

Figure 1. Hash of the name “Daz”

Now I’ll change the capital “D” to a lower case “d” (Figure 2) to demonstrate that one small change to the input results in a completely new output. Input = daz, output = ae5e9de1ed5510933a86705cb253b3cbd0b0891e70217c7a64603869aeaac093. Compared with the original, the characters of our new output are completely different. There is no discernible pattern between the two.

Figure 2. Hash of the name “daz” with a lower case “d”
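
One way to put a number on "completely different" is to count how many of the 256 output bits change between the two hashes. The sketch below does that with Python's `hashlib`. The function names are hypothetical and UTF-8 encoding is assumed. For a well-behaved hash function you would typically expect roughly half of the bits to flip.

```python
import hashlib

def sha256_int(text: str) -> int:
    """Hash `text` (assumed UTF-8) and return the digest as one large integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions at which the two SHA256 digests differ."""
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

print(differing_bits("Daz", "daz"), "of 256 bits differ")
print(differing_bits("Daz", "Daz"), "bits differ for identical input")
```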

As mentioned previously, the resultant hash is 64 characters long. This is always the case, no matter how much data we place into the algorithm. In Figure 3, I paste an entire Wikipedia article on Fender Stratocasters into the input and the output remains 64 characters long. The same would hold if you were to put the entire contents of the internet through the algorithm: 64 characters every time.

Figure 3. Hash of a Wikipedia article on the Fender Stratocaster (source: https://en.wikipedia.org/wiki/Fender_Stratocaster)

A feature of the SHA256 hash function is that if we re-hash the exact same set of data, we get the same hash of that data each and every time. This is used frequently throughout the world to check that copies of large datasets match: comparing hash outputs is easier than comparing the entire contents character by character. This feature is what forms the basis of the bitcoin blockchain. Let’s dive into the next piece of that puzzle… Blocks.
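
Here is a small sketch of that comparison idea: rather than comparing two datasets record by record, we hash each one and compare the two short fingerprints. The function name `fingerprint` and the sample rows are made up for illustration.

```python
import hashlib

def fingerprint(chunks) -> str:
    """Hash an iterable of byte chunks piece by piece and return the hex digest."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # feeding chunks in order is the same as hashing them joined
    return hasher.hexdigest()

original = [b"row 1\n", b"row 2\n", b"row 3\n"]
copy = list(original)
tampered = [b"row 1\n", b"row 2\n", b"row 3!\n"]

print("copy matches original:    ", fingerprint(copy) == fingerprint(original))
print("tampered matches original:", fingerprint(tampered) == fingerprint(original))
```

Because the data is fed in a chunk at a time, the same approach works for files that are too large to hold in memory.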

Blocks and Mining

We can use the SHA256 hash function to start building blocks of data. For each block, we have the input (the set of data), and we add some distinguishing features to that data like a block number, as an example. And we run it through the hash function.

In figure 4, we can see we have some data fields such as block #, a nonce (we will talk about this soon), and at the moment our data field is blank. The Hash of these inclusive fields of data is: 0000f727854b50bb95c054b39c1fe5c92e5ebcfa4bcb5dc279f56aa96a365e5a

Figure 4. Block Example 1

You will notice a distinguishing feature of this hash: the 4 leading characters are all 0’s. This is no accident. You see, the output hash of the SHA256 algorithm is actually a very large number. In everyday life we are accustomed to using a number system called base10, meaning our number system is built from the ten digits 0 through to 9.

There are several different number systems, particularly in computer science. Base2 is another common numbering system; you might be more familiar with the term binary. Base2, or binary, uses only the digits 0 and 1 and is the number system that computers are built on.

The output of the SHA256 algorithm is normally written in base16, also known as hexadecimal. Base16 takes the base10 digits 0 through to 9 and extends them with the letters a through to f (lower case here), giving sixteen digits with values from 0 through to 15, as illustrated in Table 1.

Table 1: Base 16 (Hexadecimal) numbering system.

Why use base16? Simply, for the sake of brevity. It is a way of representing extremely large numbers with fewer characters: the 256 binary digits of a SHA256 hash fit into 64 hexadecimal characters.
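
To make the "a hash is just a very large number" idea concrete, here is a short sketch that reads a hexadecimal string as a number using Python's built-in `int(..., 16)`. The hash used is the one from Figure 4. The variable names are my own.

```python
# Each hexadecimal digit stands for a value from 0 to 15.
for digit in "0123456789abcdef":
    print(digit, "=", int(digit, 16))

# The block hash from Figure 4, read as one (very large) base10 number.
block_hash = "0000f727854b50bb95c054b39c1fe5c92e5ebcfa4bcb5dc279f56aa96a365e5a"
as_number = int(block_hash, 16)
print("hex characters:    ", len(block_hash))
print("as a base10 number:", as_number)

# The largest possible SHA256 output: 64 hex characters, all f.
largest = int("f" * 64, 16)
print("decimal digits needed for the largest hash:", len(str(largest)))
print("binary digits needed for the largest hash: ", largest.bit_length())
```

The last two lines show the brevity argument in action: the same value needs far more characters in base10 or base2 than it does in base16.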

Now, let us introduce 2 new terms related to bitcoin mining: Difficulty and Target. Difficulty and Target are the ways the bitcoin protocol controls how difficult it is to find blocks. I touch on this to some degree in one of my previous articles on the Difficulty Adjustment; you can read that here.

For now, simply understand that the program we are using for this demonstration has a fixed “target”: the output hash must come in below a certain target number, and the lower that number, the harder the nonce is to find. In the bitcoin protocol, the target is adjusted every 2016 blocks (roughly every 2 weeks) in order to keep the average rate of block release at 1 block every 10 minutes. As more computational power is thrown at the Bitcoin network, blocks are solved more quickly. The protocol looks backward over the previous 2016 blocks and, if they arrived too quickly, raises the difficulty of mining by lowering the target value, making the nonce harder to find. If they arrived too slowly, it raises the target instead.
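
As a rough sketch of that adjustment, the new target can be thought of as the old target scaled by how long the last 2016 blocks actually took compared with the two weeks they should have taken. The code below is a simplification based on my understanding of the rule, including the limit that a single adjustment cannot move the target by more than a factor of 4. It leaves out details of the real software, such as the compact encoding of the target and the maximum allowed target, and the names are my own.

```python
TARGET_BLOCK_TIME = 10 * 60                                   # seconds per block
RETARGET_INTERVAL = 2016                                      # blocks between adjustments
EXPECTED_TIMESPAN = TARGET_BLOCK_TIME * RETARGET_INTERVAL     # two weeks, in seconds

def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by actual/expected time, limited to a factor of 4 either way."""
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN

old_target = int("0000" + "f" * 60, 16)

# Blocks arrived twice as fast as intended: the target halves, so mining gets harder.
faster = retarget(old_target, EXPECTED_TIMESPAN // 2)
# Blocks arrived twice as slowly: the target doubles, so mining gets easier.
slower = retarget(old_target, EXPECTED_TIMESPAN * 2)

print("target lowered:", faster < old_target)
print("target raised: ", slower > old_target)
```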

We won’t get bogged down any further in this here; we will save it for another time. Today we will focus more on the nonce and how to build a blockchain.

Back to our example in Figure 4. Now let us add some data to this block. I will add the phrase “Hello World”. Once I add this data, you will notice that the background has turned red (Figure 5): the program is not happy with our block now. Recall from Figure 4 that the hash of that set of data started with 4 leading 0’s. The particular program I am running requires that the output hash always start with 4 leading 0’s, which is the same as saying it must be no larger than a target value that, in its full base16 form, is: 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

Figure 5. A change in the data has invalidated the block.

When I changed the data by adding “Hello World” the SHA256 algorithm provided a new hash, and as we can see, my new hash does not start with 4 leading 0’s, the output hash is above our target value and is thus invalid.
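
The check the program is making can be sketched in a couple of lines: read the hash as a number and compare it with the target. The function name `meets_target` is hypothetical. The two hashes tested are ones that appear earlier in this article (the block hash from Figure 4 and the hash of “daz” from Figure 2).

```python
# The demo program's target: four leading zeros followed by sixty f's.
TARGET = int("0000" + "f" * 60, 16)

def meets_target(hash_hex: str, target: int = TARGET) -> bool:
    """A hash is acceptable when, read as a number, it is no larger than the target."""
    return int(hash_hex, 16) <= target

figure_4_hash = "0000f727854b50bb95c054b39c1fe5c92e5ebcfa4bcb5dc279f56aa96a365e5a"
figure_2_hash = "ae5e9de1ed5510933a86705cb253b3cbd0b0891e70217c7a64603869aeaac093"

print("Figure 4 block hash is valid:", meets_target(figure_4_hash))
print("hash of 'daz' would be valid:", meets_target(figure_2_hash))
```

Comparing the numbers and checking for four leading zeros give the same answer for this particular target. The numeric comparison is the more general form, because a target does not have to line up neatly with a whole number of zeros.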

There is a practical truth about SHA256: it is overwhelmingly likely that there exists some piece of data that, once combined with the other data in my block, results in an output hash below my target number. There may indeed be more than one answer; in fact there are likely to be many. The difficulty (pun intended), however, is working out what that data could be.

This is where the Nonce comes into play. If we break down the data-set within my block, we have the block number (1), the Data (Hello World) and an additional field called the nonce (currently with a value of 72608). What if there was a value I could enter as the nonce that, once added to the rest of the data, would result in a hash that satisfied my program’s requirement for the output hash to be a number no larger than 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff?

The beautiful thing about cryptography is that there is no known shortcut for calculating this. My only option is to guess what that data may be. I can change the nonce value to 1? Or change it to 2? Or I can try 2456395697 or 45628496902074?… I might be here a while.

Or, we can use the power of my computer. We can use the computer’s ability to quickly process information to start guessing what data we could include in the “Nonce” field, that once run through the hash function, would result in an output hash below 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff.

If I hit “Mine” in my program, my PC will start guessing values for the nonce. There is no other way to compute this value aside from guessing and checking, guessing and checking, until we find a nonce value that achieves an output below our target value.
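
How much guessing and checking should we expect? Here is a back-of-the-envelope sketch, assuming that SHA256 outputs behave as if they were spread evenly across all possible values (an assumption, but the standard one). The variable and function names are my own.

```python
from fractions import Fraction

TARGET = int("0000" + "f" * 60, 16)   # the demo program's target
TOTAL_OUTPUTS = 2 ** 256              # number of possible SHA256 outputs

# Chance that one guess lands at or below the target (assumes evenly spread outputs).
p_success = Fraction(TARGET + 1, TOTAL_OUTPUTS)

# Average number of guesses needed before the first success.
expected_guesses = 1 / p_success

def chance_of_success_within(guesses: int) -> float:
    """Probability that at least one of `guesses` independent tries succeeds."""
    return 1.0 - float(1 - p_success) ** guesses

print("chance per guess:", p_success)
print("expected guesses:", expected_guesses)
print("chance of success within 100,000 guesses:",
      round(chance_of_success_within(100_000), 3))
```

Each extra leading zero in the target multiplies the expected number of guesses by 16, which is why lowering the target makes the nonce harder to find.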

Once a solution is found, the program checks the result and the block turns green again. It’s happy. This process can take some time depending on how difficult it is to find a data set to match. This, in simple terms, is known as mining. We are “digging” through combinations of data to find a solution to a mathematical problem. We are trying to find a value of the nonce that, when coupled with our data, satisfies a predetermined requirement (in our case an output hash below the base16 numerical value of 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff). Figure 6 shows what this looks like for our example; this took around 3 seconds to complete on my desktop computer. The nonce was the number 24894.

Figure 6. Mining a new nonce to satisfy a hash function with 4 leading zeros.
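
The mining step can be sketched as a simple loop. This is not the demo program's code: how the block number, nonce and data are joined together before hashing is my assumption, so the nonce this sketch finds will most likely differ from the 24894 shown in Figure 6.

```python
import hashlib

TARGET = int("0000" + "f" * 60, 16)   # hash must start with 4 leading zeros

def block_hash(number: int, nonce: int, data: str) -> str:
    """Hash the block's fields. Assumption: fields are simply joined as text."""
    payload = f"{number}{nonce}{data}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine(number: int, data: str, target: int = TARGET, max_nonce: int = 5_000_000):
    """Guess nonces 0, 1, 2, ... until the block hash is no larger than the target."""
    for nonce in range(max_nonce):
        digest = block_hash(number, nonce, data)
        if int(digest, 16) <= target:
            return nonce, digest
    raise RuntimeError("no suitable nonce found below max_nonce")

nonce, digest = mine(1, "Hello World")
print("nonce:", nonce)
print("hash: ", digest)
```

Notice that checking a solution takes a single hash, while finding one takes tens of thousands of guesses on average. That lopsidedness is the whole point.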

Congratulations! We have mined our first block.

Blockchain

A blockchain is…….wait for it…….a chain of blocks. It’s a chain of blocks, just like the first block we mined above, but with some differences.

A set of data is entered into block 1, and we mine that block to discover the nonce that satisfies the difficulty requirement (an output hash below the target value). This produces a hash and our block is complete. We then create a new block, and in it we include a new data field: one that takes the output hash of the preceding block as input data to the new block.

We can see in Figure 7, that we have a chain of 2 blocks. The first block is a copy of our above examples, but we have a new field labelled “Prev” containing new data. Block 1 is our genesis block, it contains arbitrary information in the prev field. The block is mined and our hash of that data is obtained.

In block 2, we include the output hash from block 1 as input data to our new block. We place that data in the “Prev” field. We add the new data we want to be included in that second block and we mine that block. Both of our blocks are happy.

Figure 7. Building a Blockchain. The output hash of block 1 is added to the “Prev” field of Block 2.
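
Here is a minimal sketch of chaining two blocks together. The `Block` class, the way its fields are joined before hashing, and the all-zero “Prev” value for the first block are my assumptions for illustration, not the demo program's actual code.

```python
import hashlib
from dataclasses import dataclass

TARGET = int("0000" + "f" * 60, 16)
GENESIS_PREV = "0" * 64   # arbitrary "Prev" value for the very first block

@dataclass
class Block:
    number: int
    data: str
    prev: str        # output hash of the preceding block
    nonce: int = 0

    def digest(self) -> str:
        # Assumption: the fields are simply joined as text before hashing.
        payload = f"{self.number}{self.nonce}{self.data}{self.prev}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_mined(self) -> bool:
        return int(self.digest(), 16) <= TARGET

    def mine(self) -> None:
        self.nonce = 0
        while not self.is_mined():
            self.nonce += 1

def add_block(chain: list, data: str) -> Block:
    """Create a block whose Prev field is the hash of the last block, then mine it."""
    prev = chain[-1].digest() if chain else GENESIS_PREV
    block = Block(number=len(chain) + 1, data=data, prev=prev)
    block.mine()
    chain.append(block)
    return block

chain = []
add_block(chain, "Hello World")
add_block(chain, "Second block")
for block in chain:
    print(block.number, block.prev[:12], "->", block.digest()[:12])
```

Because the hash of block 1 is part of what gets hashed in block 2, the two blocks are now tied together.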

Now let’s go back and change something in block 1. I will add a full stop “.” to the data. We can see now in Figure 8 that the entire blockchain is not happy. Both blocks are red. I have changed the data and broken the chain. Block 1’s hash is no longer below the target value. Adding the full stop “.” to the data, has resulted in a new hash with a value above our target value of 0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff, and thus is not compatible with the program’s requirements. But the output of block 1 is also an input to block 2, thus also invalidating the hash of block 2. We have broken the chain.

Figure 8. A change in Block 1 has broken the chain.

Re-mining block 1 has resolved the problem for block 1 but, because block 1’s new output hash has changed the data in block 2, block 2 is still not happy (Figure 9).

Figure 9. We re-mined block 1, but block 2 still is not happy.

We must now also re-mine block 2 to ensure it complies with the difficulty requirement. Figure 10 shows that we again have 2 x happy blocks once we have redone the work for the 2 blocks. “Work” is a term to describe the computational power we exerted through the process of mining. You may have heard the term Proof of Work before which relates to this concept of securing the bitcoin blockchain through the process of energy expenditure…the work.

Figure 10. Re-mining blocks 1 and then 2. The program is happy once more.

Note the need to mine the blocks in order. Re-mining block 2 before we satisfied the work needed for block 1 would result in us having to re-mine block 2 yet again.

We now have the foundation for our blockchain. A 3rd and then a 4th block can be added, mining each block with its new input data, with the output hash of block 2 feeding into the input of block 3, the output of block 3 feeding into the input of block 4, and so on and so forth.

If our blockchain is now 4 blocks long and we again try to change data in block 1, it will invalidate the entire chain. We would then need to re-mine block 1, then block 2, then block 3…..you see where I’m going with this?
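
The knock-on effect can be sketched as follows: build a 4-block chain, tamper with block 1, find where the chain breaks, then redo the work block by block in order. The dictionary layout, field ordering and helper names are assumptions of mine rather than how any real software does it.

```python
import hashlib

TARGET = int("0000" + "f" * 60, 16)
GENESIS_PREV = "0" * 64

def block_hash(block: dict) -> str:
    payload = f"{block['number']}{block['nonce']}{block['data']}{block['prev']}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine(block: dict) -> None:
    block["nonce"] = 0
    while int(block_hash(block), 16) > TARGET:
        block["nonce"] += 1

def build_chain(items):
    chain, prev = [], GENESIS_PREV
    for number, data in enumerate(items, start=1):
        block = {"number": number, "nonce": 0, "data": data, "prev": prev}
        mine(block)
        chain.append(block)
        prev = block_hash(block)
    return chain

def first_invalid(chain):
    """Return the index of the first block that fails a check, or None if all pass."""
    prev = GENESIS_PREV
    for index, block in enumerate(chain):
        if block["prev"] != prev or int(block_hash(block), 16) > TARGET:
            return index
        prev = block_hash(block)
    return None

def remine_from(chain, start: int) -> None:
    """Redo the work in order: each block needs the new hash of the one before it."""
    prev = GENESIS_PREV if start == 0 else block_hash(chain[start - 1])
    for block in chain[start:]:
        block["prev"] = prev
        mine(block)
        prev = block_hash(block)

chain = build_chain(["Hello World", "second", "third", "fourth"])
chain[0]["data"] += "."                 # tamper with block 1
broken_at = first_invalid(chain)
print("chain breaks at block", broken_at + 1)
remine_from(chain, 0)                   # re-mine blocks 1, 2, 3 and 4 in order
print("chain valid again:", first_invalid(chain) is None)
```

The further back the change, the more blocks have to be re-mined, so older history costs more work to rewrite.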

We now have a good understanding of a blockchain and the pieces that need to go together to create the chain. Now let us expand upon this thinking and look at a distributed blockchain.

Distributed Blockchain and Consensus

One of the beautiful things about the bitcoin blockchain is the distributed ledger. We will get to the “ledger” side of things shortly, for now, let us simply look at a distributed blockchain to understand the concept.

Let’s say I had a blockchain, that contained a series of blocks, each block containing some data. This data has been hashed and chained together as we have seen above to form a blockchain. I keep a copy of this blockchain on my computer. What if there was a need to compare notes with someone else? What if the integrity of this data was really important to me and I wanted to ensure it hadn’t been tampered with? What if I wanted to be sure that the data I had, was a true and correct version of that series of data? It would be handy if there was an exact copy of this blockchain for me to compare my version against…. Right?

This is where the beauty of the distributed blockchain comes into play. What if I gave someone else a copy of my blockchain so that I could go and compare notes at any time? What if I wrote a program that would do that for me automatically? It could continuously compare my set of data to my friend’s and flag any discrepancies.

To give a scaled-down example, say I had a blockchain only a few blocks long, containing some really important data I was working on. I could secure this data in a distributed blockchain by keeping a copy at home (Figure 11, Peer A) and either giving someone else a copy or simply keeping a copy on my computer at the office (Figure 11, Peer B).

Figure 11. Distributed Blockchain Example — Peer A is home and Peer B is the office.

Suppose I left my computer on and unlocked, and some nefarious actor (they’re everywhere) came along and changed the data in block 2 (can you spot the differences?). They could re-mine each block from 2 through 4 so my copy of the blockchain looks OK in terms of the chain and the hashing functions for each block. However, when I come back to my distributed blockchain program, it flags that my version of truth differs from that on my office computer. A quick comparison shows that indeed the hashes have changed all the way from block 2 to block 4 (Figure 12). Both chains look OK on their own, they are both green, but my program that compares each distributed version flagged the discrepancy between the hashes and alerted me that they do not match.

Figure 12. Data has been changed in Block 2 of Peer A’s blockchain, the blocks were re-mined. The hashes from block 2 are now different.

But which version is correct? It is impossible to tell unless I knew which specific version the attacker had changed.
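
The comparison program can be imagined as something like the sketch below, which walks two lists of block hashes side by side and reports the first block number where they disagree. The function name is hypothetical and the hashes are short made-up stand-ins, not real SHA256 outputs.

```python
from typing import Optional, Sequence

def first_divergence(hashes_a: Sequence[str], hashes_b: Sequence[str]) -> Optional[int]:
    """Return the number of the first block whose hashes differ, or None if identical."""
    for number, (hash_a, hash_b) in enumerate(zip(hashes_a, hashes_b), start=1):
        if hash_a != hash_b:
            return number
    if len(hashes_a) != len(hashes_b):
        # One copy simply has more blocks than the other.
        return min(len(hashes_a), len(hashes_b)) + 1
    return None

# Made-up, shortened block hashes for two peers (illustration only).
peer_a = ["0000a1", "0000ee", "0000ff", "000011"]   # tampered and re-mined from block 2
peer_b = ["0000a1", "0000b2", "0000c3", "0000d4"]

print("copies first differ at block", first_divergence(peer_a, peer_b))
```

This tells me where the two copies part ways, but, as noted above, not which of the two is the honest one.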

It would stand to reason then that it would be handy if I had yet another version to compare to. I’ll keep a copy at home (Figure 13a — Peer A), a copy at the office (Figure 13b — Peer B) and another copy at my mother-in-law’s house (Figure 13c — Peer C).

Figure 13a — Peer A’s version of the blockchain

Figure 13b — Peer B’s version of the blockchain

Figure 13c — Peer C’s version of the blockchain

Now, the nefarious actor strikes again, they change the data on my home PC version again and re-mine all the blocks at home. But now I have a consensus mechanism by which to compare versions. I have a voting system. I have 3 copies of the truth. Peer A (Figure 14a) is telling me one thing and the other 2 (Figure 14b) are in consensus on a different version of truth.

Figure 14a — Peer A’s version (change made to block 2)

Figure 14b — Peer B and Peer C are in consensus.

I can put a level of trust into this consensus by assuming that the nefarious actor would not have been able to:

  1. Find out the physical location of each copy of my blockchain
  2. Break in, change the data and re-mine 2 of the 3 copies of this blockchain.

While the above scenario is possible, it is not very probable.
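
The voting idea from the three-peer example can be sketched like this: each peer reports the hash of its latest block, and whichever hash more than half of the peers agree on wins. The function name and the shortened hashes are made up for illustration. This mirrors the simple picture used in this article. As I understand it, Bitcoin itself does not count heads in this way; nodes follow the valid chain with the most accumulated proof of work.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def majority_tip(peer_tips: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (agreed latest-block hash, peers that disagree with it).

    If no hash is held by more than half of the peers, return (None, all peers).
    """
    counts = Counter(peer_tips.values())
    tip, votes = counts.most_common(1)[0]
    if votes * 2 <= len(peer_tips):
        return None, sorted(peer_tips)
    dissenters = sorted(peer for peer, held in peer_tips.items() if held != tip)
    return tip, dissenters

# Made-up, shortened hashes of each peer's latest block.
tips = {"Peer A": "0000e7", "Peer B": "0000d4", "Peer C": "0000d4"}
agreed, odd_ones_out = majority_tip(tips)
print("agreed version:", agreed)
print("out of consensus:", odd_ones_out)
```

With only two peers that disagree there is no majority, which is exactly the “which version is correct?” problem from earlier.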

I can thus disregard my home version, restore it from the copies at my office and my mother-in-law’s house, and be safe in the knowledge that I have recovered THE CORRECT version of truth. Thank god for mothers-in-law and blockchain technology.

Now imagine I had tens of thousands of computers (nodes), randomly distributed throughout the world, running this blockchain (that data is pretty special to me after all). That bad-actor would have to track down more than 50% of these nodes and change each version of them in order to cast doubt as to which version of the chain was the truth. This is highly improbable, if not impossible. This is, in simplified terms, how the bitcoin blockchain works: a network of randomly distributed nodes, run by everyday people on software on their computers or on small, cheap dedicated hardware devices, storing a version of truth and keeping each other honest. This is what makes bitcoin decentralised and trust-less. There is no one party that controls the blockchain; it is run by the community of participants.

But Daz, what about bitcoin itself? What about the coins? What about the Ledger?

Bitcoins and the Ledger

From the knowledge we built about blockchain so far, understanding the ledger is as simple as being able to format the data-set we have been playing with. So far we have been playing with a text field. And as important as the information is that “Daz is pretty rad”, it turns out, no one else cares.

But what would be useful is if we utilised the “data field” to start recording something useful like transactions. Let’s imagine we now have a blockchain distributed among a lot of nodes. We will just look at Peers A & B to get the idea, but in reality there are thousands of nodes running the same blockchain. In our example, we have replaced the data free-text field with a series of transaction fields. Figure 15 shows blocks 4 & 5 from Peer A and Peer B and we can see that there is a series of transactional data now within each block.

Figure 15 — Peer A and Peer B with transactional data within a blockchain.

In block 4 we can see that an amount of $62.19 was sent from someone named Rick to someone named Isla. This was one transaction among 5 total transactions included in that block. The block of data is hashed and mined exactly as we have seen previously.

Our nefarious actor strikes again. His name is Sam, and he received $97.13 from Rick. Sam knows a bit of coding and decides he wants to steal some $$ from Rick, so he changes the transaction amount in his version of truth to $97,000.13. He re-mines his version (Figure 16), but it is useless. The vast majority of nodes on the network see that this version of truth is out of consensus with everyone else, and it is rejected by the network. Nice try Sam.

Figure 16. Sam tries to rip-off Rick by changing the transaction amount. This is rejected by the consensus network.

We can see a whole heap of transactions between parties, but how do we know that Rick had $97.13 to spend in the first place?

Coinbase — The Genesis Block

If you have been among the bitcoin community for a while, you have undoubtedly heard of the famous genesis block. Satoshi Nakamoto (Bitcoin’s famous, pseudonymous creator) mined the first block, which contained a block-reward of 50 Bitcoin. Block rewards form the coinbase for bitcoin; in other words, in order to spend coins, they must first be brought into circulation. The Bitcoin Genesis block was actually counted as block “0” and its 50 bitcoin were actually non-spendable, but from block 1 onward the block rewards formed the coinbase.

Coins are introduced with each and every block on a tightly controlled release schedule: 1 block roughly every 10 minutes. Every 210,000 blocks, the block reward is halved. As of this writing in July 2021, the block-reward is 6.25 BTC every ~10 minutes. There are currently about 18.7 million bitcoin in circulation, with the last of the 21 million bitcoin estimated to be mined in the year 2140.
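
The release schedule can be worked through in a few lines. This sketch starts from the 50 bitcoin reward and halves it every 210,000 blocks, counting in the smallest unit of bitcoin (100,000,000 units per coin, commonly called satoshis) so that everything stays in whole numbers. The function names are my own, and this is an illustration of the arithmetic rather than the actual Bitcoin source code.

```python
HALVING_INTERVAL = 210_000             # blocks between halvings
UNITS_PER_BTC = 100_000_000            # smallest units in one bitcoin
INITIAL_REWARD = 50 * UNITS_PER_BTC    # the first block reward, in smallest units

def block_reward(height: int) -> int:
    """Block reward (in smallest units) for the block at the given height."""
    halvings = height // HALVING_INTERVAL
    # Halving a whole number repeatedly eventually rounds down to zero.
    return INITIAL_REWARD >> halvings

def total_supply() -> int:
    """Add up every block reward until the reward rounds down to zero."""
    total, halvings = 0, 0
    while (INITIAL_REWARD >> halvings) > 0:
        total += HALVING_INTERVAL * (INITIAL_REWARD >> halvings)
        halvings += 1
    return total

for height in (0, 210_000, 420_000, 630_000):
    print("block", height, "reward:", block_reward(height) / UNITS_PER_BTC, "BTC")

print("total ever issued:", total_supply() / UNITS_PER_BTC, "BTC")
```

Because each halving era issues half as many coins as the one before, the total creeps up towards 21 million without ever quite reaching it.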

Block 1 established the initial coinbase; subsequent mining of further blocks expands on the coinbase through the block rewards. Looking at our example, I start with block number 1, which forms my coinbase.

If we look at our blockchain example (Figure 17) we can see that I rewarded myself $100 in our first block. Partly because I am a good bloke and partly because I happen to be the one that mined the first block. When we move to Block 2, I start to spend my coins.

The program will always check that my transaction outputs (the coins I spend) don’t exceed my balance in the coinbase. I include the transactional data in block 2 and I mine that block too, so I am rewarded with more coins as the block reward for Block 2.

From here, the blockchain keeps building, keeping a distributed copy of the ledger among the nodes and these nodes enforce the rules determined by the program. Block 3 is included with more transactions as other users start transacting. Lucas is also mining, solving the difficulty puzzle for block 3 and he is rewarded the block-reward for that block. And on and on the chain grows.

Figure 17. A coinbase is now added to the blockchain
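
The balance check described above can be sketched as a small ledger that is updated one block at a time. Everything here is illustrative: the function name, the amounts (held in cents to avoid rounding problems) and the choice to credit the miner's reward only after the block's transactions have been processed are my assumptions, not the rules of the demo program or of Bitcoin.

```python
from typing import Dict, List, Tuple

Transaction = Tuple[str, str, int]   # (sender, receiver, amount in cents)

def apply_block(balances: Dict[str, int], miner: str, reward: int,
                transactions: List[Transaction]) -> Dict[str, int]:
    """Return the balances after one block, rejecting any overspend."""
    updated = dict(balances)
    for sender, receiver, amount in transactions:
        if amount <= 0:
            raise ValueError("amounts must be positive")
        if updated.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot spend {amount}: balance too low")
        updated[sender] -= amount
        updated[receiver] = updated.get(receiver, 0) + amount
    # Assumption: the block reward only becomes spendable in later blocks.
    updated[miner] = updated.get(miner, 0) + reward
    return updated

ledger: Dict[str, int] = {}
ledger = apply_block(ledger, "Daz", 10_000, [])                          # block 1
ledger = apply_block(ledger, "Daz", 10_000, [("Daz", "Rick", 2_500)])    # block 2
ledger = apply_block(ledger, "Lucas", 10_000, [("Rick", "Isla", 1_000)]) # block 3
print(ledger)

try:
    apply_block(ledger, "Lucas", 10_000, [("Isla", "Sam", 999_999)])
except ValueError as error:
    print("rejected:", error)
```

Every balance can be traced back to a block reward, which answers the earlier question of how we know Rick had the money to spend in the first place.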

As before, if data is changed, even by one simple character, whether it be in the hash outputs themselves or the transactional data, that chain falls out of consensus with the majority and is simply rejected.

In terms of balance, if I try to spend more than my balance, the network will reject the transaction. In our earlier example of Sam trying to change how much Rick sent him, there is another layer of security built in called public and private key cryptography, which will be the subject of the next article. It is practically impossible for Sam to edit this entry unless he also has possession of Rick’s private key.

If I try and change the history, the network will reject the history by comparing the hashes of each block. The nodes are the source of truth and keep everyone honest.

The miners throw computational power at solving the cryptographic puzzles for each block, they are rewarded the block reward for their efforts. Mining is a necessary function of the bitcoin blockchain, without mining, there is no value and no security.

Miners must expend energy to solve the blocks; this is a really understated feature of Bitcoin’s security. Throughout this article we have spoken at length about mining, and re-mining blocks where changes are made. What is really important to note is the vast amount of computational power thrown at the Bitcoin protocol each and every minute of every day. If this hugely vast network took 10 minutes to find the last block, how long would it take a bad actor, acting alone, to find an alternate nonce and mine a block on his own in order to change the data within? How much energy would a nation-state have to capture in order to change the history of the Bitcoin blockchain? And, coupled with the earlier point about public and private keys safe-guarding the ledger, the only thing the bad-actor would be able to achieve is a double-spend of the coins they already have possession of, and to do even that they would need more computational power, and the energy to run it, than the entire rest of the network. This is virtually impossible.

This also helps explain why Bitcoin is the only digital asset worth giving your time to, it wins in terms of network effects alone. Compare the hash-rate of Bitcoin vs any other protocol and they pale in comparison.

For further examples of all of these features, we highly recommend reading our book B is for Bitcoin, available through Amazon. We take you through all this and much, much more, for an in-depth yet approachable introduction to everything you need to know about Bitcoin.

Conclusion

What we have covered here is a simplified version of how the bitcoin network operates: distributed ledgers, hashing, blockchains, nodes and miners working in harmony to provide a completely trust-less, censorship-resistant, open-source, open-ledger, open-monetary-network. Nobody can change the ledger, nobody can stop transactions, nobody can reverse transactions and nobody can double-spend their coins. It is nothing short of brilliant and is completely revolutionising global finance.

The full history of transactions is available for anyone to interrogate and verify. Obviously, unlike our examples here today, the bitcoin blockchain does not display the identities of the people transacting, only their addresses. Addresses will be the subject of the next article, it is too large a topic to cover today.

In this article, I used the amazingly interactive platform built by Anders Brownworth, who taught blockchain at MIT. Visit here and have a play; you will get a much deeper understanding of the concepts we covered in this article if you play around with it all. Anders also has a very good video tutorial on this content to help cement the learning.

In terms of bitcoin adoption, if you are reading this, you are already ahead of the pack. Institutional adoption is coming and coming fast, but we still have time to front-run the big boys. With my articles, I am trying to help the Average Joe educate themselves on this technology so that we can benefit from it. We will benefit the most, the guys in the trenches, battling for wages. Start dollar-cost-averaging into bitcoin, buy a bit every day/every week. Treat bitcoin as your savings account and ride the wave up, it is the hardest money that has ever existed and it grows exponentially each day. (Not financial advice, education and entertainment purposes only.)

Happy Stacking, thanks for reading

Daz Bea
Twitter @dazbea

Recommended Reading:
B is for Bitcoin
“Mastering Bitcoin”– Andreas M. Antonopoulos
Blockchain Demo – Anders Brownworth

Co-founder, Chief Operations Officer