Should your business be on blockchain?

It depends.

Blockchain has evolved from the concept of a platform for trading cryptocurrencies to one that exchanges all kinds of digital assets, from flowers and luxury goods to medical records and supply chain documents (smart contracts). When considering adopting a private blockchain model for your business transactions, you should ask yourself these four questions:

What are my transactions?

A first step is identifying the assets that you will be exchanging on a blockchain network. A transaction is simply an exchange, trade or transfer of an asset, a digital one in this case, between two or more parties. If your asset will change hands, then you will need to keep a chain of records documenting the activities of the exchange. This is the essence of a blockchain: to manage and secure digital transactions as part of an open, transparent and decentralized system of record. And don’t worry, on a private blockchain your data stays confidential, because it is open and transparent only to the approved members of the network.

A blockchain is a shared ledger of key-value hash chains, representing transactions, that is distributed amongst approved parties. It records data from member transactions and updates the ledger securely and efficiently, in a way that is both verifiable and permanent. This system of record is visible to each approved member of the network, and when written to the blockchain, each new transaction record is linked to the record before it, making the system auditable and immutable. Immutable means a record cannot be altered without detection, as long as the cryptographic hashing that links the records remains intact. Computational algorithms are deployed to guarantee that the transaction record in the database is permanent, chronologically ordered, and available to all members of the network.
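To make the idea of linked records concrete, here is a minimal sketch in Python. It is not how any particular blockchain platform is implemented; the function names (`block_hash`, `append_record`, `is_valid`) and the sample transactions are made up for illustration. It only shows why changing an earlier record is detectable when each record stores the hash of the one before it.

```python
import hashlib
import json


def block_hash(index, prev_hash, transaction):
    """Hash a record's contents together with the previous record's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transaction": transaction},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(ledger, transaction):
    """Append a transaction, linking it to the record before it."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    index = len(ledger)
    ledger.append({
        "index": index,
        "prev_hash": prev_hash,
        "transaction": transaction,
        "hash": block_hash(index, prev_hash, transaction),
    })


def is_valid(ledger):
    """Recompute every hash and check each link to the previous record."""
    prev_hash = "0" * 64
    for record in ledger:
        if record["prev_hash"] != prev_hash:
            return False
        expected = block_hash(
            record["index"], record["prev_hash"], record["transaction"]
        )
        if record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical transactions: one asset changing hands twice.
ledger = []
append_record(ledger, {"asset": "invoice-001", "from": "supplier", "to": "buyer"})
append_record(ledger, {"asset": "invoice-001", "from": "buyer", "to": "bank"})
ledger_ok_before = is_valid(ledger)

# Tampering with an earlier record breaks verification.
ledger[0]["transaction"]["to"] = "attacker"
ledger_ok_after = is_valid(ledger)
```

In this sketch, `ledger_ok_before` is true and `ledger_ok_after` is false: the stored hash of the first record no longer matches its edited contents.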

Will I leverage network effect?

A network effect is the effect, described in economics and business, that one user of a good or service has on the value of that good or service to others. When a network effect is present, the value of a good or service is dependent on the number of others using it.[1] A greater number of users increases the value to each.

In his lucid blog post for Evergreen, @EricJorgenson explains the differences between virality, network effects, and economies of scale. The strength of a blockchain network is in its numbers, and this is where the concept of consensus comes in. The network of users acts as a consensus mechanism. Having more users or nodes on the network generally means more distributed operators will review and agree on all addenda before the data is permanently committed to the blockchain. This means that each member is creating the same shared system of record simultaneously. Because the database is distributed, each party has access to the entire database and its complete history, making data manipulation very difficult, if not nearly impossible. This also eliminates single points of failure (SPOF) in any one part of the system, leading to a more robust system overall than a traditional centralized database could offer.
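As a rough illustration of members agreeing before anything is committed, the sketch below uses a simple majority vote. This is a deliberate simplification and an assumption on my part: real consensus protocols are considerably more involved, and the `Node` class, its `approves` rule, and the `propose` function are hypothetical names invented for this example.

```python
class Node:
    """A network member holding its own copy of the shared ledger."""

    def __init__(self, name):
        self.name = name
        self.ledger = []

    def approves(self, record):
        # Hypothetical validation rule: the record must name an asset
        # and must not already be in this node's ledger.
        return bool(record.get("asset")) and record not in self.ledger


def propose(nodes, record):
    """Commit the record to every node only if a strict majority approves."""
    votes = sum(1 for node in nodes if node.approves(record))
    if votes * 2 > len(nodes):
        for node in nodes:
            node.ledger.append(dict(record))
        return True
    return False


nodes = [Node("a"), Node("b"), Node("c")]
accepted = propose(nodes, {"asset": "invoice-001", "to": "buyer"})
duplicate = propose(nodes, {"asset": "invoice-001", "to": "buyer"})
rejected = propose(nodes, {"to": "buyer"})
```

Here the first proposal is accepted and written to all three copies, while the duplicate and the record without an asset are turned down, so every node ends up holding the same single-entry ledger.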

Can I pool resources?

For the purpose of pooling resources to achieve a common goal, business networks can come together to form a consortium that benefits from sharing reference data. A consortium blockchain platform is essentially a private blockchain model that will allow its members to retain privacy and control, while reducing their transaction time and cost. This model might be appealing to parties spread across industrial, geographical or regulatory boundaries.

Each member of a consortium blockchain platform plays a different role in meeting their common goal, with a visionary who will be responsible for leading the pack. Hyperledger Fabric, for example, obviates the need for a visionary to build the code, thereby removing a potential barrier to easy adoption.

What is the business use case?

Beyond building a minimum viable product that will be deployed on the blockchain, what is the overarching vision?

  1. Faster payment settlements – no reliance on centralized servers to do all of the work.
  2. Increased security – virtually impossible to reproduce (or tamper with) the entire blockchain, and to get member nodes to accept the hijacked version.
  3. More open and transparent – with permission, members are able to view the entire history of transactions, increasing auditability and general trust amongst participants.
  4. Lower cost – expensive infrastructure of transaction processing, reviews and approvals by intermediaries is eliminated.

In summary, many businesses today stand to benefit from being on blockchain, but not all businesses qualify. Blockchains are well-suited for certain functions, while for others, they are not.



[1] – Network effect (n.d). In Wikipedia. Retrieved November 5, 2017, from


Curated list of blockchain services and exchanges – Imbaniac, Github


Rethinking commodity hardware for Hadoop

Traditionally, the idea of deploying Hadoop on commodity hardware was genius. With this option of low-cost infrastructure, Hadoop was in fact designed to be the RAID of compute farms – made up of homogeneous, generally cheap and easily replaceable servers. This model does work, but when big data starts to really scale, and I mean really scale, the terms ‘commodity’ and ‘cheap’ start to go from hand-in-hand to tongue-in-cheek. In other words, it would be an oversight and in poor taste to make long-term plans that count on the two staying together.

Get to your point already?

Infrastructure solutions come as servers with internal storage. Stripped down to the basics, they perform two functions: compute and storage. The issue with this is the reason behind the term ‘Big Data.’ The data will grow. As this growth occurs (the technical term is scale), more and more storage will be needed to house the data. You could store this data on servers of course, but if your need is really to store, then you probably don’t need the compute resources. And that’s where the challenge begins. Because traditional commodity servers have their storage and compute resources joined at the hip, adding servers just for their disks is a quick way to under-utilize the compute resources, which is not cost-effective.
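A back-of-the-envelope sketch of that under-utilization, using made-up figures (a steady workload needing 160 cores, and servers with 16 cores each): as servers are added purely for their disks, the share of compute that is actually used shrinks.

```python
def cpu_utilization(cores_needed, nodes, cores_per_node):
    """Fraction of the cluster's cores that the workload actually uses."""
    total_cores = nodes * cores_per_node
    return min(1.0, cores_needed / total_cores)


# Assumed, illustrative numbers: compute demand stays flat while
# the node count grows only because more disk space is needed.
CORES_NEEDED = 160
CORES_PER_NODE = 16

utilization_by_nodes = {
    nodes: cpu_utilization(CORES_NEEDED, nodes, CORES_PER_NODE)
    for nodes in (10, 20, 40)
}
```

With these assumed numbers, utilization falls from 100% at 10 nodes to 50% at 20 nodes and 25% at 40 nodes, even though the amount of useful work has not changed.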

So hey, why don’t you use external storage?

Well, great question. That’s definitely an option, but there’s a whole “movement” about moving the analytics to the data, which we could pick apart. Keeping the storage local means faster jobs. The moment you move the data across a network to external storage, you are susceptible to the issues that accompany network and storage bandwidth – the good, the bad, and it could get real ugly (e.g. complexity, latency, loss of features, loss of data governance, more infrastructure). So ideally, you want your data to be local – on local disks, which is also where your programs run.

Having said that, at the enterprise level, data resides on SANs – Storage Area Networks, and must be moved to compute nodes for processing. To optimize infrastructure and reduce data bottlenecks, the nodes should be on the same cluster.

So what else is there to Hadoop and commodity hardware?

High availability, which is simply a mechanism to avoid having a single point of failure (SPOF) in the overall system, is essential to Hadoop. To incorporate redundancy, the recommendation is to keep more than one copy of the data, and with Hadoop, specifically three copies. And just as you see the good intentions, you also see the implications of scale. This single move of replicating the data three times means tripling the local disks and, because server and disk scale together, adding servers along with them. Again, the demon of under-utilization rears its head, causing the physical footprint of the datacenter to multiply. What a mess!
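The arithmetic behind that footprint can be sketched in a few lines. The replication factor of three comes from the text above; the dataset size (100 TB) and the usable disk per server (24 TB) are assumed figures chosen only for illustration, and the function names are hypothetical.

```python
import math


def raw_storage_tb(dataset_tb, replication_factor=3):
    """Raw disk capacity needed once every block is stored multiple times."""
    return dataset_tb * replication_factor


def nodes_needed(dataset_tb, disk_per_node_tb, replication_factor=3):
    """Servers required to hold the replicated data on local disks."""
    raw = raw_storage_tb(dataset_tb, replication_factor)
    return math.ceil(raw / disk_per_node_tb)


# Assumed figures: 100 TB of data, 24 TB of usable disk per server.
raw_tb = raw_storage_tb(100)
nodes_with_replication = nodes_needed(100, 24)
nodes_without_replication = nodes_needed(100, 24, replication_factor=1)
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk and 13 servers instead of 5, and each of those extra servers brings compute along with its disks whether or not the workload needs it.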

One popular use of Hadoop is as a data warehouse. If you’re considering doing this, i.e. modernizing your existing or traditional data warehouse solution, also be mindful of the impact of scaling on your datacenter and the associated costs, which include but are not limited to:

  • Personnel: administrators for deployment and day to day operations
  • Network: bottlenecks, bandwidth and (re)solutions
  • Workloads: nature of jobs and their needs – streaming data or data at rest
  • Software licensing (operating systems, applications) per node/cluster