OKLink Embraces the Golden Period for Big Data on the Blockchain

In this article, we invited Mr. Xu Qian, a senior researcher in blockchain at OKLink, to share his thoughts on applying big data to the blockchain and discuss his latest views on the industry.

The combination of blockchain and the metaverse has created surging demand for blockchain big data and has pushed firms in this field to handle more interactive content.

Established in 2013, OKLink is a leading global blockchain enterprise and one of the earliest blockchain companies in China. The company is dedicated to developing and commercializing blockchain technology and is now a large global provider of blockchain technology and services. Its namesake product, OKLink, provides accurate on-chain data analysis solutions.

The development of big data on blockchain: massive data and bursting demand

OKLink's blockchain big data business has passed four key milestones.

In August 2019, OKLink launched its blockchain explorer business for public chains, covering block parsing, transaction parsing, and address details;

In April 2020, it expanded into comprehensive on-chain data analytics, including rich-list addresses, hash rate analysis, and entity tags;

In September 2020, it launched prototypes of On-chain Tianyan and On-chain Dashi, covering on-chain monitoring, graph analysis, and smart contract parsing;

Since July 2021, OKLink has shifted its focus from chain-wide data to business-layer data, gradually moving from blockchain infrastructure to on-chain data governance.

In terms of data volume, OKLink already runs full-node parsing for more than ten public chains. It currently holds more than 100 TB of data across ES, HBase, graph databases, data warehouses, and other stores. This covers 16.7 billion on-chain transactions, 1.5 billion addresses, and more than 100 million address tags.

Blockchain big data has four main characteristics:

1. Huge amounts of data

Given the figures above, OKLink's data volume is extremely large. Traditional centralized storage and computing cannot handle data at this scale.

2. Multi-structured data

With the emergence of the metaverse, blockchain vendors increasingly need to handle interactions involving photos, videos, and other documents, beyond the data types traditionally handled on the blockchain.

3. Rapid growth

With the recent expansion of EVM-compatible public chains, led by Ethereum, new chains are fast and can store more data. Because of the large user base, the multitude of devices, and the exponential real-time growth of data, this data must be analyzed promptly, which requires comprehensive data reconstruction.

4. Low-value density

The value of an individual piece of data is small, but the value of the data in aggregate is very high. Presenting the business information of the metaverse to the public requires deeper data parsing, mining, and analysis capabilities. Even so, the difficulty and cost of mining blockchain data remain very high at present.

The on-chain data business boomed worldwide last year. Several leading companies completed funding rounds in the tens of millions of dollars, at valuations approaching one billion dollars. This shows that venture capital has noticed the sector and that demand is surging.

The tagging business is a good example of recent developments in the big data industry.

To begin with, tags are classified into three levels:

primary tags, by industry and field;

secondary tags, by the specific name of a unit, institution, or organization;

tertiary tags, by the particular type of address.
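The three-level classification above can be pictured as a small record attached to each address. The following is only a minimal sketch: the `AddressTag` class, the `addresses_in_field` helper, and all addresses and tag values are hypothetical and are not taken from OKLink's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressTag:
    primary: str    # industry or field, e.g. "exchange"
    secondary: str  # named unit, institution, or organization
    tertiary: str   # particular type of address, e.g. "hot wallet"

    def path(self) -> str:
        """Return the tag as a readable three-level path."""
        return " / ".join((self.primary, self.secondary, self.tertiary))


# Hypothetical addresses and tag values, for illustration only.
TAGS = {
    "addr_1": AddressTag("exchange", "ExampleExchange", "hot wallet"),
    "addr_2": AddressTag("exchange", "ExampleExchange", "cold wallet"),
    "addr_3": AddressTag("gambling", "ExampleCasino", "deposit address"),
}


def addresses_in_field(tags, primary):
    """List all addresses whose primary tag matches the given field."""
    return sorted(addr for addr, tag in tags.items() if tag.primary == primary)


print(addresses_in_field(TAGS, "exchange"))
print(TAGS["addr_3"].path())
```

Keeping the three levels as separate fields, rather than one free-text label, makes it straightforward to filter by industry first and drill down afterwards.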

In late 2020, smart contract development grew explosively. Numerous financial innovations are being built on the blockchain, including banking and transaction-based businesses and financial derivatives.

For example, an address can be tracked to find profitable information, and the analysis of that address is valuable even without identifying the individual behind it. If following it earns money, who the individual is no longer matters.

Tags of this type are commonly referred to as Smart Money. A follower of Smart Money invests in the same assets, watching its buys and sells in order to obtain a good return. Analyzing Smart Money is rare in the traditional market, where you need your own analysts or even a software program to do so. In contrast, public on-chain information can be extracted at substantially lower cost, as long as you have the capability to store and analyze the data.

Well-informed tags are also important. Suppose, for example, that a certain address purchased a certain asset and that the asset then rose significantly. After observing a few such transactions, we can label the address as "well-informed," indicating that it appeared to know of the rise before the market did. Its on-chain behavior shows that it has been profitable, even though we do not know who is behind it. Unlike entity tags, these are defined as behavioral tags.
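The "well-informed" rule described above amounts to counting how often an address bought before a significant rise. Here is a rough sketch under stated assumptions: the function name, the 50% gain threshold, the minimum of three hits, and the sample data are all illustrative choices, not OKLink's actual criteria.

```python
from collections import defaultdict


def label_well_informed(purchases, min_gain=0.5, min_hits=3):
    """Return addresses that repeatedly bought before a large price rise.

    purchases: iterable of (address, buy_price, later_price) tuples.
    min_gain and min_hits are assumed thresholds for illustration.
    """
    hits = defaultdict(int)
    for address, buy_price, later_price in purchases:
        if buy_price <= 0:
            continue  # skip records without a usable purchase price
        gain = (later_price - buy_price) / buy_price
        if gain >= min_gain:
            hits[address] += 1
    return {address for address, count in hits.items() if count >= min_hits}


# Hypothetical purchase history.
purchases = [
    ("addr_a", 10, 20),
    ("addr_a", 5, 9),
    ("addr_a", 2, 4),
    ("addr_b", 10, 11),
    ("addr_b", 10, 30),
]
well_informed = label_well_informed(purchases)
print(well_informed)
```

Note that the rule never needs to know who controls the address, which is what distinguishes a behavioral tag from an entity tag.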

Additionally, there are "attribute tags," which are produced by analyzing the on-chain attributes of an address (smart contract code, creation time, creator, etc.) to describe the characteristics of the address itself, including involvement in hacking events. A related question is how to anticipate and prevent attacks in advance.

This is where smart contract source code comes in. A number of leading projects open up their contract code, which makes it possible to decompile and analyze it. The analysis identifies high-risk functions or coding features, which can be classified as privileged-function or privileged-address attributes of the smart contract. An address that holds privileged functions may be risky. These attributes complement the other on-chain attributes of smart contracts and are also treated as attribute tags.
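One simple way to picture the search for high-risk functions is a pattern scan over published Solidity source. This is a minimal sketch, not OKLink's analysis pipeline: the pattern list is an assumption chosen for illustration (`onlyOwner` is a common access-control modifier name, while `selfdestruct` and `delegatecall` are Solidity built-ins), and the sample contract is invented.

```python
import re

# Assumed, illustrative patterns; a real analysis would go well beyond regexes.
RISK_PATTERNS = {
    "owner-restricted function": re.compile(r"\bonlyOwner\b"),
    "selfdestruct call": re.compile(r"\bselfdestruct\s*\("),
    "delegatecall": re.compile(r"\.delegatecall\s*\("),
}


def scan_source(source):
    """Map each matched risk label to the source line numbers where it occurs."""
    findings = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.setdefault(label, []).append(lineno)
    return findings


# Hypothetical contract source used only to exercise the scanner.
SAMPLE = """contract Token {
    function mint(address to, uint256 amount) external onlyOwner {
        _mint(to, amount);
    }
    function kill() external onlyOwner {
        selfdestruct(payable(msg.sender));
    }
}"""

findings = scan_source(SAMPLE)
print(findings)
```

A match is only a hint that a privileged function exists; whether it is actually dangerous depends on who holds the privileged address.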

There are currently three basic ways of producing tags. The first is manual collection, such as collecting dark web addresses. The second is model expansion: an expansion method is summarized from the characteristics of on-chain behaviors and handed to a designated input clerk, after which tags are expanded dynamically as on-chain data is parsed. The third is artificial intelligence, which uses machine learning to build features dynamically and then looks for potential tags automatically through algorithms.

OKLink's Exploration: providing insight into on-chain data and securing the chain

OKLink has recently explored many applications of big data.

The first product is On-chain Tianyan. The numbers in the above image represent the number of cryptocurrency cases in China. The darker the color, the greater the level of security. You can see the distribution of cases and the amount of money involved in the diagram.

Global blockchain asset crime is currently severe, with fraud the most common form, followed by theft, pyramid schemes, and money laundering. In 2021, the amount involved in global blockchain asset crime reached $14 billion, up 79% year-on-year; blockchain asset fraud caused losses of $7.8 billion, up 82%; hacking thefts caused losses of $3.2 billion, up 516%; and DeFi crime losses exceeded $12 billion, six times more than the previous year.

Chinese law enforcement agencies also face technical challenges in the forensic analysis of blockchain asset crime cases.

First, blockchain assets are diverse and transaction volumes are large and continuous, so the workload of capturing and analyzing them is considerable;

Second, there are many types of blockchain asset trading applications and desktop software, so it is difficult to collect evidence on, transfer, and freeze the affected assets;

Third, transfer chains are complex, and on-chain identities cannot easily be matched to real-world identities;

Fourth, code quality varies across blockchain projects, leaving many vulnerabilities that hackers and malicious insiders can exploit;

Fifth, seized blockchain assets cannot be properly disposed of and held in custody.

OKLink has therefore developed On-chain Tianyan, a tool for on-chain asset tracking, using the tagging solutions above together with its capabilities for collecting, computing, and running statistics on the full data set.

On-chain Tianyan displays the full life-cycle behavior and characteristics of an on-chain address. Through address research and judgment, the flow of digital assets can be tracked independently to determine their origin;

Data mining and comparison can be used to make a second pass over untagged data, finding the addresses involved in a case and the addresses of linked individuals;

Big data visualization, built on the complete database, provides a real-time picture of all digital assets in the country.

Drawing on analysis experience from hundreds of cases and on established security models, OKLink can quickly identify platform addresses, the flow of platform assets, and where platform assets are hidden for common types of cases, such as gambling platforms and phishing websites. Using the deposit and withdrawal addresses it finds and those of larger platforms, it can also quickly estimate the number of people and the amount of money involved.
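Tracking the flow of assets from a known address is, at its simplest, a breadth-first walk over the transfer graph. The sketch below assumes transfers are available as plain (sender, receiver, amount) tuples and ignores amounts and timing; the function name, the hop limit, and the sample addresses are hypothetical, and this is not a description of how On-chain Tianyan is implemented.

```python
from collections import defaultdict, deque


def trace_downstream(transfers, start, max_hops=3):
    """Return {address: hop_count} for addresses reachable from start.

    transfers: iterable of (sender, receiver, amount) tuples.
    Amounts are carried in the data but not used in this simple sketch.
    """
    graph = defaultdict(list)
    for sender, receiver, _amount in transfers:
        graph[sender].append(receiver)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for receiver in graph.get(current, []):
            if receiver not in hops:
                hops[receiver] = hops[current] + 1
                queue.append(receiver)

    del hops[start]  # report only downstream addresses
    return hops


# Hypothetical transfers, for illustration only.
transfers = [
    ("case_addr", "a", 5),
    ("a", "b", 3),
    ("b", "exchange_deposit", 3),
    ("a", "c", 2),
    ("x", "y", 1),
]
reached = trace_downstream(transfers, "case_addr", max_hops=2)
print(reached)
```

Joining the reached addresses against existing tags (for example, known exchange deposit addresses) is what turns a raw trace into a lead.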

Next, we briefly describe On-chain Dashi. By integrating OKLink's honeycomb architecture with an OLAP database for offline and real-time analysis, and by using pre-calculation to deliver multidimensional indicators on on-chain data, On-chain Dashi offers the following features:

1) Data monitoring: users can track data movements in real time and anticipate market changes;

2) Combined toolkit: users can compile and analyze data from multiple sources to build DIY indicators and dashboards;

3) Navigation, search, and filtering: users can locate the right data indicators and pinpoint time periods;

4) Customized dashboards: users can collect their commonly used indicators in one place, making them easier to access and tailoring the view to their needs.
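The idea of pre-calculation mentioned above is to aggregate raw records once, so that a dashboard query becomes a lookup rather than a scan. The following is a minimal in-memory sketch of that idea only; the function name, the (chain, day) key, the two metrics, and the sample records are assumptions, and a production system would do this inside an OLAP database.

```python
from collections import defaultdict


def precompute_daily_metrics(transactions):
    """Aggregate raw transactions into per-chain, per-day indicators.

    transactions: iterable of (chain, day, value) tuples.
    Returns {(chain, day): {"tx_count": int, "volume": number}}.
    """
    metrics = defaultdict(lambda: {"tx_count": 0, "volume": 0})
    for chain, day, value in transactions:
        cell = metrics[(chain, day)]
        cell["tx_count"] += 1
        cell["volume"] += value
    return dict(metrics)


# Hypothetical raw records, for illustration only.
transactions = [
    ("chain_a", "2022-01-01", 10),
    ("chain_a", "2022-01-01", 5),
    ("chain_a", "2022-01-02", 1),
    ("chain_b", "2022-01-01", 7),
]
daily = precompute_daily_metrics(transactions)
print(daily[("chain_a", "2022-01-01")])
```

The trade-off is extra storage and a refresh step in exchange for fast, predictable reads on the indicators users look at most.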

As a result of the combination of blockchain and the metaverse, there is growing demand for blockchain companies to include more interactive content on their platforms. Under this trend, organizations like OKLink may innovate further and enjoy a golden period for big data on the blockchain.

Editor in charge: Pang Guiyu. Source: 51CTO
