The beginning and end of the "crash" event of the Ethereum Medalla testnet



Translator's Note:

Users running Prysm client please upgrade to Alpha.23 version as soon as possible:

This issue is an unscheduled update, outside the regular What's New in Eth2 (WNIE2) schedule. It reviews and analyzes the events that took place on the Eth2 Medalla testnet over the weekend.

We launched Medalla almost two weeks ago, on August 4th, as a large, public, multi-client testnet running the Eth2 mainnet spec. For an introduction to the Medalla testnet, please refer to the previous issue.

The testnet ran smoothly for the first 10 days, even though validator participation was lower than we expected (consistently 70%-80% of validators online). That did no harm: the testnet is more than capable of handling it.

However, on Friday evening, I watched the validator participation rate on the dashboard drop off a cliff. Within a few minutes, the number of active validators fell from 22,000 to around 5,000: about 80% of the validators in the network had disappeared.

This article reviews the incident, its aftermath and the next steps.

We discovered that every validator running the Prysm client in the network suddenly disappeared. Since Prysm is the most used client, the consequences can be imagined.

The Prysmatic team published an incident report and continues to update it with details of the incident and the team's response. Here are some highlights, with my notes.

The cause was a clock synchronization problem. The Prysm client was configured to use Cloudflare's Roughtime to determine the time. The reason is not entirely clear to me, but what is clear is that Roughtime shifted the time four hours into the future, and that this lasted for more than an hour. Validators running Prysm suddenly found their clocks four hours fast and carried on producing blocks and attestations for a part of the chain that did not exist yet.
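
To get a feel for the scale of a four-hour skew, the sketch below converts a clock reading into a beacon chain slot and epoch. It assumes the mainnet spec values of 12 seconds per slot and 32 slots per epoch. The genesis timestamp and the function names are made up for illustration and are not taken from any client.

```python
SECONDS_PER_SLOT = 12   # mainnet spec value
SLOTS_PER_EPOCH = 32    # mainnet spec value

# Hypothetical genesis time (Unix seconds), for illustration only.
GENESIS_TIME = 1_600_000_000


def slot_at(unix_time: int, genesis_time: int = GENESIS_TIME) -> int:
    """Return the slot a node believes is current for a given clock reading."""
    if unix_time < genesis_time:
        raise ValueError("clock reading is before genesis")
    return (unix_time - genesis_time) // SECONDS_PER_SLOT


def epoch_of(slot: int) -> int:
    """Return the epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


true_time = GENESIS_TIME + 10 * 24 * 3600   # ten days after genesis
skewed_time = true_time + 4 * 3600          # the same moment on a clock that is four hours fast

true_slot = slot_at(true_time)
skewed_slot = slot_at(skewed_time)

slots_ahead = skewed_slot - true_slot
epochs_ahead = epoch_of(skewed_slot) - epoch_of(true_slot)
print(slots_ahead, epochs_ahead)  # 1200 37
```

Under these assumptions, a node whose clock is four hours fast believes it is 1,200 slots (about 37 epochs) ahead of everyone else, and signs messages for those slots.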

On its own, this was not enough to be catastrophic. Even with many blocks missing and a flood of attestations arriving from the future, the remaining clients were still able to build on the original chain. Gradually, as the clocks on the Prysm nodes corrected themselves, those nodes rejoined the network and validator participation started to pick up. The network appeared to be returning to normal.

But a few hours later, the situation took a sharp turn for the worse.

Four hours after the initial incident, two more things happened. First, all the attestations that Prysm clients had produced for the future started to become valid. Second, the Prysm nodes that had rejoined the network started disappearing again, because their slashing protection kicked in to stop them from producing conflicting attestations.

Those two things happened at the same time, throwing the network into turmoil. The remaining clients were still struggling to process the information they received, and the beacon chain became a branching jungle. (Raul from the Prysmatic team told me that a bug in Prysm's first fix made things worse)
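
The slashing protection behavior described above can be sketched as follows. The `is_slashable` check mirrors the two attester slashing conditions in the Eth2 spec (double vote and surround vote). The `Vote` type, the `safe_to_sign` helper and the epoch numbers are simplified, hypothetical stand-ins and do not reflect how any particular client stores its signing history.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    """Simplified attestation data: source and target epochs plus the block voted for."""
    source_epoch: int
    target_epoch: int
    block_root: str


def is_slashable(first: Vote, second: Vote) -> bool:
    """True if signing both votes breaks a slashing rule (first may surround second)."""
    double_vote = first != second and first.target_epoch == second.target_epoch
    surround_vote = (first.source_epoch < second.source_epoch
                     and second.target_epoch < first.target_epoch)
    return double_vote or surround_vote


def safe_to_sign(history: Iterable[Vote], candidate: Vote) -> bool:
    """Refuse any candidate that conflicts, in either direction, with a past vote."""
    return not any(
        is_slashable(old, candidate) or is_slashable(candidate, old)
        for old in history
    )


# Signed while the clock was four hours fast (hypothetical epochs and roots).
history = [Vote(source_epoch=2249, target_epoch=2287, block_root="0xfuture")]

# Four hours later the real chain reaches epoch 2287, with a different block.
same_epoch = Vote(source_epoch=2286, target_epoch=2287, block_root="0xreal")
next_epoch = Vote(source_epoch=2286, target_epoch=2288, block_root="0xreal")

print(safe_to_sign(history, same_epoch))  # False: would be a double vote
print(safe_to_sign(history, next_epoch))  # True: no conflict with the history
```

In this simplified model, a validator that has already signed for an epoch "in the future" has to sit that epoch out when real time catches up, which is consistent with the second wave of disappearances.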

For a while, the state of the network was still manageable. But over the next 24 hours or so, the memory and CPU needed to navigate the increasingly complex and chaotic forks became overwhelming. I saw a Lighthouse client using 30GB of memory (about 100 times what it normally would), and I had trouble running the Teku client even with a 12GB Java heap and the processor maxed out.

Note that all of this happened over a weekend. Thanks go to all the client teams who were fighting on the front line. To let nodes cope with the chaotic network, they had to keep optimizing memory usage and efficiency.

So far, the network is gradually recovering. User reports have been mixed, but the newer versions of Prysm and Lighthouse have been able to find the correct chain head and continue building the beacon chain. Eth2Stats currently shows some Lighthouse, Prysm and Teku nodes at or near the chain head. We will continue to optimize Teku to reduce the resources it needs to sync.

One thing to be clear about is that there was no consensus failure between clients. That is, once the network is restored, all clients can agree on the state of the chain head. The beacon chain has therefore not failed in any fundamental way, and no hard fork is required.

We will spend more time reflecting on this episode and drawing lessons from it. The following are some of my personal opinions.

Heavy dependence on a third-party time service is a critical weakness for the network. As it happens, Alex Vlasov of the ConsenSys TX/RX research team has previously written in detail about time synchronization and its importance in the Ethereum 2.0 network. His work is progressing rapidly, and perhaps this is an opportunity for everyone to pay attention to this area. Here are his related articles and posts.
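
As a small illustration of why the way time sources are combined matters, the sketch below compares the mean and the median of several clock offsets when one source is badly wrong. The numbers, the `robust_offset` helper and its `max_skew` parameter are all made up for illustration. This is not a reconstruction of what Roughtime or Prysm actually did, and it is not the approach proposed in the research mentioned above.

```python
from statistics import mean, median

# Hypothetical clock offsets in seconds, as reported by six independent time
# sources relative to the local clock. One source is 24 hours fast.
offsets = [0.2, -0.1, 0.05, 0.0, -0.3, 24 * 3600]

mean_offset = mean(offsets)      # dragged hours away by the single outlier
median_offset = median(offsets)  # stays close to zero


def robust_offset(samples, max_skew=2.0):
    """Return the median offset, or None unless a strict majority of
    sources agree with it to within max_skew seconds."""
    mid = median(samples)
    agreeing = [s for s in samples if abs(s - mid) <= max_skew]
    if len(agreeing) * 2 <= len(samples):
        return None  # no trustworthy majority: keep using the local clock
    return mid


print(round(mean_offset / 3600, 2), "hours (mean)")
print(median_offset, "seconds (median)")
print(robust_offset(offsets))
```

With these made-up inputs, a single faulty source is enough to pull the mean roughly four hours into the future, while the median barely moves. Returning `None` when the sources disagree leaves the decision to the caller instead of silently adjusting the clock.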

Ideally, we would have four or more independent clients, with no single client accounting for more than 30% of the network. That way, even if one client had a problem, the impact would barely be noticeable.

Even if we cannot reach this ideal, reducing the heavy concentration on any single client will make the network more robust. Had only 50% of the validators gone offline this time instead of 80%, the network would have recovered more easily. When one client has a problem, it affects block production, attestation inclusion, gossip efficiency, peer-to-peer communication and syncing across the network, and all of these in turn have a knock-on effect on the remaining validators.
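
One way to read the 30% figure is through the beacon chain's finality rule, which needs attestations from at least two thirds of the active stake. The sketch below checks whether the chain could still finalize if every validator on one client dropped out. The client names and shares are hypothetical, and it assumes equal stake per validator and that all other validators stay online.

```python
# Justification and finality need votes from at least 2/3 of the active stake.
THRESHOLD_NUM, THRESHOLD_DEN = 2, 3


def can_finalize_without(shares: dict, failed_client: str) -> bool:
    """shares maps a client name to its share of validators (any consistent unit)."""
    total = sum(shares.values())
    online = sum(share for name, share in shares.items() if name != failed_client)
    # Integer comparison avoids floating-point rounding at the 2/3 boundary.
    return online * THRESHOLD_DEN >= total * THRESHOLD_NUM


# Hypothetical distributions in percent, for illustration only.
concentrated = {"client_a": 80, "client_b": 10, "client_c": 10}
diverse = {"client_a": 30, "client_b": 25, "client_c": 25, "client_d": 20}

print(can_finalize_without(concentrated, "client_a"))  # False: only 20% remain online
print(can_finalize_without(diverse, "client_a"))       # True: 70% remain, above 2/3
```

In practice participation was already below 100% before the incident, so the real margin is thinner than this idealized calculation suggests.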

Some stakers were able to switch their signing keys over to hot backup nodes running other clients. This certainly makes for a great safety net, although one needs to be careful to avoid slashing: the new validator client may know nothing about the voting history of the existing one, and may therefore cast conflicting votes.

In the future, once we finalize the new API, it should be possible to switch a validator client itself between different beacon nodes, rather than just moving the keys. For example, a Prysm validator client could easily detach from a Prysm beacon node and reconnect to a Teku beacon node. This would solve the slashing problem mentioned above.

Participating in Eth2 is not a "one and done" thing at the moment. Stakers need to stay attentive: following the forums, giving feedback to developers and being able to update their client at short notice. I am very supportive of people running their own validators, but only if they are aware of their responsibilities.

Why do things always go wrong on Friday evenings?

Even at such a time, the response from the Prysmatic team has been amazing. See the team's incident report for details. What I say below is not meant to cast the Prysmatic team in a negative light, as they have done a really good job, but to provide lessons for the Teku team in case we face a similar situation.

When so many users are losing assets (even if they are only testnet coins) and the network is under heavy pressure, it is natural to want to react quickly, but haste can sometimes make things worse.

Two things were avoidable in this incident. First, there was a bug in the initial fix release, Alpha.21, that required users to roll back after 17 hours.

According to Raul of the Prysmatic team, this flaw was responsible for the ensuing network chaos. Second, while handling the situation, the team inadvertently deleted the slashing protection database for its 1024 validators, resulting in the majority of those validators being slashed.

Similar situations can happen with any client. So even under high pressure, all of us, whether developers or users, must deal with it calmly and not blindly pursue speed. So when we tried to restore the network, we followed a slow and careful approach.

In the end, this episode was actually necessary. What's the point of a testnet if nothing is tested? It is obviously unrealistic to run smoothly all the time.

This was a great test! It is perhaps the worst type of shock a network can suffer, and we probably could not have come up with such a test had we tried to design one ourselves. Exposing the testnet to this level of impact is exactly what we need to harden the clients.

Last week The Block quoted my statement in an article:

In the email, PegaSys engineer Ben Edgington wrote that Medalla "is the first testnet with the scale and configuration of the mainnet."

"This is the first large-scale trial; before that it was just a specification on the screen, or a toy network. There are many aspects of the peer-to-peer network that need to be tested and optimized. So far, everything is working normally, but before we can be sure that there are no errors, it will take more time, more scale and more pressure on the network."

To be honest, I really was looking forward to what was to come.

All client teams are currently working on hardening their clients to handle extreme network conditions. This is not a big problem: we should be able to get Medalla back to normal in the next few days. It may affect the balances of all validators, and some validators will face slashing.

If, after that, the validator participation rate does not pick up even though the network is functioning normally, we may consider starting again from scratch with a newly deployed deposit contract (re-creating the testnet may also be a good option). But this is only an option at this stage.

Long live Medalla!

Original link:

Source: What's New in Eth2

By Ben Edgington

