Data mining Algorand for participating accounts

Data mining Algorand for participating accounts

Urtho
Urtho

Blockchain validation on Algorand

Algorand utilizes unique validation solution that allows for unlimited scaling while maintaining sub 4 second block finality without sacrificing security.

Validators on Algorand network are not rewarded (2023) for consensus participation but at the same time the cost of running a validators node is below $20 monthly and the minimum required stake is just a non-zero balance on the account (less than $1).

Participation is also very safe as compromised nodes do not leak any private keys, allowing more users to participate either on their own computers or by trusting their friends to host their participation bundle* for them.

Check out POP’ initiative that incentivises Validators with nice (and live) NFTs.

To start participation one just:

  • makes sure to fund an account with some native Algo token.
  • installs Algorand participation node — eg using Algorun.
  • generates participation bundle
  • signs and sends participation bundle registration transaction
    — eg using Allo’CTRL

*Participation bundle (participation keys) is a collection of ephemeral, quantum-resistant Falcon keys valid for a certain block (date) range — up to 5 months. It is the bundle, not Validator’s private keys, that is used to authenticate votes and block proposals.

Algorand participation state machine

Committing a valid participation bundle registration transaction changes the state of the Validator’s account to online. All participating nodes expect votes and proposals to be submitted by Validator’s node during the validity period of the bundle. The online state of the account switches to offline when one of the following events takes place:

  • the account is closed out (amount of Algo falls to 0)
  • Validator commits unregister transaction
  • participation bundle validity expires
    and any transaction that modifies Validator’s account state takes place

Above shows that online status might be misleading as accounts are still marked as online after they stop voting due to bundle expiration. This is intended behaviour as the ledger cannot be updated without someone paying a transaction fee.

Algonode.io developed an off-chain solution that scans for expired participations and sends 1 µAlgo transaction with a special note to both help label the account offline and notify the Validator of the expiry event.

To properly calculate the current number of participating accounts one has to use the online flag and check if the bundle is not expired.

SELECT
COUNT() onlineAccounts
intDiv(sum(microalgos),1000000) onlineVotingAlgo
,intDiv(max(microalgos),1000000) largestAStake
,intDiv(avg(microalgos),1000000) averageAStake
,intDiv(quantileExact(0.15)(microalgos),1000000) q15
,intDiv(quantileExact(0.25)(microalgos),1000000) q25
,intDiv(quantileExact(0.50)(microalgos),1000000) q50
,intDiv(quantileExact(0.75)(microalgos),1000000) q75
,intDiv(quantileExact(0.95)(microalgos),1000000) q95
FROM
mainnet.account final
WHERE
is_online
and vote_last > (select MAX(round) FROM mainnet.txn)

Above SQL query can be used to calculate current number of non expired online accounts as well as total, largest and average voting stakes in Algos along with quantiles.

Getting accurate historical participation data

Algorand ledger datasets available publicly or commercially contain only current account state and transaction history, no precomputed or sampled participation data.

Let’s try to calculate historical data using only transaction information.

Calculations are ignoring Genesis accounts with initial online state.
Genesis keys had validity up to block 3 million so after that block our calculations are 100% correct.

Before we calculate the number of concurrently registered keys we have to find out all trueÂą start and stop events block numbers (rounds).

Let’s take the round (R1) of a key registration transaction that has

  • the first valid round for the participation bundle less than R1
  • the last valid round for the bundle greater than R1.

This will be our start event.

A stop event round number (R2) will be the lowest of the:

  • end of the validity period for the participation bundle
  • next registration TX
  • account close TX
  • next key unregistration / offline TX

To calculate this event using ClickHouse SQL dialect we use a special LEFT ASOF JOIN that searches for the first matching row on the right side.
Notice that we are replacing 0 (missing) values with INTMAX as ClickHouse hates NULLs by default. We also use extended Algorand transaction annotations available for us in this analytics solution.

CREATE VIEW allregevents AS
SELECT
realtime
,toUInt64(t.snd_addr_id) snd_addr_id -- account ID
,toUInt64(t.round + 1) r1 -- reg. start round
,toUInt64( -- calculated reg. end round
min2(
JSONExtractUInt(t.txn_extra ,'votelst'),
if(t2.round = 0,9223372036854775800, t2.round)
)) r2
FROM
mainnet.txn t left asof join mainnet.txn t2
on (
t.snd_addr_id = t2.snd_addr_id
and t2.type_ext in ('key_reg','key_unreg','acc_close')
and t2.round > t.round
)
WHERE
t.type_ext = 'key_reg'
and JSONExtractUInt(t.txn_extra ,'votelst') > t.round

With this view, we can now split start and end events into separate rows and calculate cumulative sum to get concurrent active participations.
We also filter out future (key expiry) events.

WITH events as (
SELECT
r1 rnd, 1 v -- Add 1 for start of validity
FROM
allregevents
UNION ALL
SELECT
r2 rnd, -1 v -- Substract 1 for end of validity
FROM
allregevents
) SELECT
rnd
,sum(v) OVER (ORDER BY rnd ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) vs
FROM
events
WHERE
rnd < (select max(round) from mainnet.txn t )
ORDER BY rnd;

It takes ~10 seconds to calculate all concurrent active online accounts over the 1B transaction mainnet history. One can verify that the last entry matches the value of online accounts as calculated from the current account state.

This query returns full granularity with round numbers instead of dates.
To create a nice graph with the number of active online accounts at the end of each day we can use “sampling” over the allregevents view.

Unfortunately, ClickHouse does not support correlated subqueries so we have to go full cartesian in this case. Fortunately, this query takes only 15s to complete.

WITH rng AS ( -- Get last round for each day
SELECT
DATE(realtime) realtime
,max(round) round
FROM mainnet.block_stats
GROUP BY 1
) SELECT -- Get number of active registrations for each round/day
realtime day
, count() online
, round
FROM rng CROSS JOIN allregevents
WHERE r1 < rng.round AND r2 > rng.round
GROUP BY realtime, round
ORDER BY 1;

Online account with nonexpired participation keys

Watch out for upcoming articles on calculating historical balance and historical stake.


[1] There is an additional 320 block delay (~15 minutes) between register /unregister events and stake change being considered by nodes. This is to prevent any quick gains due to stake manipulation.

Resources:

Queries and charts in this article have been executed on a commercial Allo.info analytics platform that is based on a distributed ClickHouse Algorand dataset and Apache SuperSet self-service BI frontend.

See also: