Access Helium Off-Chain PoC on AWS

With the Solana migration, the Helium ETL has become obsolete: Proof of Coverage (PoC) data is no longer stored on-chain, but it can be publicly accessed from an AWS bucket. Let’s see how to access this data and what I’m developing to manage it.

Discover my GitHub project to manage Helium Off-Chain PoC data

AWS bucket with Helium Off-Chain PoC detailed information

The AWS bucket from the Helium Foundation is a “requester pays” bucket. This means the Helium Foundation does not pay for your data requests: you do. Understandable as this is, I would really have loved to see a crypto storage solution like Storj or Streamr used to provide this data … a point I’ll look at later. Anyway, to access this data, you need an AWS account.

  • Go to your AWS account (create one if you don’t have one)
  • Go to IAM, then select the Users menu
  • Add a user, give it a name like “helium_reader”, then click Next
  • Create a group with a name like “Helium_PoC_group”, then select the authorized actions
  • Policy: AmazonS3ReadOnlyAccess
  • Assign the created group to the user and finish the creation
  • Now click on the user name in the user list, go to the security credentials section, find “Access keys” and create a new one
  • Select “Application running outside AWS”, give it a name and validate
  • You now have a key pair: an access key and a secret key
  • Store the secret key securely: it can’t be retrieved later (but you can create a new one)
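As a sketch, the key pair created above can then be used from code. Assuming boto3 (the official AWS SDK for Python) is installed, a client pointed at the bucket’s region could be built like this; the key values are placeholders:

```python
BUCKET = "foundation-poc-data-requester-pays"  # Helium Foundation bucket
REGION = "us-west-2"                           # region hosting the bucket

def make_s3_client(access_key: str, secret_key: str):
    """Build an S3 client from the key pair created in the steps above."""
    import boto3  # assumed installed: pip install boto3
    session = boto3.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=REGION,
    )
    return session.client("s3")

# Usage (needs your valid keys):
#   s3 = make_s3_client("AKIA...", "<secret key>")
```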

AWS S3 access cost

To give you an idea of the cost of accessing the data: you pay for the LIST commands (to get the file names), for the GET commands (to retrieve the data), and for the outbound data traffic. Prices vary, so the figures below are only indicative of what you pay when moving the data out of AWS.

Per day, you can expect about:

Type of Data      | Files     | Total Size
Validated IoT PoC | 1500-2000 | 48 GB

Idea of the volume of Helium assets to be retrieved for off-chain data per day

So we can estimate the S3 access cost:

Command       | Per 1000 Commands or GB | Per Day | Per Month
DATA TRANSFER | $0.01                   | $0.48   | $15

Idea of the AWS S3 cost to access Helium data (March 2023)
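As a quick back-of-the-envelope check of these figures (data transfer only, using the indicative March 2023 price; the LIST and GET request charges come on top):

```python
# Illustrative cost arithmetic, using the data-transfer figures from the table.
PRICE_PER_GB = 0.01   # $/GB out of AWS (indicative March 2023 figure)
GB_PER_DAY = 48       # daily volume of validated IoT PoC files

cost_per_day = PRICE_PER_GB * GB_PER_DAY
cost_per_month = cost_per_day * 31

print(f"${cost_per_day:.2f} / day, ~${cost_per_month:.0f} / month")
# → $0.48 / day, ~$15 / month
```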

Helium Foundation Buckets

The Helium Foundation S3 bucket containing the data is foundation-poc-data-requester-pays, located in the us-west-2 AWS region. It contains different types of files, as described in the Helium oracle documentation page. When listing / processing these files, you should know the following:

  • Files come by family: you get all the beacon files, then all the witness files. So if you want to get the updates, you need to keep a pointer on the last file of each category and resume from it, rather than jumping to the next category, or you will miss the new files.
  • There are between 300 and 2000 files a day per category
  • Witness files are about 20 times bigger than beacon files
  • Some of the files are corrupted, some are zero-length… you need to be prepared to handle any exception
  • The link between a beacon and its witnesses is basically made from the timeframe and the data content. In my experience, most witnesses do not match any beacon.
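The first point above (one pointer per category) can be sketched as an incremental listing. This assumes boto3 is installed; the prefix value in the comment is an assumption, check the oracle documentation for the actual file prefixes:

```python
def list_new_files(s3, category_prefix: str, last_seen_key: str = ""):
    """Yield keys added after `last_seen_key` under one file category."""
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket="foundation-poc-data-requester-pays",
        Prefix=category_prefix,    # e.g. "iot_valid_poc" -- assumed, see the oracle docs
        StartAfter=last_seen_key,  # resume point, stored per category between runs
        RequestPayer="requester",  # the LIST requests are billed to you
    )
    for page in pages:
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Persisting the last yielded key per category between runs is what prevents missing files when a new family of files starts arriving.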

It’s really better to use the validated IoT PoC files than the raw files: they contain beacons and witnesses reassembled together, which really simplifies the data processing. As the files come from different oracles, you get plenty of files within the same timeframe.
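A minimal sketch of fetching one of these files, with the defensive handling that the corrupted and zero-length files call for (assumes an S3 client built with boto3; the broad exception catch is deliberate here):

```python
def fetch_file(s3, key: str):
    """Download one object; return its bytes, or None if unusable."""
    try:
        resp = s3.get_object(
            Bucket="foundation-poc-data-requester-pays",
            Key=key,
            RequestPayer="requester",  # the GET and the transfer are billed to you
        )
        body = resp["Body"].read()
    except Exception as exc:           # corrupted objects do happen: skip them
        print(f"skipping {key}: {exc}")
        return None
    return body or None                # zero-length files are treated as missing
```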

Rewards are published only once a day, at 1 AM UTC, and you only get a sum of rewards in $IOT for the previous day, covering witnesses, beacons and data transfer.

Processing challenge

If you aim to process an hour of data in about 15 minutes, to be able to resync your data from the ETL, you need to process about:

  • 100,000 selected witnesses / minute
  • 10,000 beacons / minute

In terms of volume, that’s about 200 GB of data to store per week.
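The figures above multiply out as follows (simple arithmetic, no assumptions beyond the numbers quoted):

```python
# Throughput needed to replay one hour of data in ~15 minutes.
witnesses_per_min = 100_000
beacons_per_min = 10_000

records_per_min = witnesses_per_min + beacons_per_min
records_per_resync_window = records_per_min * 15  # the 15-minute target window

print(records_per_min)            # 110000 records / minute
print(records_per_resync_window)  # 1650000 records per hour of data replayed
```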

New ETL to extract and load the off-chain PoC data

I have built an open-source project to load this data into a scalable MongoDB database; you can find my Helium Off-chain ETL on GitHub.

Installation is quite easy: everything is packaged in a docker compose file with a Makefile to build and run it.

As it relies heavily on caching and parallelism for performance, you need a server with three SSDs, one for each shard; an NVMe system drive is recommended for the rest. A minimum of 64 GB of memory and 24 CPU cores is a good recommendation.

If you have benchmarked your own setup, let me know, I’ll be happy to share the results here:

Setup | CPU         | MEM       | DB Storage | SYS Storage | Max PoC / m
      | x8/16 @3GHz | 64GB DDR3 | 3x SSD 4TB | NVMe 2TB    | 5000

ETL PoC setup performance

This kind of server, based on Amazon public pricing, costs about $1400 / month. If you need to access this data or to run your own ETL … just contact me, I can do something for you for a third of this.

The solution comes with a Grafana dashboard to monitor the processing in real time, which is really useful.

Functional Behavior

The ETL loads all the witness, beacon and reward data into corresponding collections. In parallel, it updates a hotspot entity with all the consolidated information. This information can be retrieved through a REST API.
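As an illustration only (the route and the base URL below are hypothetical, not the project’s documented API), querying such a REST API from Python could look like:

```python
import json
from urllib.request import urlopen

def hotspot_url(base_url: str, hotspot_key: str) -> str:
    """Build the URL for a hotspot document (route name is an assumption)."""
    return f"{base_url.rstrip('/')}/hotspot/{hotspot_key}"

def get_hotspot(base_url: str, hotspot_key: str) -> dict:
    """Fetch the consolidated hotspot entity as JSON (network call)."""
    with urlopen(hotspot_url(base_url, hotspot_key)) as resp:
        return json.load(resp)

# Usage (hypothetical endpoint):
#   data = get_hotspot("http://localhost:8080", "<hotspot key>")
```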
