Start a search engine (Ubuntu)
This page describes how to start and host your own smart contract search engine on Ubuntu 18.04 LTS.
Apache 2
Update system and install Apache2
List the firewall rules and add Apache2 on port 80 only
Check that Apache is running
Enable modules for the proxy
Get the domain name (as discussed at the start of this page) and use it in the following steps, e.g. search-engine.com or 13.236.179.58 (just the IP, without the protocol). This example uses the fictitious domain search-engine.com.
Set up Virtual Host
Create the following configuration file
Add the following content to the file which you just created (note: the Proxy component is explained a little later in this document). Replace search-engine.com with your own public IP or domain.
Enable the new site
Disable the original Apache2 site
Test the configuration which we just created
Reload the system to show the new site
Here is a quick reference of commands which you will find useful in the future
Search engine source code
Place the code in the appropriate directories
Set final permissions on all files
Javascript
Each site in this system uses a single Javascript file which passes events and data back and forth between the HTML and Python. The code repository currently has two such files: secondStateJS.js, which services the FairPlay - Product Giveaway site, and ethJS.js, which services the Ethereum Search Engine Demonstration. One of the strong points of this search engine is that it allows you to create your own custom HTML/JS, so that you can render your data in any way.
publicIp
If running this in global mode, please make sure that the var publicIp = ""; line in the secondStateJS.js file is set to the public domain name of the server which is hosting the search engine, including the protocol (e.g. https://search-engine.com).
searchEngineNetwork
Please ensure that the correct network id is set in the searchEngineNetwork variable in the secondStateJS.js file.
esIndexName
Please ensure that the appropriate index name is set in the esIndexName variable in the secondStateJS.js file (depending on which network you selected in the previous step). The logic is as follows.
Please make sure that you set esIndexName to the same value as the commonindex in config.ini (e.g. both the config.ini commonindex and the secondStateJS.js esIndexName are set to testnet).
This Javascript configuration will be made part of the global configuration as per the GitHub Issue
Configuring the Python harvester (a single config.ini file)
It is important that the search engine is pointing to the correct RPC endpoint i.e. CMT TestNet vs MainNet. It is also important that you set the average block time (this will make the system run more efficiently).
Elasticsearch: please also put in your Elasticsearch URL and region.
Index names: the masterindex, abiindex and bytecodeindex can all stay as they are below. You might just want to change the commonindex to be more descriptive, e.g. mainnet, testnet etc.
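As a rough illustration of how a harvester could read these settings, here is a minimal sketch using Python's standard configparser module. The section names, most key names and the placeholder Elasticsearch values are assumptions made for this example and may not match the project's actual config.ini; only seconds_per_block, max_threads, the index names and the test RPC URL appear elsewhere on this page.

```python
import configparser

# Hypothetical config.ini layout: the section names, key names and placeholder
# values below are assumptions for illustration, not the project's real file.
SAMPLE_CONFIG = """
[blockchain]
rpc = https://testnet-rpc.cybermiles.io:8545
seconds_per_block = 10
max_threads = 500

[elasticsearch]
endpoint = https://your-elasticsearch-endpoint.example
aws_region = your-aws-region

[indices]
masterindex = masterindex
abiindex = abiindex
bytecodeindex = bytecodeindex
commonindex = testnet
"""


def load_settings(config_text):
    """Parse the config text and return the values a harvester would need."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {
        "rpc": parser.get("blockchain", "rpc"),
        "seconds_per_block": parser.getint("blockchain", "seconds_per_block"),
        "max_threads": parser.getint("blockchain", "max_threads"),
        "es_endpoint": parser.get("elasticsearch", "endpoint"),
        "aws_region": parser.get("elasticsearch", "aws_region"),
        "commonindex": parser.get("indices", "commonindex"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["rpc"], settings["commonindex"])
```

Whatever the real layout is, the commonindex value read here should match the esIndexName set in the Javascript file.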
SSL (HTTPS) using Let's Encrypt
Potential issues with SSL
There is a known issue which results in the following error message...
If you experience this, please use this alternative solution for certbot-auto
Then follow the prompts.
Harvesting
Please follow the instructions below so that your system can automatically execute all of the search engine's scripts.
Operating system libraries
Python3
Pip3
Eth-Abi
Web3
Boto3
AWS Requests Auth
Note: this particular implementation of the smart contract search engine uses AWS Elasticsearch and as such there is a small amount of Amazon and Elasticsearch specific configuration. You can choose to use your own local installation of Elasticsearch if you prefer that.
AWS Command Line Interface (CLI)
Configuring AWS CLI: see the official AWS documentation as required.
Elasticsearch
AWS provides Elasticsearch as a service. To set up an AWS Elasticsearch instance visit your AWS console using the following URL.
Amazon Web Services (AWS)
Authentication and access control
Please read the Amazon Elasticsearch Service Access Control documentation. This very flexible authentication and access control can be set up after the fact by writing a policy.
Recommended usage - Run once at startup!
Run at startup
Technically speaking, you only need to run all of these commands once, at startup. After that the system will take care of itself. Here is an example of how to run this once at startup.
Create a bash file, say ~/startup.sh, and make it executable with the chmod a+x command. Then put the following code in the file. Please be sure to replace https://testnet-rpc.cybermiles.io:8545 with the URL of your own RPC endpoint.
Add the following command to cron using the crontab -e command.
The smart contract search engine will autonomously harvest upon bootup.
Detailed explanation of harvesting modes
Full
The harvest.py -m full mode operates in the following way. It divides the number of blocks in the blockchain by the max_threads setting in the config.ini file to create chunks of blocks. It then starts a separate thread for each of those chunks, so that the chunks are harvested in parallel. For example, if the blockchain has 1 million blocks and the max_threads value is 500, there will be 500 individual threads processing 2,000 blocks each.
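The chunking described above can be sketched as follows. This is a minimal illustration and not the harvester's actual code: chunk_ranges and harvest_chunk are hypothetical names, the per-chunk work is a placeholder, and the thread pool is capped at 8 workers only to keep the demo light.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_blocks, max_threads):
    """Split blocks 0..total_blocks-1 into at most max_threads contiguous chunks.

    Returns a list of inclusive (start, end) block-number pairs.
    """
    if total_blocks <= 0 or max_threads <= 0:
        return []
    chunk_size = -(-total_blocks // max_threads)  # ceiling division
    ranges = []
    for start in range(0, total_blocks, chunk_size):
        end = min(start + chunk_size, total_blocks) - 1
        ranges.append((start, end))
    return ranges


def harvest_chunk(block_range):
    """Placeholder worker: a real harvester would inspect every block here."""
    start, end = block_range
    return end - start + 1  # number of blocks this chunk covers


chunks = chunk_ranges(1_000_000, 500)
with ThreadPoolExecutor(max_workers=8) as pool:
    blocks_covered = sum(pool.map(harvest_chunk, chunks))

print(len(chunks), chunks[0], chunks[-1], blocks_covered)
```

With 1 million blocks and max_threads set to 500, this yields 500 chunks of 2,000 blocks each, matching the example in the text.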
The harvest.py -m full mode quickly and efficiently traverses the entire blockchain; its sole purpose is to find transactions which involve smart contracts. Transactions which involve smart contract creation, i.e. which have a contract address in the transaction receipt, are stored in the smart contract search engine's masterindex.
Topup
The harvest.py -m topup mode operates in the following way. It uses the following formula to determine how many of the most recent blocks to harvest.
For example, if the blockchain has 1 million blocks and the seconds_per_block in the config.ini is set to 10, the system will process the most recent block 1000000 and stop at block 999990 (harvesting only the 10 most recent blocks). Once executed, this topup will run repeatedly, i.e. it does not have to be run using cron because it already uses Python time.sleep and will repeat as required.
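The repeating topup loop could look roughly like the sketch below. It does not reproduce the formula that derives the number of blocks from seconds_per_block; instead, blocks_to_harvest is passed in directly. The function names, the callbacks, the cycles limit (the real mode repeats indefinitely) and the choice to sleep for seconds_per_block between passes are all assumptions for illustration.

```python
import time


def topup_range(latest_block, blocks_to_harvest):
    """Return (start, stop): begin at the latest block and walk back to stop."""
    stop = max(latest_block - blocks_to_harvest, 0)
    return latest_block, stop


def run_topup(get_latest_block, harvest_block, blocks_to_harvest,
              seconds_per_block, cycles):
    """Harvest the most recent blocks, sleep, and repeat for a number of cycles."""
    for _ in range(cycles):
        start, stop = topup_range(get_latest_block(), blocks_to_harvest)
        # Walk backwards from the newest block; the stop block itself is excluded.
        for block_number in range(start, stop, -1):
            harvest_block(block_number)
        time.sleep(seconds_per_block)


harvested = []
run_topup(
    get_latest_block=lambda: 1_000_000,  # stand-in for an RPC call
    harvest_block=harvested.append,      # stand-in for the real harvesting work
    blocks_to_harvest=10,
    seconds_per_block=0,                 # zero so the demo returns immediately
    cycles=1,
)
print(topup_range(1_000_000, 10), len(harvested))
```

Because the loop sleeps and repeats on its own, no cron entry is needed to keep it running.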
The harvest.py -m topup mode quickly and efficiently traverses only a few of the latest blocks; its sole purpose is to find the most recent transactions which involve smart contracts. These are stored in the smart contract search engine's masterindex.
Both of the above modes have only identified (and saved to the masterindex) transactions which involve the creation of smart contracts. None of this data is searchable via the API. These full and topup modes are run at all times as they provide the definitive list of transactions which the search engine has to process on an ongoing basis.
Transaction (tx)
The harvest.py -m tx mode operates in the following way. It takes all of the known ABIs which are stored in the abiindex and all of the known smart contract related transactions which are stored in the smart contract search engine's masterindex. It then creates a web3 smart contract instantiation for every combination and tests whether web3 can successfully call all of the contract's public view functions.
If the contract address is not already in the commonindex, and the contract instance returns valid data for all of the public/view functions defined in the ABI, then a new entry is created in the commonindex.
If the contract address is already indexed in the commonindex, there is still a chance that this particular ABI and smart contract combination is new. Therefore the code will go ahead and try to instantiate a web3 contract instance with the ABI and address at hand. If the public view functions of the contract instance are all returned perfectly then the code will assume that this is an associated ABI i.e. a valid ABI which is part of Solidity inheritance etc. The outcome of this process will include the abiShaList being updated as well as the functionDataList seeing the addition of the new data.
If the contract instance is unable to return valid data for each of the public view functions of the ABI in question, then it is assumed that the contract address and the ABI were never related. This combination goes into the ignoreindex because this will never change.
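The three outcomes above can be summarised as a small decision function. This is only a sketch of the logic as described on this page: the function names and outcome labels are hypothetical, the view functions are modelled as plain Python callables rather than web3 contract calls, and "valid data" is assumed to mean that a call neither raises nor returns None.

```python
def all_views_return_data(view_functions):
    """Return True only if every public view call succeeds and yields a value.

    view_functions maps a function name to a zero-argument callable, standing
    in for calls made through a web3 contract instance.
    """
    for call in view_functions.values():
        try:
            if call() is None:
                return False
        except Exception:
            return False
    return True


def classify_combination(address_in_commonindex, views_valid):
    """Decide what to do with one ABI / contract address combination."""
    if not views_valid:
        return "add_to_ignoreindex"      # the pair was never related
    if address_in_commonindex:
        return "add_associated_abi"      # update abiShaList and functionDataList
    return "create_commonindex_entry"    # first valid ABI for this address


def failing_call():
    raise ValueError("call reverted")


good_views = {"name": lambda: "Token", "totalSupply": lambda: 1000}
bad_views = {"name": lambda: "Token", "owner": failing_call}

print(classify_combination(False, all_views_return_data(good_views)))
print(classify_combination(True, all_views_return_data(good_views)))
print(classify_combination(True, all_views_return_data(bad_views)))
```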
Faster State
The harvest.py -m faster_state mode operates in the following way. It traverses only the most recent blocks, calls the public/view functions of the contracts in those blocks and updates the commonindex. Remember that the commonindex is the index which provides the smart contract state data to the API.
Note: the -m faster_state mode requires that the smart contract instantiation (both the transaction hash and all associated ABIs) is already known to the smart contract search engine indices. This requires work. This mode was created for a special case whereby the search engine was required to provide real-time data for a blockchain with 1 second block intervals. Part of this special use case required that the software which was responsible for instantiating new contracts explicitly indexed the contract's ABIs and also the transaction hash of the contract instantiation. This was achieved via the submitManyAbis API call.
This mode provides the fastest data updates available. However, as mentioned above, it also needs each contract's ABIs and transaction hash to be explicitly indexed as soon as the contract is created. This mode is not about self discovery, but rather about explicit indexing in real time. The system can perform self discovery, but the self discovery process (testing combinations of many ABIs and many contract addresses, i.e. millions of combinations) takes longer than 1 second.
ABI
The harvest.py -m abi mode operates in the following way. It fetches the already indexed records from the commonindex and also fetches all of the already indexed ABIs from the abiindex. It then creates web3 contract instantiations for each of the combinations. For each combination, if the contract instance is unable to return the public view function data perfectly, then the ABI and address combination is added to the ignoreindex. This prevents the abi mode from ever checking that particular combination again. On the other hand, if the public view functions of the contract instance are all returned perfectly, the code will assume that this is an associated ABI, i.e. a valid ABI which is part of Solidity inheritance etc. The outcome of this process will include the abiShaList being updated as well as the functionDataList seeing the addition of the new data.
The primary purpose of the ABI mode is to introduce newly uploaded ABIs to pre-existing contract addresses with which they may be associated.
State
The harvest.py -m state mode operates in the following way. It fetches all indexed contracts from the commonindex. It reads their abiShaList and creates a web3 contract instance for each of the ABI / address combinations. At this point the ABI / address combinations are no longer in question, because they have already been verified as real relationships which yield real data. The state mode fetches the public view data from the contract and updates the index if the data is different to what was originally stored. It also creates a local hash of the data so that, when it repeats this process over and over, it can compare hashes on local disk rather than remotely issuing queries to the index.
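The local-hash comparison could be sketched as below. The hashing scheme (SHA-256 over canonical JSON), the function names and the in-memory dictionary standing in for the hashes kept on local disk are all assumptions for illustration; the harvester's actual approach may differ.

```python
import hashlib
import json


def state_hash(function_data):
    """Hash the view-function data in a key-order-independent way."""
    canonical = json.dumps(function_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_update(address, function_data, local_hashes):
    """Return True (and remember the new hash) only when the data has changed.

    local_hashes is a dict standing in for the hashes stored on local disk.
    """
    new_hash = state_hash(function_data)
    if local_hashes.get(address) == new_hash:
        return False  # unchanged, so no remote query or index update is needed
    local_hashes[address] = new_hash
    return True


hashes = {}
first = needs_update("0xabc", {"owner": "alice", "total": 1}, hashes)
repeat = needs_update("0xabc", {"total": 1, "owner": "alice"}, hashes)
changed = needs_update("0xabc", {"owner": "alice", "total": 2}, hashes)
print(first, repeat, changed)
```

Sorting the keys before hashing means that the same data returned in a different order does not trigger a needless index update.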
Bytecode
The harvest.py -m bytecode mode operates in the following way. It fetches all indexed contracts and matches their bytecode (which is in their transaction instance input) with any individual bytecode entries in the bytecodeindex.
Indexed
The harvest.py -m indexed mode operates in the following way. It loops through all of the indexed contracts from the commonindex and sets the indexed value of any item in the masterindex to true if the contract addresses from the two indices match. This is an independent script which does not have to be run; however, it does help speed up performance slightly by excluding any already indexed items when the tx mode is executed.
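A minimal in-memory sketch of this matching step follows. Plain Python lists of dictionaries stand in for the two Elasticsearch indices, and the contractAddress field name is an assumption; only the indexed flag is taken from the description above.

```python
def mark_indexed(master_items, common_items):
    """Set indexed=True on masterindex items whose address is in the commonindex.

    Returns the number of items that were newly marked.
    """
    indexed_addresses = {item["contractAddress"] for item in common_items}
    updated = 0
    for item in master_items:
        if item["contractAddress"] in indexed_addresses and not item.get("indexed"):
            item["indexed"] = True
            updated += 1
    return updated


master = [
    {"contractAddress": "0x01", "indexed": False},
    {"contractAddress": "0x02", "indexed": False},
    {"contractAddress": "0x03"},
]
common = [{"contractAddress": "0x01"}, {"contractAddress": "0x03"}]

newly_marked = mark_indexed(master, common)
# The tx mode can then skip anything already flagged as indexed.
pending = [item for item in master if not item.get("indexed")]
print(newly_marked, [item["contractAddress"] for item in pending])
```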
Flask
Python Flask / Apache2 Integration
Open crontab for editing
Add the following line inside crontab
CORS (Allowing Javascript, from anywhere, to access the API)
To enable CORS, please follow these instructions.
Ensure that Apache2 has mod_rewrite enabled
Ensure that Apache2 has the headers module enabled by typing the following command.
Open the /etc/apache2/apache2.conf file and add the following.
In addition to this, please open the /etc/apache2/sites-enabled/search-engine-le-ssl.conf file (which was created automatically by the above Let's Encrypt command) and add the following code inside the VirtualHost section.
Once all of this is done, please reboot the server; on startup, all of the processes will start as per the cron configuration.
Talking to the smart contract search engine from your DApp
The es-ss.js data services library provides a simple way for your DApp to talk to the data which is indexed in this system, using native Javascript (client side) and/or Node (server side).