Start a search engine (Ubuntu)
This documentation details how to set up and host your own smart contract search engine on Ubuntu 18.04 LTS.

Apache 2

Update system and install Apache2
```bash
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install apache2
```
List the firewall rules and add Apache2 on port 80 only
```bash
sudo ufw app list
sudo ufw allow 'Apache'
```
Check that Apache is running
```bash
sudo systemctl status apache2
```
Enable modules for the proxy
```bash
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo systemctl restart apache2
```
Obtain the domain name (as discussed at the start of this page) and use it in the following steps, e.g. search-engine.com or 13.236.179.58 (just the IP address, without the protocol). In this example, we are using the fictitious domain search-engine.com.
Set up Virtual Host
```bash
sudo mkdir -p /var/www/search-engine.com/html
sudo chown -R $USER:$USER /var/www/search-engine.com/html
sudo chmod -R 755 /var/www/search-engine.com
```
Create the following configuration file
```bash
sudo vi /etc/apache2/sites-available/search-engine.com.conf
```
Add the following content to the file which you just created (note: we will explain the Proxy component a little later in this document). Obviously you will need to replace search-engine.com with your public IP address or domain.
```apache
<VirtualHost *:80>
    ProxyPreserveHost On
    ProxyPass /api http://127.0.0.1:8080/api
    ProxyPassReverse /api http://127.0.0.1:8080/api
    ServerAdmin [email protected]
    ServerName search-engine.com
    ServerAlias www.search-engine.com
    DocumentRoot /var/www/search-engine.com/html
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
```
Enable the new site
```bash
sudo a2ensite search-engine.com.conf
```
Disable the original Apache2 site
```bash
sudo a2dissite 000-default.conf
```
Test the configuration which we just created
```bash
sudo apache2ctl configtest
```
Reload Apache to serve the new site
```bash
sudo systemctl reload apache2
```
Here is a quick reference of commands which you will find useful in the future
```bash
## Stop and start
sudo systemctl stop apache2
sudo systemctl start apache2
## Restart
sudo systemctl restart apache2
## Reload without interruption
sudo systemctl reload apache2
```

Search engine source code

```bash
cd ~
git clone https://github.com/second-state/smart-contract-search-engine.git
```
Place the code in the appropriate directories
```bash
cp -rp ~/smart-contract-search-engine/* /var/www/search-engine.com/html/
```
Set final permissions on all files
```bash
sudo chown -R $USER:$USER /var/www/search-engine.com/*
```

Javascript

This system uses a single Javascript file per frontend, which passes events and data back and forth between the HTML and the Python backend. The code repository currently has one Javascript file, secondStateJS.js, which services the FairPlay - Product Giveaway site, and one Javascript file, ethJS.js, which services the Ethereum Search Engine Demonstration. One of the strong points of this search engine is that it allows you to create your own custom HTML/JS so that you can render your data in any way you like.
publicIp
If running this in global mode, please make sure that var publicIp = ""; in the secondStateJS.js file is set to the public domain name of the server which hosts the search engine (including the protocol), e.g.
```javascript
var publicIp = "https://www.search-engine.com"; // No trailing slash please
```
searchEngineNetwork
Please ensure that the correct network ID is set in the searchEngineNetwork variable in the secondStateJS.js file, e.g.
```javascript
var searchEngineNetwork = "18"; // CyberMiles MainNet
```
esIndexName
Please ensure that the appropriate index name is set in secondStateJS.js (depending on which network you selected in the previous step). The logic is as follows.
```javascript
if (searchEngineNetwork == "19") {
    blockExplorer = "https://testnet.cmttracking.io/";
    esIndexName = "testnet";
}

if (searchEngineNetwork == "18") {
    blockExplorer = "https://www.cmttracking.io/";
    esIndexName = "cmtmainnetmultiabi";
}
```
Please just make sure that you set esIndexName to the same value as in config.ini (note how the config.ini commonindex below and the secondStateJS.js esIndexName above are both set to testnet).
This Javascript configuration will be made part of the global configuration as per the GitHub issue.
```ini
[commonindex]
network = testnet
```

Configuring the Python harvester (a single config.ini file)

It is important that the search engine points to the correct RPC endpoint, i.e. CMT TestNet vs MainNet. It is also important that you set the average block time; this makes the system run more efficiently.
```ini
[blockchain]
rpc = https://testnet-rpc.cybermiles.io:8545
seconds_per_block = 1
```
Elasticsearch
Please also put in your Elasticsearch endpoint URL and AWS region.
```ini
[elasticSearch]
endpoint = search-smart-contract-search-engine-abcdefg.es.amazonaws.com
aws_region = ap-southeast-2
```
Index names
The masterindex, abiindex and bytecodeindex can all stay as they are below. You might just want to change the commonindex to something more descriptive, e.g. mainnet, testnet etc.
```ini
# Stores every transaction in the blockchain which has a contractAddress (an instantiation of a contract, as opposed to a transaction which just moved funds from one EOA to another)
[masterindex]
all = all

# Just stores ABI data (an index which holds every ABI that could match up with a contract address and make up a contract instantiation)
[abiindex]
abi = abi

# Just stores a contract's bytecode with the appropriate key to find said bytecode
[bytecodeindex]
bytecode = bytecode

# Stores all of the smart contract instance details, ABI hashes, function data etc.
[commonindex]
network = network

# The ignore index stores ABI and contract address Sha3 values which are not contract instantiations. This improves performance greatly because an ABI hash mixed with a contract address hash will either be a contract instance or not, and this will never change once set.
[ignoreindex]
ignore = ignore

# The default number of threads is set to 500. However, this value can be raised if you also raise the ulimit when starting the harvest.py scripts i.e. adding ulimit -n 1000 will facilitate a setting here of max_threads = 1000
[system]
max_threads = 500

# Please provide a raw GitHub URL which contains your first ABI. This is required for the search engine to initialize
[abi_code]
initial_abi_url = https://raw.githubusercontent.com/tpmccallum/test_endpoint2/master/erc20_transfer_function_only_abi.txt
```

SSL (HTTPS) using Let's Encrypt

```bash
sudo wget https://dl.eff.org/certbot-auto -O /usr/sbin/certbot-auto
sudo chmod a+x /usr/sbin/certbot-auto
sudo certbot-auto --apache -d search-engine.com -d www.search-engine.com
```

Potential issues with SSL

There is a known issue which results in the following error message...
```
subprocess.CalledProcessError: Command '['virtualenv', '--no-site-packages', '--python', '/usr/bin/python2.7', '/opt/eff.org/certbot/venv']' returned non-zero exit status 1
```
If you experience this, please use the following alternative to certbot-auto.
```bash
sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository universe
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
```
```bash
sudo apt-get install certbot python-certbot-apache
sudo certbot --apache
```
Then follow the prompts.

Harvesting

Please follow the instructions below so that your system can automatically execute all of the search engine's scripts.

Operating system libraries

Python3
```bash
sudo apt-get -y update
sudo apt-get -y upgrade

# If using Ubuntu 18.04 LTS, Python 3.6 will already be installed

# If using an older Ubuntu release, you will need to install Python3.6 and Python3.6-dev
#sudo add-apt-repository ppa:jonathonf/python-3.6
#sudo apt-get -y update
#sudo apt-get install python3.6-dev
```
Pip3
```bash
sudo apt-get -y install python3-pip
```
Eth-Abi
```bash
python3.6 -m pip install eth-abi --user
```
Web3
```bash
python3.6 -m pip install web3 --user
```
Boto3
```bash
python3.6 -m pip install boto3 --user
```
AWS Requests Auth
Note: this particular implementation of the smart contract search engine uses AWS Elasticsearch and as such there is a small amount of Amazon and Elasticsearch specific configuration. You can choose to use your own local installation of Elasticsearch if you prefer that.
```bash
python3.6 -m pip install aws_requests_auth --user
```
AWS Command Line Interface (CLI)
```bash
sudo apt-get install awscli
```
Configuring AWS CLI
See the official AWS documentation as required.
```bash
aws configure
```
Elasticsearch
```bash
python3.6 -m pip install elasticsearch --user
```

Elasticsearch

AWS provides Elasticsearch as a service. To set up an AWS Elasticsearch instance visit your AWS console using the following URL.
```
https://console.aws.amazon.com/console/home
```

Amazon Web Services (AWS)

Authentication and access control
Please read the Amazon Elasticsearch Service Access Control documentation. This very flexible authentication and access control can be set up after the fact by writing a policy.
Run at startup
Technically speaking, you only need to run all of these commands once, at startup; the system will then take care of itself. Here is an example of how to run them once at startup.
Create a bash file, say ~/startup.sh, and make it executable with chmod a+x. Then put the following code in the file. Please be sure to replace the https://YOUR RPC NODE GOES HERE placeholder with your own RPC endpoint, e.g. https://testnet-rpc.cybermiles.io:8545.
```bash
#!/bin/bash
while true
do
    STATUS=$(curl --max-time 30 -s -o /dev/null -w '%{http_code}' https://YOUR RPC NODE GOES HERE)
    if [ $STATUS -eq 200 ]; then
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m init >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m abi >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m full >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m topup >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m tx >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m state >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m bytecode >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m indexed >/dev/null 2>&1 &
        cd /var/www/search-engine.com/html/python && ulimit -n 10000 && nohup /usr/bin/python3.6 harvest.py -m faster_state >/dev/null 2>&1 &
        break
    else
        echo "Got $STATUS please wait"
    fi
    sleep 10
done
```
Add the following line to cron using the crontab -e command.
```bash
@reboot ~/startup.sh
```
The smart contract search engine will autonomously harvest upon bootup.

Detailed explanation of harvesting modes

Full

The harvest.py -m full mode operates in the following way. It divides the number of blocks in the blockchain by the max_threads setting in the config.ini file to create chunks of blocks. It then starts a separate thread for each of those chunks, and each chunk is harvested in parallel. For example, if the blockchain has 1 million blocks and the max_threads value is 500, there will be 500 individual threads processing 2,000 blocks each.
The harvest.py -m full mode quickly and efficiently traverses the entire blockchain with the sole purpose of finding transactions which involve smart contracts. Transactions which involve smart contract creation (i.e. have a contractAddress in the transaction receipt) are stored in the smart contract search engine's masterindex.
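As a rough illustration of that chunking logic, here is a minimal sketch; the variable names below are illustrative and not taken from harvest.py.
```python
import math

# Illustrative values only: a chain with 1,000,000 blocks and max_threads = 500
latest_block_number = 1_000_000
max_threads = 500

# Each thread is handed a contiguous chunk of roughly equal size
chunk_size = math.ceil(latest_block_number / max_threads)  # 2,000 blocks per thread
chunks = [
    (start, min(start + chunk_size - 1, latest_block_number))
    for start in range(0, latest_block_number, chunk_size)
]
print(len(chunks), chunks[0], chunks[-1])  # 500 (0, 1999) (998000, 999999)
```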

Topup

The harvest.py -m topup mode operates in the following way. It uses the following formula to determine how many of the most recent blocks to harvest.
```python
stopAtBlock = latestBlockNumber - math.floor(100 / int(self.secondsPerBlock))
```
For example, if the blockchain has 1 million blocks and the seconds_per_block in the config.ini is set to 10, the system will process the most recent block 1000000 and stop at block 999990 (harvesting only the 10 most recent blocks). Once executed, this topup mode runs repeatedly, i.e. it does not have to be run using cron because it already uses Python time.sleep and will repeat as required.
The harvest.py -m topup mode quickly and efficiently traverses only a few of the latest blocks with its sole purpose being to find only the most recent transactions which involve smart contracts. These are stored in the smart contract search engine's masterindex.
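As a quick check of that formula using the example values above (a minimal sketch of the calculation only; the repeating loop with time.sleep lives inside harvest.py itself):
```python
import math

latestBlockNumber = 1_000_000
secondsPerBlock = 10  # the seconds_per_block value from config.ini

stopAtBlock = latestBlockNumber - math.floor(100 / int(secondsPerBlock))
print(stopAtBlock)  # 999990 -> only the ~10 most recent blocks are harvested
```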
Both of the above modes have only identified (and saved to the masterindex) transactions which involve the creation of smart contracts. None of this data is searchable via the API. These full and topup modes are run at all times as they provide the definitive list of transactions which the search engine has to process on an ongoing basis.

Transaction (tx)

The harvest.py -m tx mode operates in the following way. It takes all of the known ABIs which are stored in the abiindex and all of the known smart contract related transactions which are stored in the smart contract search engine's masterindex. It then creates a web3 smart contract instantiation for every combination and tests to see if web3 can successfully call all of the contract's public view functions.
If the contract address is not already in the commonindex, and the contract instance returns valid data for all of the public/view functions defined in the ABI, then a new entry is created in the commonindex.
If the contract address is already indexed in the commonindex, there is still a chance that this particular ABI and smart contract combination is new. Therefore the code will go ahead and try to instantiate a web3 contract instance with the ABI and address at hand. If all of the public view functions of the contract instance return valid data, the code assumes that this is an associated ABI, i.e. a valid ABI which is part of Solidity inheritance etc. The outcome of this process is that the abiShaList is updated and the new data is added to the functionDataList.
If the contract instance is unable to return valid data for each of the public view functions of the ABI in question, then it is assumed that the contract address and the ABI were never related. This combination goes into the ignoreindex because this will never change.
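A minimal sketch of the idea behind this mode, using web3.py (the function and variable names here are illustrative and not taken from harvest.py; the contract address is assumed to already be checksummed):
```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://testnet-rpc.cybermiles.io:8545"))

def abi_matches_contract(contract_address, abi):
    """Return True if every argument-less public view/pure function in the ABI
    can be called successfully on the contract at contract_address."""
    contract = w3.eth.contract(address=contract_address, abi=abi)
    try:
        for item in abi:
            is_view = item.get("constant") or item.get("stateMutability") in ("view", "pure")
            if item.get("type") == "function" and is_view and not item.get("inputs"):
                getattr(contract.functions, item["name"])().call()
        return True
    except Exception:
        # Any failure suggests this ABI/address combination is unrelated and
        # would be written to the ignoreindex so it is never tested again.
        return False
```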

Faster State

The harvest.py -m faster_state mode operates in the following way. It traverses only the most recent blocks, calls the public/view functions of the contracts in those blocks and updates the commonindex. Remember that the commonindex is the index which provides the smart contract state data to the API.
Note: this -m faster_state mode requires that the smart contract instantiation (both the transaction hash and all associated ABIs) is already known to the smart contract search engine indices, which requires some up-front work. This mode was created for a special case whereby the search engine was required to provide real-time data for a blockchain with 1 second block intervals. Part of this special use case required that the software responsible for instantiating new contracts explicitly indexed the contract's ABIs and also the transaction hash of the contract instantiation. This was achieved via the submitManyAbis API call.
This mode provides the fastest data updates available. However, as mentioned above, it also needs each contract's ABIs and transaction hash to be purposely indexed as soon as possible. This mode is not about self discovery, but rather about explicit indexing in real time. The system can perform self discovery, but the self discovery process (testing combinations of many ABIs and many contract addresses, i.e. millions of combinations) takes longer than 1 second.

ABI

The harvest.py -m abi mode operates in the following way. It fetches the already indexed records from the commonindex and also fetches all of the already indexed ABIs from the abiindex. It then creates web3 contract instantiations for each of the combinations. Then, for each combination, if the contract instance is unable to return the public view function data perfectly, the ABI and address combination is added to the ignoreindex. This prevents the abi mode from ever checking that particular combination again. On the other hand, if all of the public view functions of the contract instance return data perfectly, the code assumes that this is an associated ABI, i.e. a valid ABI which is part of Solidity inheritance etc. The outcome of this process is that the abiShaList is updated and the new data is added to the functionDataList.
The primary purpose of the abi mode is to introduce newly uploaded ABIs to pre-existing contract addresses with which they may be associated.

State

The harvest.py -m state mode operates in the following way. It fetches all indexed contracts from the commonindex. It reads their abiShaList and creates a web3 contract instance for each of the ABI/address combinations. At this point the ABI/address combinations are no longer questionable, because they have already been verified as real relationships which yield real data. The state mode fetches the public view data from the contract and updates the index if the data is different from what was originally stored. It also creates a local hash of the data so that, when it repeats this process over and over, it can compare hashes on local disk rather than remotely issuing queries to the index.
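A minimal sketch of the local-hash optimisation described above (the names are illustrative; harvest.py's actual implementation may differ):
```python
import hashlib
import json

local_hashes = {}  # contract address -> hash of the state data last written to the index

def state_changed(contract_address, state_data):
    """Hash the freshly fetched public view data and compare it with the locally
    stored hash, so the remote index is only updated when something changed."""
    digest = hashlib.sha256(
        json.dumps(state_data, sort_keys=True, default=str).encode()
    ).hexdigest()
    if local_hashes.get(contract_address) == digest:
        return False  # nothing changed; skip the remote index update
    local_hashes[contract_address] = digest
    return True
```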

Bytecode

The harvest.py -m bytecode mode operates in the following way. It fetches all indexed contracts and matches their bytecode (which is in their transaction instance input) with any individual bytecode entries in the bytecodeindex.
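A minimal, illustrative sketch of that matching step (the names are assumptions, not taken from harvest.py):
```python
def matching_bytecode_keys(tx_input: str, bytecode_index: dict) -> list:
    """Return the keys of bytecodeindex entries whose bytecode appears in the
    contract-creation transaction input of an indexed contract."""
    return [key for key, code in bytecode_index.items() if code and code in tx_input]

# Example: the indexed bytecode fragment is found inside the creation input
print(matching_bytecode_keys("0x6080604052348015600f57600080fd", {"example": "6080604052"}))  # ['example']
```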

Indexed

The harvest.py -m indexed mode operates in the following way. It loops through all of the indexed contracts from the commonindex and sets the indexed value of any item in the masterindex to true if the contract addresses from the two indices match. This is an independent script which does not have to be run; however, it does help speed up performance slightly by excluding already indexed items when the tx mode is executed.
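A minimal sketch of what that flag update could look like with the elasticsearch Python client (the index names mirror this document, while the field name and query are assumptions; the AWS authentication setup, e.g. via aws_requests_auth, is omitted for brevity):
```python
from elasticsearch import Elasticsearch

# Placeholder endpoint; in practice this would be the AWS Elasticsearch endpoint
# from config.ini together with the appropriate authentication.
es = Elasticsearch("https://search-smart-contract-search-engine-abcdefg.es.amazonaws.com")

def mark_as_indexed(master_index, contract_address):
    """Flag every masterindex document for this contract address as indexed,
    so the tx mode can skip it on subsequent runs."""
    es.update_by_query(
        index=master_index,
        body={
            "script": {"source": "ctx._source.indexed = true", "lang": "painless"},
            "query": {"term": {"contractAddress.keyword": contract_address}},
        },
    )
```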

Flask

```bash
python3.6 -m pip install Flask --user
```

Python Flask / Apache2 Integration

```bash
sudo ufw allow ssh
sudo ufw enable
sudo ufw allow 8080/tcp
sudo ufw allow 443/tcp
```
Open crontab for editing
```bash
crontab -e
```
Add the following lines inside crontab
```bash
@reboot sudo ufw enable
@reboot cd /var/www/search-engine.com/html/python && nohup /usr/bin/python3.6 io.py >/dev/null 2>&1 &
```

CORS (Allowing Javascript, from anywhere, to access the API)

To enable CORS, please follow these instructions.
Ensure that Apache2 has the mod_rewrite module enabled
```bash
sudo a2enmod rewrite
```
Ensure that Apache2 has the headers module enabled by typing the following command.
```bash
sudo a2enmod headers
```
Open the /etc/apache2/apache2.conf file and add the following.
```apache
<Directory /var/www/search-engine>
    Order Allow,Deny
    Allow from all
    AllowOverride all
    Header set Access-Control-Allow-Origin "*"
</Directory>
```
Then, in addition to this, please open the /etc/apache2/sites-enabled/search-engine-le-ssl.conf file (which was created automatically by the Let's Encrypt command above) and add the following code inside the VirtualHost section.
```apache
Header always set Access-Control-Allow-Origin "*"
Header always set Access-Control-Allow-Methods "POST, GET, OPTIONS"
Header always set Access-Control-Max-Age "1000"
Header always set Access-Control-Allow-Headers "x-requested-with, Content-Type, origin, authorization, accept, client-security-token"
RewriteEngine On
RewriteCond %{REQUEST_METHOD} OPTIONS
RewriteRule ^(.*)$ $1 [R=200,L]
```
Once all of this is done, please give the server a quick reboot; during boot all of the processes will fire off as per the cron entries.
```bash
sudo shutdown -r now
```

Talking to the smart contract search engine from your DApp

The es-ss.js data services library provides a simple way for your DApp to talk to the data which is indexed in this system, using native Javascript (client side) and/or Node (server side).