The system runs on HCE-DC, a multipurpose, high-productivity, scalable, and extensible engine for internet data mining.
It consists of several sub-products and technologies from the HCE project:
- HCE-node – network transport cluster application
- Distributed Crawler (DC) service
- Distributed Tasks Manager (DTM) service
- Web administration management console
- Tools and libraries for crawling and scraping algorithms, with a REST API and bindings for Python and PHP development environments
It provides flexible configuration, automated deployment, and easy integration with third-party data mining and analysis projects.