Dedicated installation

If you need a dedicated installation for heavy use, you can install and configure the Tags Reaper packages on your own server.

Deploying a dedicated TR installation and starting to collect data from the web takes about 30-60 minutes.

The HCE product is distributed as self-deployed Linux packages, available for download from the official homepage. Support is provided via the HCE Forum.

Personalized support and professional installation services are available for a fee.

Components

Tags Reaper includes several kinds of software that work at different levels, each playing one of several roles in an integrated infrastructure:

  • Distributed Crawler (DC)
  • Distributed Task Manager (DTM)
  • Hierarchical network transport node (HCE-node)

The DC and DTM are Linux daemons written in Python, and the HCE-node application is a Linux binary written in pure C++ with the POCO framework.

Architecture

The DC and DTM are end-user services that integrate and encapsulate, respectively, distributed web crawling and data processing logic, and distributed task management logic. The HCE-node is a universal application for building network transport infrastructure that supports:

  • several advanced message balancing methods;
  • system resource control methods;
  • message processing and task results merging;
  • asynchronous and synchronous task execution in the Linux environment;
  • task chains;
  • task results data management;
  • integrated multi-threaded Sphinx search client.

Together, the software forms an integrated, configured system for dedicated server installation, with optional further support and management assistance.

DC comparison

Parallel distributed data processing

The TR distribution is based on HCE-DC and HCE-DTM, open-source frameworks designed to make distributed data processing easy. A data stream can be uploaded into the distributed system using the regular CLI or the REST API.

Each portion of data can represent a small sub-task that can be processed in parallel. The result of each computation goes to local storage, from which it can then be fetched and used. Once the job is complete and all parts have been collected in one place, optional final code can be called.
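The split, process and finalize flow described above can be illustrated with a minimal Python sketch. It does not use the HCE-DC or HCE-DTM API: a local thread pool stands in for the distributed nodes, and the names split_into_portions, process_portion, finalize and run_job are hypothetical, chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_portions(items, portion_size):
    """Cut the input data into fixed-size portions (one sub-task each)."""
    return [items[i:i + portion_size] for i in range(0, len(items), portion_size)]


def process_portion(portion):
    """Hypothetical sub-task: count the words in the lines of one portion."""
    return sum(len(line.split()) for line in portion)


def finalize(partial_results):
    """Optional final step, run once after all partial results are collected."""
    return sum(partial_results)


def run_job(items, portion_size=2, workers=4):
    """Process all portions in parallel, then merge the partial results."""
    portions = split_into_portions(items, portion_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order, so results line up with portions.
        partial_results = list(pool.map(process_portion, portions))
    return finalize(partial_results)


if __name__ == "__main__":
    lines = ["one two", "three", "four five six", "seven"]
    print(run_job(lines))
```

In a real deployment the portions would be submitted through the CLI or REST API and the partial results fetched from each node's local storage; the sketch only shows the shape of the computation.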

HCE-DC and HCE-DTM, together with the hce-node transport, help balance the load on system resources (CPU, RAM and HDD). The system runs with maximum performance and stability while protecting the host OS from peaks in resource consumption, which makes the distributed parallel data processing solution reliable and stable.
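The text does not say which balancing methods hce-node applies, so the following is only a generic illustration of the idea: a greedy "least-loaded first" assignment that sends each task to the node with the smallest accumulated load. The function assign_tasks, the node names and the task costs are all assumptions made for this sketch.

```python
import heapq


def assign_tasks(task_costs, node_names):
    """Assign each task to the node with the lowest accumulated load.

    task_costs: estimated cost of each task (hypothetical units).
    node_names: identifiers of the available nodes (hypothetical).
    Returns a dict mapping node name to the list of task indexes it received.
    """
    # Min-heap of (current_load, node_name); the least-loaded node is on top.
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}
    for task_id, cost in enumerate(task_costs):
        load, name = heapq.heappop(heap)
        assignment[name].append(task_id)
        heapq.heappush(heap, (load + cost, name))
    return assignment


if __name__ == "__main__":
    print(assign_tasks([5, 1, 1, 1], ["node-a", "node-b"]))
```

Because one expensive task keeps its node busy, the cheaper tasks flow to the other node, which is the kind of peak smoothing the paragraph above describes.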

© 2015-2016 TagsReaper. All rights reserved.