03 Dec

Tags Reaper November Changelog

Demo form

  • The Visual Tag Picker tool supports the “Multi Item” or “Product” selection mode.
  • The Visual Tag Picker tool supports switching CSS, JavaScript and highlighting on and off.
  • The demo form supports switching between the “Multi Item” or “Product” scraping modes.
  • A visual results view that opens in a new tab; in simple mode the visual tab is the main results view, with item enumeration, image previews, error codes and a summary, and it switches automatically depending on the format.
  • View tabs for the API request and response.

Administration Control Panel

  • Data collectors: scheduled tasks that collect data from a user’s projects according to a set of data selection and scheduling conditions, and create data archives and digests. These are accessible through the user’s sub-domains over HTTP, with email notifications, direct access to the digests in HTML format, and more.
  • The “Site” entity has been renamed to “Project” and extended with additional parameters and configuration options for crawling and scraping.
  • Categories for Projects, and category management.
  • Summary statistics for projects.
  • Summary statistics and configuration option values for the DC and DTM services.
  • Group operations on projects.

 

The DC service

  • Integration of the digest creation engine into the Data Collectors.
  • “Multi Item” or “Product” scraping for regular Template Scraping.
  • Support for chained article pages, with optional universal merging of article body parts that are split across several web pages.
  • Custom depth level support, with the ability to collect page content from links during a synchronous Real-time API call.
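The merging of chained article pages mentioned above can be pictured with a minimal Python sketch. It is not the DC service's implementation: the function name `merge_chained_parts` and the input shape (a list of `(page_number, body_text)` tuples) are assumptions made only for this example.

```python
def merge_chained_parts(parts, separator="\n\n"):
    """Join article body parts scraped from several pages into one body.

    `parts` is a list of (page_number, body_text) tuples. This shape is
    hypothetical and chosen only for illustration.
    """
    # Put the parts back in page order, whatever order they were fetched in.
    ordered = sorted(parts, key=lambda part: part[0])

    seen = set()
    bodies = []
    for _, text in ordered:
        text = text.strip()
        # Skip empty parts and exact duplicates (e.g. a page fetched twice).
        if text and text not in seen:
            seen.add(text)
            bodies.append(text)
    return separator.join(bodies)
```

Sorting by page number before joining keeps the body readable even when the pages were collected out of order.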

 

The Real-time API

  • Checks of several limits on the main critical resource load parameters for different types of users.
  • Deep validation of the request against the protocol definitions and specification.
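As a rough illustration of per-user-type limit checks, the sketch below compares current load values against a table of limits. The user types, parameter names and threshold values are all hypothetical; the changelog does not list the actual parameters the Real-time API checks.

```python
# Hypothetical limits per user type; the names and values are assumptions.
LIMITS = {
    "guest": {"cpu_load": 0.5, "memory_used": 0.6},
    "registered": {"cpu_load": 0.8, "memory_used": 0.9},
}


def exceeded_limits(user_type, current_load):
    """Return the sorted names of load parameters above the user's limits.

    `current_load` maps parameter names to measured values; parameters
    missing from it are treated as 0.0.
    """
    limits = LIMITS[user_type]
    return sorted(
        name
        for name, limit in limits.items()
        if current_load.get(name, 0.0) > limit
    )
```

A request would be accepted only when the returned list is empty, so stricter limits for one user type reject requests earlier under the same load.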

 

Improvements and fixes

Demo form on the website

  • Templates editor rules and tags management;
  • The VTP tool: pages loading, area selection and restoration after rule add and delete operations;
  • Fixes of the Visual and Demo results view tabs;
  • Configuration settings import format validation;

Administration console

  • Projects search, view, edit and update operations for special cases;
  • Resources search and view fixes;
  • Templates library management fixes;
  • User’s account registration fixes;
  • Fixes to the TagsMask expression and ordering in the extended Resources search form;

DC service

  • Improved handling of image sizes and ordering in News scraping to select the best image;
  • Improved date-time field normalization and detection, with support for non-standard representations;
  • Refined scraped textual content for better display in digests: added rate filters, word count limits, language detection, and a description optionally taken as a limited part of the article body;
  • Segmentation of digests, with a limit on the number of articles per file segment;
  • robots.txt obey additions and fixes for complex cases with a crawling depth > 1, with separate configuration for collected URLs and the base root;
  • Fixes to scraper ordering and application sequence;
  • Fixes to user role and permission assignment and their relation to projects;
  • Design and deployment of a nightly build environment for Debian 7 and 8, with additions to use separate sets of acceptance tests.
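The digest segmentation item in the list above amounts to splitting a list of articles into fixed-size chunks. A minimal Python sketch, assuming a hypothetical `segment_digest` helper and an in-memory list of articles (the DC service's actual data model is not described here):

```python
def segment_digest(articles, max_per_segment):
    """Split `articles` into segments of at most `max_per_segment` items."""
    if max_per_segment < 1:
        raise ValueError("max_per_segment must be at least 1")
    # Step through the list in strides of max_per_segment; the last
    # segment holds whatever remains and may be shorter.
    return [
        articles[start:start + max_per_segment]
        for start in range(0, len(articles), max_per_segment)
    ]
```

Each resulting segment would then be written to its own file.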