ArgusFlow

Turn Unstructured Web Data into Actionable Insights

An open-source Python suite of microservices designed to extract, parse, structure, and match product data at scale.

What is ArgusFlow?

ArgusFlow is a collection of high-performance microservices built to live inside your data pipeline. Instead of just crawling pages, ArgusFlow focuses on the hardest part: turning raw, messy HTML into clean, structured, database-ready JSON.

  • Four Specialized Tools
    Includes an intelligent Extractor, an AI-powered HTML Parser, a Title Generalizer, and a Smart Product Linker.
  • Generic & Resilient
    The Extractor works out of the box on thousands of sites without custom CSS selectors, saving weeks of manual configuration.
  • Plug-and-Play AI
    All services accept raw content via API, making them easy to plug into any existing crawler or internal workflow.
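
As a rough illustration of the "raw content via API" idea, the sketch below posts a page's HTML to the Extractor from any Python crawler. The host, port, `/extract` route, and `html` field name are all assumptions made for this example, not documented ArgusFlow endpoints; check your deployment for the real values.

```python
import json
import urllib.request

# Assumption: the Extractor listens on localhost:8000 and exposes POST /extract.
# Replace with the real host, port, and route of your deployment.
EXTRACTOR_URL = "http://localhost:8000/extract"


def build_extract_request(html: str, url: str = EXTRACTOR_URL) -> urllib.request.Request:
    """Wrap raw HTML in a JSON POST request (the "html" field name is assumed)."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(html: str, url: str = EXTRACTOR_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_extract_request(html, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the services running, a crawler would call `extract(page_html)` on each fetched page. Building the request separately from sending it keeps the payload easy to inspect or log before anything goes over the wire.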

Use Cases

ArgusFlow solves specific technical bottlenecks in e-commerce and data science pipelines.

  • Automate Data Extraction
    Pass HTML from any shop to the Extractor to instantly get prices and availability without writing a single line of XPath.
  • Clean Dirty Data
    Turn "Solid oak table 200x100 cm natural" into a structured object with material, type, and size fields using the Generalizer.
  • Match Across Catalogs
    Identify that Product A from Retailer A is the same as Product B from Retailer B using AI. Essential for price comparison and market intelligence.
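
To make the "Clean Dirty Data" case concrete, here is a toy, rule-based sketch of the kind of structured object described above. It is not how the Generalizer works internally; the vocabularies, the `GeneralizedTitle` class, and the shape of the `size` field are hypothetical and chosen only to show a possible output.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical vocabularies, for illustration only.
MATERIALS = {"oak", "pine", "walnut", "steel"}
TYPES = {"table", "chair", "desk", "shelf"}

# Matches sizes such as "200x100 cm".
SIZE_RE = re.compile(r"(\d+)\s*x\s*(\d+)\s*(cm|mm|m)\b", re.IGNORECASE)


@dataclass
class GeneralizedTitle:
    material: Optional[str]
    type: Optional[str]
    size: Optional[dict]


def generalize(title: str) -> GeneralizedTitle:
    """Pick out material, type, and size from a free-text product title."""
    words = re.findall(r"[a-z]+", title.lower())
    material = next((w for w in words if w in MATERIALS), None)
    product_type = next((w for w in words if w in TYPES), None)

    size = None
    match = SIZE_RE.search(title)
    if match:
        size = {
            "dimensions": [int(match.group(1)), int(match.group(2))],
            "unit": match.group(3).lower(),
        }
    return GeneralizedTitle(material=material, type=product_type, size=size)


result = asdict(generalize("Solid oak table 200x100 cm natural"))
# result == {"material": "oak", "type": "table",
#            "size": {"dimensions": [200, 100], "unit": "cm"}}
```

A fixed word list like this breaks on the first unfamiliar title, which is the gap an AI-based service is meant to close.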

The Suite

Prerequisites

Ensure you have these tools installed before setting up your local environment.

  • Docker & Docker Compose

    Required to run the containerized environment on Linux, macOS, or Windows.

  • Make

    Used for orchestration. Standard on macOS/Linux. Windows users should use WSL.

  • cURL

    Required for the one-line installation command.

  • Git

    Required if you want to clone the repository for development.

Quick Start Guide

Deploy the entire suite locally with these commands.

  1. Download and Extract

    This creates an `argus` directory and pulls the latest source code.

    mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1
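
  2. Initialize

    The download command already leaves you inside the `argus` directory, so no further `cd` is needed. Run `cd argus` first only if you have opened a new shell.

    make setup

  3. Launch Services

    make up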

Alternative: Git Clone

git clone https://github.com/getargusflow/argus.git && cd argus && make setup && make up

For deep dives, see the full documentation.