Turn Unstructured Web Data into Actionable Insights
An open-source Python suite of microservices designed to extract, parse, structure, and match product data at scale.
What is ArgusFlow?
ArgusFlow is a collection of high-performance microservices built to live inside your data pipeline. Instead of just crawling pages, Argus focuses on the hardest part: turning raw, messy HTML into clean, structured, and database-ready JSON.
-
Four Specialized ToolsIncludes an intelligent Extractor, an AI-powered HTML Parser, a Title Generalizer, and a Smart Product Linker.
-
Generic & ResilientThe Extractor works out-of-the-box on thousands of sites without needing custom CSS selectors, saving weeks of manual configuration.
-
Plug-and-Play AIAll services accept raw content via API, making them easy to plug into any existing crawler or internal workflow.
Use Cases
ArgusFlow solves specific technical bottlenecks in e-commerce and data science pipelines.
-
Automate Data ExtractionPass HTML from any shop to the Extractor to instantly get prices and availability without writing a single line of XPath.
-
Clean Dirty DataTurn "Solid oak table 200x100 cm natural" into a structured object with material, type, and size fields using the Generalizer.
-
Match Across CatalogsIdentify that Product A from Retailer A is the same as Product B from Retailer B using AI. Essential for price comparison and market intelligence.
The Suite
Product Data Extractor
Extracts prices, brands, and technical specs from any product HTML using an intelligent scoreboard system.
HTML-to-JSON AI
Parses complex HTML snippets into structured, grammar-guaranteed JSON using local AI.
Title-to-Data AI
Converts messy product titles into structured, multi-language JSON objects.
Smart Product Linker
Finds and links identical products across different datasets using deep learning and vector search.
Prerequisites
Ensure you have these tools installed before setting up your local environment.
-
Docker & Docker Compose
Required to run the containerized environment on Linux, Mac, or Windows. Install Docker →
-
Make
Used for orchestration. Standard on macOS/Linux. Windows users should use WSL.
-
cURL
Required for the one-line installation command.
-
Git
Required if you want to clone the repository for development.
Quick Start Guide
Deploy the entire suite locally with these commands.
-
1 Download and Extract
This creates an `argus` directory and pulls the latest source code.
mkdir argus && cd argus && curl -L https://get.argusflow.com | tar -xz --strip-components=1 -
2 Navigate and Initialize
cd argus && make setup -
3 Launch Services
make up
Alternative: Git Clone
git clone https://github.com/getargusflow/argus.git && cd argus && make setup && make up
For deep-dives, view the full Documentation →