Extractor API | Text Extraction Platform

Feature-Rich Extraction


Extract clean text, HTML, images and videos, and other data from URLs. Choose only what you need in each request.

Everything You Need to Extract Clean Text. And Then Some.

Robust API

Extractor API takes care of the technical stuff

IP Rotation & JS Rendering


We automatically apply IP rotation and retries to every request (Free Plan included), and all our paid plans allow you to render JavaScript before extraction.

AI/ML Data Collection

A simple and effective way to scale your data collection for AI/ML training and knowledge base use cases

Automate data capture


Our API enables data teams to collect previously inaccessible data in a cost-effective manner for storage in downstream data warehouses and data lakes, and ultimately as source data for AI/ML use cases.

News Search

Search the world's news with a single request

Search Country News


Free and paid plans can search the world's news with our News Search endpoint. Every request returns up to 100 news items, including metadata. Collect the URLs - then extract clean text with our Extractor endpoint. Learn more here.
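Because news searches are throttled at 1 request per second (see Feature Comparison), a client that runs many searches in a row may want to pace itself. The following is a minimal sketch of client-side pacing, not part of the Extractor API itself. The `paced` helper and the `action` callable are hypothetical names; `action` stands in for whatever function actually sends your News Search request.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def paced(
    items: Iterable[T],
    action: Callable[[T], R],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call action(item) for each item, keeping at least
    min_interval seconds between the start of consecutive calls."""
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield action(item)
```

Pass your own request function as `action`, for example `for result in paced(queries, run_news_search): ...`, where `run_news_search` is a function you write. The `clock` and `sleep` parameters exist only so the pacing can be tested without real delays.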

Extract Anything

Extract only the data you need from a page

Clean Text & Metadata


Extract clean text, raw text, table data, HTML, image and video links, authors, title, and publication date. Choose only the fields you need.
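To illustrate what choosing fields per request could look like, here is a small sketch that builds a request URL with a comma-separated field list. Everything specific in it is an assumption made for illustration: the base URL is a placeholder, and the parameter names (`apikey`, `url`, `fields`) and field names (`title`, `clean_text`) are hypothetical. Check the API documentation for the real endpoint and names.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the documented URL.
BASE_URL = "https://api.example.com/extract"


def build_extract_url(target_url: str, fields: list[str], api_key: str) -> str:
    """Build a GET URL that asks only for the listed fields.

    Parameter names are hypothetical; urlencode percent-escapes
    the target URL so it survives as a single query value.
    """
    query = urlencode(
        {
            "apikey": api_key,
            "url": target_url,
            "fields": ",".join(fields),
        }
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_extract_url("https://example.com/post?id=1", ["title", "clean_text"], "KEY"))
```

Encoding the target URL matters here because it usually contains its own `?` and `&` characters, which would otherwise be read as part of the outer query string.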

Visual Extractor

Extract text using just our online tool

API Not Required


You can extract data from up to 1,000 URLs at a time using our online visual extractor - not just the API. The visual extractor is included in all plans. Check out how it works here.

Persistent Data

Save your URLs to a job, retrieve them later

Store Your Results


Both the API and the visual extractor allow you to store your results in Jobs. Assign your target URLs a job name, then see their progress online or programmatically. Once the job is done, you can retrieve the results any time. Learn more here.
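Checking a job's progress programmatically usually comes down to polling until it reports completion. The sketch below shows that pattern only. It does not call the Extractor API: `get_status` is a hypothetical callable you would supply, and the `"done"` status string is an assumption, not a documented value.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "done".

    Returns True once the job is done, or False if max_polls
    checks pass without the job finishing.
    """
    for attempt in range(max_polls):
        if get_status() == "done":  # assumed status value
            return True
        # Skip the pause after the final check; nothing follows it.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

A bounded number of polls keeps a stuck job from blocking the caller forever; the results can still be retrieved later, since jobs are stored.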

PDF Data Extraction

Extract clean data from proprietary local documents and public-facing documents

Previously inaccessible data now at your fingertips


Utilize our API to quickly pull key datasets from your unstructured PDFs.

Feature Comparison

Free

1,000 requests per month
50 news searches per month
Visual online extractor
Automatic IP rotation, no JS rendering
1 concurrent request

Hobby

30,000 requests per month
500 news searches per month
Visual online extractor
Automatic IP rotation, JS rendering*
5 concurrent requests

Professional

100,000 requests per month
2,000 news searches per month
Visual online extractor
Automatic IP rotation, JS rendering*
10 concurrent requests

Business

250,000 requests per month
5,000 news searches per month
Visual online extractor
Automatic IP rotation, JS rendering*
15 concurrent requests

News searches are throttled at 1 request per second and use the News Search endpoint.
*Rendering JavaScript costs 5 requests instead of the normal 1 request per URL.
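
Since a JavaScript-rendered URL costs 5 requests, the number of URLs a plan covers per month drops to one fifth when every URL is rendered. The short sketch below works that out from the quotas listed above; the names `PLANS` and `max_urls` are illustrative only.

```python
# Monthly request quotas and JS rendering availability, taken from the comparison above.
PLANS = {
    "Free": {"requests": 1_000, "js_rendering": False},
    "Hobby": {"requests": 30_000, "js_rendering": True},
    "Professional": {"requests": 100_000, "js_rendering": True},
    "Business": {"requests": 250_000, "js_rendering": True},
}

PLAIN_COST = 1      # requests charged per URL without rendering
JS_RENDER_COST = 5  # requests charged per URL with JavaScript rendering


def max_urls(plan: str, render_js: bool = False) -> int:
    """Return how many URLs per month the plan's quota covers."""
    info = PLANS[plan]
    if render_js and not info["js_rendering"]:
        raise ValueError(f"{plan} plan does not include JS rendering")
    cost = JS_RENDER_COST if render_js else PLAIN_COST
    return info["requests"] // cost


if __name__ == "__main__":
    for name, info in PLANS.items():
        if info["js_rendering"]:
            print(name, max_urls(name), max_urls(name, render_js=True))
        else:
            print(name, max_urls(name))
```

For example, the Hobby plan covers 30,000 URLs without rendering or 6,000 URLs if every one is rendered.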

Now test-drive these features for yourself.