AI Powered Scraper
- Schedule a new async job with the public API
To create a new scraping job:
- Endpoint:
POST /job/async
- Authentication: API KEY required.
- Request Body: JSON object describing the job (see example below)
- Response: A job object with a unique identifier
Example Request:
Imagine you want to scrape Airbnb property details quickly and with minimal effort.
{
"website": [
"https://www.airbnb.mx/rooms/614647815820366766?adults=1&category_tag=Tag%3A5348&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1469109200&search_mode=flex_destinations_search&check_in=2025-01-02&check_out=2025-01-07&source_impression_id=p3_1731446946_P3i3xN0xoBN5Knh1&previous_page_section_name=1000&federated_search_id=8a49831a-1edd-4289-b7ce-9ca7f91d90ce"
],
"prompt": "List me all property information like title, number of beds, number of baths, price per night, cleaning fee, Airbnb service fee, all amenities (What this place offers).",
"frecuency": { // when type is daily, weekly, monthly, or hourly, value is not needed
"value": "",
"type": "montly"
}
}
Here are the key components:
website
: The target URLs to scrape, given as an array. (This endpoint accepts multiple URLs, which are processed asynchronously.)
prompt
: A descriptive instruction in English stating which data should be extracted from the pages listed in the website parameter.
frequency
: Determines how often the job should run (the example payloads spell this key "frecuency"). Options include:
- daily: Runs once a day.
- weekly: Runs once a week.
- monthly: Runs once a month.
- hourly: Runs once an hour.
- every_x_minutes: Runs every specified number of minutes.
- every_x_hours: Runs every specified number of hours.
- every_x_days: Runs every specified number of days.
- custom: Allows for a custom schedule.
Note: The every_x_minutes, every_x_hours, every_x_days, and custom options are not yet implemented programmatically, but each of them is expected to take a cron schedule expression.
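Since the interval options are expected to take a cron schedule expression, the sketch below shows how such expressions could be built using standard 5-field cron syntax. The helper name cron_for_interval is hypothetical, and this page does not say which request field will carry the expression or which cron dialect will be accepted, so treat this as an illustration only.

```python
def cron_for_interval(kind: str, x: int) -> str:
    """Return a standard 5-field cron expression for an "every x" option.

    Hypothetical helper: the API does not yet document how it will
    receive these expressions.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    if kind == "every_x_minutes":
        return f"*/{x} * * * *"   # minutes 0, x, 2x, ... of every hour
    if kind == "every_x_hours":
        return f"0 */{x} * * *"   # minute 0 of hours 0, x, 2x, ...
    if kind == "every_x_days":
        return f"0 0 */{x} * *"   # midnight on days 1, 1+x, ... of each month
    raise ValueError(f"unsupported option: {kind}")


print(cron_for_interval("every_x_minutes", 15))
print(cron_for_interval("every_x_hours", 6))
```

Cron step values restart at the start of each hour, day, or month, so the spacing is only perfectly even when x divides the range evenly (for example, 15 minutes or 6 hours).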
Example Response:
The API responds with the created job details.
{
"actions": null,
"created_at": "2024-10-25 02:55:46",
"data_to_scrape": null,
"frecuency": {
"type": "montly"
},
"id": "exampleid",
"status": "not_initialized",
"updated_at": "2024-10-25 02:56:11",
"website": "https://www.airbnb.mx/rooms/614647815820366766?adults=1&category_tag=Tag%3A5348&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1469109200&search_mode=flex_destinations_search&check_in=2025-01-02&check_out=2025-01-07&source_impression_id=p3_1731446946_P3i3xN0xoBN5Knh1&previous_page_section_name=1000&federated_search_id=8a49831a-1edd-4289-b7ce-9ca7f91d90ce"
}
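As a rough sketch of the flow above, the request could be built and the job id read back in Python along these lines. The base URL and the name of the header that carries the API key are not stated on this page, so BASE_URL and API_KEY_HEADER are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: real host not given here
API_KEY_HEADER = "x-api-key"          # placeholder: real header name not given here


def build_job_request(api_key, websites, prompt, frequency_type):
    """Build (but do not send) the POST /job/async request."""
    payload = {
        "website": list(websites),  # one or more URLs
        "prompt": prompt,
        # Key spelled as in the example request on this page.
        "frecuency": {"value": "", "type": frequency_type},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/job/async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def job_id_from_response(body):
    """Extract the job identifier from the JSON job object."""
    return json.loads(body)["id"]


def create_job(api_key, websites, prompt, frequency_type):
    """Send the request and return the id of the created job."""
    request = build_job_request(api_key, websites, prompt, frequency_type)
    with urllib.request.urlopen(request) as response:
        return job_id_from_response(response.read())
```

Keeping the request construction separate from the network call makes it possible to inspect the payload before anything is sent.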
You can then retrieve your data from the Data endpoints, using the id returned in the previous response as job_id. For example:
- Endpoint:
POST /data/job/{job_id}
- Authentication: API KEY required.
- Request Body: No body needed
- Response: A JSON array of objects containing the data collected during a job execution
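A minimal sketch of this call in Python is shown below. As before, BASE_URL and API_KEY_HEADER are placeholders because the host and the API key header name are not given on this page, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: real host not given here
API_KEY_HEADER = "x-api-key"          # placeholder: real header name not given here


def build_data_request(api_key, job_id):
    """Build (but do not send) the POST /data/job/{job_id} request."""
    safe_id = urllib.parse.quote(job_id, safe="")  # escape the path segment
    return urllib.request.Request(
        url=f"{BASE_URL}/data/job/{safe_id}",
        headers={API_KEY_HEADER: api_key},
        method="POST",  # no request body is needed for this endpoint
    )


def fetch_job_data(api_key, job_id):
    """Send the request and return the decoded JSON array of results."""
    request = build_data_request(api_key, job_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```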
Example Response:
The API responds with the data collected for the job.
[
{
"created_at": "2024-10-25 02:56:11",
"data": {
"title": "AlpineCabin -Views! Outdoor shower, hot tub, more!",
"beds": "1",
"baths": "1",
"price_per_night": "$734",
"cleaning_fee": "$115",
"airbnb_fee": "$534",
"amenities": [
"Mountain View",
"Kitchen",
"WiFi",
"Free parking on premises",
"TV",
"Bathtub",
"Dedicated Workspace",
"Private Hot Tub",
"Air Conditioning",
"Exterior security cameras on property"
]
},
"html_url": "https://urltodownloadhtmlexample.com/",
"id": "dataid",
"jobid": "jobid"
}
]
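The values under data come back as strings, so some post-processing is usually needed before doing arithmetic on them. The sketch below assumes the field names from the example response above (they depend on the prompt, so they may differ for other jobs) and assumes amounts use a dot as the decimal separator. The helper names parse_money and summarize are hypothetical.

```python
import re
from decimal import Decimal


def parse_money(text):
    """Turn a string such as "$1,234.50" into a Decimal, or None if empty."""
    cleaned = re.sub(r"[^0-9.]", "", text)  # drop currency symbols and commas
    return Decimal(cleaned) if cleaned else None


def summarize(records):
    """Reduce each result object to a few typed fields."""
    rows = []
    for record in records:
        data = record.get("data") or {}  # tolerate a missing or null payload
        price = data.get("price_per_night")
        rows.append({
            "title": data.get("title"),
            "price_per_night": parse_money(price) if isinstance(price, str) else None,
            "amenity_count": len(data.get("amenities") or []),
        })
    return rows


# Trimmed copy of the example response shown above.
sample = [{
    "data": {
        "title": "AlpineCabin -Views! Outdoor shower, hot tub, more!",
        "price_per_night": "$734",
        "amenities": ["Mountain View", "Kitchen", "WiFi"],
    },
    "id": "dataid",
    "jobid": "jobid",
}]
print(summarize(sample))
```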