AI Powered Scraper

  1. Schedule a new async job with the public API

To create a new scraping job:

  • Endpoint: POST /job/async
  • Authentication: API KEY required.
  • Request Body: JSON object describing the job (see example below)
  • Response: A job object with a unique identifier
Example Request:

Suppose you want to scrape the details of an Airbnb property quickly and with minimal effort.

{
  "website": [
    "https://www.airbnb.mx/rooms/614647815820366766?adults=1&category_tag=Tag%3A5348&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1469109200&search_mode=flex_destinations_search&check_in=2025-01-02&check_out=2025-01-07&source_impression_id=p3_1731446946_P3i3xN0xoBN5Knh1&previous_page_section_name=1000&federated_search_id=8a49831a-1edd-4289-b7ce-9ca7f91d90ce"
  ],
  "prompt": "List me all property information like title, number of beds, number of baths, price per night, cleaning fee, Airbnb service fee, all amenities (What this place offers).",
  "frecuency": {
    "value": "",
    "type": "monthly"
  }
}

Here are the key components:

  1. website: An array of target URLs to scrape. (This endpoint accepts multiple URLs, which are processed asynchronously.)
  2. prompt: A descriptive instruction, in English, specifying which data should be extracted from the pages listed in website.
  3. frecuency: Determines how often the job should run (note that the API spells this key "frecuency"). Options for type include:
     • daily: Runs once a day.
     • weekly: Runs once a week.
     • monthly: Runs once a month.
     • hourly: Runs once an hour.
     • every_x_minutes: Runs every specified number of minutes.
     • every_x_hours: Runs every specified number of hours.
     • every_x_days: Runs every specified number of days.
     • custom: Allows a custom schedule.

     The value field is only needed for the every_x_* and custom types; when type is daily, weekly, monthly, or hourly, it can be left empty.

Note: The every_x_minutes, every_x_hours, every_x_days, and custom options are not yet implemented programmatically, but each is expected to receive a cron schedule expression.
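Putting the request together in code, a minimal Python sketch (standard library only) might look like the following. The base URL and the X-API-KEY header name are assumptions, not part of the documented API; substitute whatever your account setup specifies. The payload key is spelled frecuency to match the examples above.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # assumption: replace with the real base URL
API_KEY = "your-api-key"              # assumption: replace with your API key

def build_job_payload(urls, prompt, freq_type, freq_value=""):
    """Build the POST /job/async body. The API spells the key 'frecuency'."""
    payload = {
        "website": list(urls),   # one or more target URLs
        "prompt": prompt,        # English description of the data to extract
        "frecuency": {"type": freq_type},
    }
    # 'value' is only needed for the every_x_* / custom schedule types
    if freq_type in ("every_x_minutes", "every_x_hours", "every_x_days", "custom"):
        payload["frecuency"]["value"] = freq_value
    return payload

def schedule_job(payload):
    """POST the job; returns the created job object as a dict."""
    req = urllib.request.Request(
        f"{API_BASE}/job/async",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The builder keeps value out of the payload for the simple schedule types, matching the note above that it is only meaningful for the every_x_* and custom options.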

Example Response:

The API responds with the created job details.

{
    "actions": null,
    "created_at": "2024-10-25 02:55:46",
    "data_to_scrape": null,
    "frecuency": {
        "type": "monthly"
    },
    "id": "exampleid",
    "status": "not_initialized",
    "updated_at": "2024-10-25 02:56:11",
    "website": "https://www.airbnb.mx/rooms/614647815820366766?adults=1&category_tag=Tag%3A5348&children=0&enable_m3_private_room=true&infants=0&pets=0&photo_id=1469109200&search_mode=flex_destinations_search&check_in=2025-01-02&check_out=2025-01-07&source_impression_id=p3_1731446946_P3i3xN0xoBN5Knh1&previous_page_section_name=1000&federated_search_id=8a49831a-1edd-4289-b7ce-9ca7f91d90ce"
}
  2. Retrieve the collected data

Once the job has run, you can fetch your data using the job id returned in the previous response via the Data endpoints, for example:

  • Endpoint: POST /data/job/{job_id}
  • Authentication: API KEY required.
  • Request Body: No body needed
  • Response: A JSON array of objects containing the data collected for a job execution
Example Response:

The API responds with the data collected for the job.

[
  {
    "created_at": "2024-10-25 02:56:11",
    "data": {
        "title": "AlpineCabin -Views! Outdoor shower, hot tub, more!",
        "beds": "1",
        "baths": "1",
        "price_per_night": "$734",
        "cleaning_fee": "$115",
        "airbnb_fee": "$534",
        "amenities": [
            "Mountain View",
            "Kitchen",
            "WiFi",
            "Free parking on premises",
            "TV",
            "Bathtub",
            "Dedicated Workspace",
            "Private Hot Tub",
            "Air Conditioning",
            "Exterior security cameras on property"
        ]
    },
    "html_url": "https://urltodownloadhtmlexample.com/",
    "id": "dataid",
    "jobid": "jobid"
  }
]
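A matching sketch for fetching and summarizing the collected records, under the same assumptions as before (base URL and X-API-KEY header name are placeholders). The summarizing helper works on arrays shaped like the example above:

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # assumption: replace with the real base URL
API_KEY = "your-api-key"              # assumption: replace with your API key

def fetch_job_data(job_id):
    """POST /data/job/{job_id}; returns the list of collected data records."""
    req = urllib.request.Request(
        f"{API_BASE}/data/job/{job_id}",
        data=b"",  # the endpoint takes no request body
        headers={"X-API-KEY": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def summarize(records):
    """Pull a few fields out of each record's extracted 'data' object."""
    return [
        {
            "title": rec["data"].get("title"),
            "price_per_night": rec["data"].get("price_per_night"),
            "amenities": len(rec["data"].get("amenities", [])),
        }
        for rec in records
    ]
```

Each record also carries html_url, pointing to a downloadable copy of the scraped HTML, and jobid, linking it back to the scheduled job.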