If you have any problems or requests please contact support.

Introduction

The 80Legs API is set up to be RESTful. The goal was to develop an API that has intuitive, logical endpoints and uses HTTP response codes to indicate API errors. Because the API relies on standard HTTP header fields, it can be used from any available HTTP client.

Resources

Overview

Resource Usage
users This resource allows for the viewing of user data.
apps This resource allows for the uploading of 80app files.
urllists This resource allows for the uploading and viewing of URL lists.
data This resource allows for the uploading and viewing of data input files for crawls.
crawls This resource allows for the creation and cancelation of crawls. It also allows the user to view the crawl status and settings. When the crawl is complete the links of the results will also be provided within the crawl.
results This resource allows for the viewing of crawl results. Results can be viewed by using the result links, and metadata can be viewed by using the results resource.

End Points

https://api.80legs.com/v2/

Resource URL Patterns

Authentication

Usage

# Authenticate with HTTP Basic
import requests
import json

s = requests.Session()
s.auth = ('API_TOKEN', '')

s.get('https://api.80legs.com/v2/crawls/<crawl_id>')

# Or by using the authorization header
import requests
import json

s = requests.Session()
s.headers.update({'Authorization': '<encoded_api_token>'})

s.get('https://api.80legs.com/v2/crawls/<crawl_id>')
# Authenticate with HTTP Basic
require 'rest-client'

api = RestClient::Resource.new("https://api.80legs.com/v2", "<your_api_token>", "")
api.get "/crawls/<crawl_id>"

# You can pass the authorization header to your REST client gem
require 'faraday'

api = Faraday.new(:url => "https://api.80legs.com/v2") do |req|
  req.headers['Authorization'] = '<encoded_api_token>'
end

api.get "/crawls/<crawl_id>"
# With shell, you can pass your token appended with a colon as the user option
curl "https://api.80legs.com/v2/crawls/crawl_id" -u your_api_token:

# you can also just pass the correct header with each request
curl "https://api.80legs.com/v2/crawls/crawl_id" -H "Authorization: <encoded-api-token>"

We use API tokens to authenticate to the 80Legs API. After registering for a plan you should have received an API token; you can view it on the “My Account” page of the 80Legs web app. The API token allows use of all available credits in your account, so make sure to keep it secret.

The API uses HTTP Basic Auth for authentication. The API Token is used as the username and the password is left blank.

The username can be placed before the domain, as in the example below, or it can be set in the header. If set in the header, the token must first be base64 encoded (with the colon at the end) and then passed as Basic authentication; see the example below. Some HTTP libraries (such as the Java library) require that the authentication be set in the appropriate header fields and discard it if it is placed before the domain.

View in Browser

https://your_api_token:@api.80legs.com/v2/crawls/<crawl_id>

Authorization : "Basic <base64 encoding of your_api_token:>"
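As a minimal sketch of the header-based option described above, the helper below base64 encodes the token followed by a colon and prefixes the result with `Basic`, which is the standard HTTP Basic scheme. The function name `basic_auth_header` is illustrative and not part of the 80Legs API.

```python
import base64


def basic_auth_header(api_token: str) -> dict:
    """Build an Authorization header for HTTP Basic auth with a blank password."""
    # Basic auth encodes "username:password". The password is blank here,
    # so the credential string is the token followed by a colon.
    credentials = f"{api_token}:".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests`, for example `requests.get(url, headers=basic_auth_header(token))`.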

Users

Get User Data

require 'rest-client'
require 'json'

# replace <your_api_token> with your API token
response = RestClient.get("https://<your_api_token>:@api.80legs.com/v2/users/<your_api_token>")
p JSON.parse(response)
curl "https://api.80legs.com/v2/users/<your_api_token>" -u your_api_token:
# the final : after your_api_token is required
import requests
import json

r = requests.get('https://<your_api_token>:@api.80legs.com/v2/users/<your_api_token>')
user_object = json.loads(r.text)

Example Response JSON Object

{
  "token":"api_token",
  "organization":"Datafiniti",
  "email":"user@datafiniti.net",
  "first_name":"John",
  "last_name":"Doe",
  "phone_number":"555-555-5555",
  "plan_id":"basic",
  "active": true,
  "urls_crawled":"1134738",
  "date_registered":"2013-07-09T21:56:01Z"
}

End point

GET https://api_token:@api.80legs.com/v2/users/{USER_API_TOKEN}

Returns all user data pertaining to that user in a JSON format.

Common Codes

HTTP Code Description
200 The user data was returned successfully
401 The API token is not authenticated to view this user
404 The API token provided does not exist
503 The API is currently down.

Attributes

Key Type Value
token string The api token associated with the user
organization string The organization associated with the user if provided.
email string The email associated with the user.
first_name string The first name of the user if provided.
last_name string The last name of the user if provided.
phone_number string The phone number of the user if provided.
plan_id string The plan associated with this user.
active boolean If the user currently has an active plan.
urls_crawled integer The total number of URLs crawled.
date_registered date The date the account was first created. This date is used to determine when more credits are added on a monthly basis.
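The attribute table lists `urls_crawled` as an integer and `date_registered` as a date, while the example response shows both as JSON strings. The sketch below normalizes those two fields after decoding; it assumes the timestamp format seen in the example (`2013-07-09T21:56:01Z`), and `parse_user` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_user(user: dict) -> dict:
    """Return a copy of a decoded user object with typed fields."""
    parsed = dict(user)
    # int() accepts both "1134738" and 1134738, so either encoding works.
    parsed["urls_crawled"] = int(user["urls_crawled"])
    # Assumption: timestamps are UTC in the form shown in the example response.
    registered = datetime.strptime(user["date_registered"], "%Y-%m-%dT%H:%M:%SZ")
    parsed["date_registered"] = registered.replace(tzinfo=timezone.utc)
    return parsed
```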

Apps

GET All Apps

require 'rest-client'
require 'json'

response = RestClient.get("https://#{api_token}:@api.80legs.com/v2/apps")
all_apps = JSON.parse(response)
import requests
import json

response = requests.get('https://<api_token>:@api.80legs.com/v2/apps')
all_apps = json.loads(response.text)
curl "https://api.80legs.com/v2/apps" -u your_api_token:

Example JSON Object

[
    {
        "name":"app1",
        "user":"user_token_1234",
        "location":"apps/user_token1234/app1",
        "date_created":"2013-07-09T21:56:01Z"
    },
    {
        "name":"app2",
        "user":"user_token_1234",
        "location":"apps/user_token1234/app2",
        "date_created":"2013-07-10T21:56:01Z"
    },
    {
        "name":"app3",
        "user":"user_token_1234",
        "location":"apps/user_token1234/app3",
        "date_created":"2013-07-11T21:56:01Z"
    }
]

Endpoint

GET https://api_token:@api.80legs.com/v2/apps

Returns all apps associated with this user in a JSON object.

Common Codes

HTTP Code Description
200 The apps were returned successfully
401 The API token is not authenticated to view these Apps
503 The API is currently down.

Attributes

Key Type Value
name string The name given to the app.
user string The user the app is associated with.
location string The internal location for the app.
date_created date The date the app was uploaded.

Upload an App

curl -X PUT https://your_user_token:@api.80legs.com/v2/apps/full_page_content.js -H "Content-Type: application/octet-stream" --data-binary @/path/to/full_page_content.js -i
require 'rest-client'

app_file = File.read('/path/to/app')

# The app name is part of the URL, as in the endpoint below
RestClient.put("https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>", app_file, {:content_type => 'application/octet-stream'}) do |response|
  puts response.code
end
import requests

url = "https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>"
headers = {'content-type': 'application/octet-stream'}

# Send the raw file bytes as the request body rather than as a multipart form
with open('custom_80_app.js', 'rb') as app_file:
    r = requests.put(url, data=app_file, headers=headers)

Endpoint

PUT https://api_token:@api.80legs.com/v2/apps/{APP_NAME}

Uploads the contents of the PUT body with the given name. The file can be no larger than 1 MB. The app must be a valid JavaScript file.

Common Codes

HTTP Code Description
204 The app was uploaded successfully.
401 The API token is not authenticated to upload apps.
415 Content type is not set / set incorrectly.
503 The API is currently down.

Content Type

content-type : application/octet-stream
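Because uploads over the size limit are rejected, it can help to check the file size locally before sending the PUT request. This is only a sketch: it assumes "1 MB" means 1024 * 1024 bytes, which the documentation does not specify, and `check_app_size` is a hypothetical helper.

```python
# Assumption: the 1 MB limit is interpreted as 1024 * 1024 bytes.
MAX_APP_BYTES = 1024 * 1024


def check_app_size(app_bytes: bytes, limit: int = MAX_APP_BYTES) -> bytes:
    """Return the app contents unchanged, or raise if they exceed the limit."""
    if len(app_bytes) > limit:
        raise ValueError(
            f"app is {len(app_bytes)} bytes, which exceeds the {limit}-byte limit"
        )
    return app_bytes
```

The checked bytes can then be passed as the request body, for example `requests.put(url, data=check_app_size(contents), headers=headers)`.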

Get a specific app

require 'rest-client'

response = RestClient.get("https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>")
app = response.body  # contents of the uploaded app file
import requests

response = requests.get("https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>")
app = response.text  # contents of the uploaded app file

End Point

GET https://api_token:@api.80legs.com/v2/apps/{APP_NAME}

Downloads the app with the given name.

Common Codes

HTTP Code Description
200 The app was returned successfully.
401 The API token is not authenticated to get this file.
404 No app with that name exists.
503 The API is currently down.

Delete an App

curl --request DELETE "https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>"
require 'rest-client'

RestClient.delete("https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>") do |response|
    return response.code == 204
end
import requests

requests.delete("https://<your_api_token>:@api.80legs.com/v2/apps/<app_name>")

End Point

DELETE https://api_token:@api.80legs.com/v2/apps/{APP_NAME}

Deletes the app with the given name.

Common Codes

HTTP Code Description
204 The app was removed successfully.
401 The API token is not authenticated to delete this file.
404 No app with that name exists.
503 The API is currently down.

URL Lists

Get all URL Lists

curl -X GET https://your_user_token:@api.80legs.com/v2/urllists/ -H "Content-Type: application/octet-stream"

Example JSON Object

[
    {
        "name":"urls1",
        "user":"user_token_1234",
        "location":"urllists/user_token1234/urllists1",
        "date_created":"2013-07-09T21:56:01Z"
    },
    {
        "name":"urls2",
        "user":"user_token_1234",
        "location":"urllists/user_token1234/urllists2",
        "date_created":"2013-07-10T21:56:01Z"
    },
    {
        "name":"urls3",
        "user":"user_token_1234",
        "location":"urllists/user_token1234/urllists3",
        "date_created":"2013-07-11T21:56:01Z"
    }
]

End Point

GET https://api_token:@api.80legs.com/v2/urllists

Returns all URL lists associated with this user in a JSON object.

Common Codes

HTTP Code Description
200 The URL lists were returned successfully.
401 The API token is not authenticated to view these files.
503 The API is currently down.

Attributes

Key Type Value
name string The name given to the URL list.
user string The user the URL list is associated with.
location string The internal location for the URL list.
date_created date The date the URL list was uploaded.

Upload URL List

curl -X PUT https://your_user_token:@api.80legs.com/v2/urllists/name_of_url_list -H "Content-Type: application/octet-stream" --data-binary "[\"http://www.example.com/\", \"http://www.sample.com/\", \"http://www.test.com/\"]" -i
require 'rest-client'

urllist_file = File.read('/path/to/urllist')

# The URL list name is part of the URL, as in the endpoint below
RestClient.put("https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>", urllist_file, {:content_type => 'application/octet-stream'}) do |response|
  puts response.code
end
import json
import requests

# The file must contain a JSON array of URL strings
with open('your_url_list.txt') as data_file:
    data = json.load(data_file)

url = "https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>"
headers = {'content-type': 'application/octet-stream'}

r = requests.put(url, json=data, headers=headers)

Example JSON Object

[
    "http://url_example1.com/2123213",
    "http://url_example2.com/adsfsdf",
    "http://url_example3.com/adsfsdf",
    "http://url_example4.com/adsfsdf"
]

Formatting

End Point

PUT https://api_token:@api.80legs.com/v2/urllists/{URLS_NAME}

Uploads the contents of the PUT body with the given name. The file can be no larger than 5 MB. The list of URLs must be provided in JSON format.

Common Codes

HTTP Code Description
204 The URL list was uploaded successfully.
401 The API token is not authenticated to post URL lists.
415 Content type is not set / set incorrectly.
503 The API is currently down.

Content Type

content-type : application/octet-stream
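The upload body is a JSON array of URL strings, so a plain list of URLs can be serialized and size-checked before the PUT request. The sketch below assumes "5 MB" means 5 * 1024 * 1024 bytes, which is not stated in the documentation; `build_urllist_body` is a hypothetical helper.

```python
import json

# Assumption: the 5 MB limit is interpreted as 5 * 1024 * 1024 bytes.
MAX_URLLIST_BYTES = 5 * 1024 * 1024


def build_urllist_body(urls) -> bytes:
    """Serialize URLs into the JSON array expected as the PUT body."""
    # Drop blank entries and surrounding whitespace (e.g. trailing newlines
    # left over from reading a text file line by line).
    cleaned = [u.strip() for u in urls if u.strip()]
    body = json.dumps(cleaned).encode("utf-8")
    if len(body) > MAX_URLLIST_BYTES:
        raise ValueError(f"URL list is {len(body)} bytes, over the upload limit")
    return body
```

The result can be sent with `requests.put(url, data=build_urllist_body(urls), headers={'content-type': 'application/octet-stream'})`.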

Get specific URL List

curl -X GET https://your_user_token:@api.80legs.com/v2/urllists/url_list_name -H "Content-Type: application/octet-stream"
require 'rest-client'
require 'json'

response = RestClient.get("https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>")
urllist = JSON.parse(response)
import requests
import json

response = requests.get("https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>")
urllist = json.loads(response.text)

End Point

GET https://api_token:@api.80legs.com/v2/urllists/{URLS_NAME}

Downloads the URL list with the given name.

Common Codes

HTTP Code Description
200 The URL list was returned successfully.
401 The API token is not authenticated to view this file.
404 This URL list does not exist.
503 The API is currently down.

Delete a URL List

curl -X DELETE "https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>"
require 'rest-client'

RestClient.delete("https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>") do |response|
    return response.code == 204
end
import requests

requests.delete("https://<your_api_token>:@api.80legs.com/v2/urllists/<urllist_name>")

End Point

DELETE https://api_token:@api.80legs.com/v2/urllists/{URLS_NAME}

Deletes the URL list with the given name.

Common Codes

HTTP Code Description
204 The URL list was deleted successfully.
401 The API token is not authenticated to delete this file.
404 This URL list does not exist.
503 The API is currently down.

Crawls

View all crawls

curl -X GET https://your_user_token:@api.80legs.com/v2/crawls
require 'rest-client'

puts RestClient.get 'https://<api_token>:@api.80legs.com/v2/crawls'
import requests

r = requests.get("https://<api_token>:@api.80legs.com/v2/crawls")
print(r.content)

End Point

GET https://api_token:@api.80legs.com/v2/crawls

Returns an array of all crawls associated with your API token.

Optional Parameters

Name Description
status An array of crawl statuses to filter results by

Valid statuses: STARTED, COMPLETED, CANCELED, QUEUED
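How the `status` array is encoded in the request is not shown here, so the sketch below filters the returned crawl array on the client side instead. It assumes each crawl object carries a `status` field, as in the crawl examples in this document, and reads the last status name in the list above as QUEUED. `filter_crawls` is a hypothetical helper.

```python
# Status names taken from the list above.
VALID_STATUSES = {"STARTED", "COMPLETED", "CANCELED", "QUEUED"}


def filter_crawls(crawls, statuses):
    """Keep only the crawl objects whose status is one of `statuses`."""
    wanted = {s.upper() for s in statuses}
    unknown = wanted - VALID_STATUSES
    if unknown:
        raise ValueError(f"unknown crawl status: {sorted(unknown)}")
    return [crawl for crawl in crawls if crawl.get("status") in wanted]
```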

Create a crawl

curl -X POST https://your_user_token:@api.80legs.com/v2/crawls/name_of_crawl -H "Content-Type: application/json" -d "{\"app\": \"full_page_content.js\", \"urllist\": \"name_of_url_list\", \"data\": \"name_of_data_file\", \"max_depth\": 1, \"max_urls\": 10 }" -i
require 'rest-client'
require 'json'

user_token = "123asd123asd"
crawl_name = "New_Crawl_Name"
options = {
        "app": "app_name",
        "urllist": "url_list_name",
        "data": "data_file_name",
        "max_depth": 1,
        "max_urls": 10
    }

RestClient.post "https://#{user_token}:@api.80legs.com/v2/crawls/#{crawl_name}", options.to_json, {content_type: :json} do |response, request|
      p request
      p response
      @status = response.code
end

if @status < 400
    puts "Created crawl!"
else
    puts "Crawl creation failed!"
end
import requests
import json

url = 'https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>'
payload = {"app": "app_name", "urllist": "url_list_name", "data": "data_file_name", "max_depth": 1, "max_urls": 10 }
headers = {'content-type': 'application/json'}

r = requests.post(url, data=json.dumps(payload), headers=headers)

Example Sent JSON Object

    {
        "app": "app_name",
        "urllist": "url_list_name",
        "data": "data_file_name",
        "max_depth": 1,
        "max_urls": 10
    }

Example Returned JSON Object

    {
        "name": "crawl_name",
        "app": "app_name",
        "user": "user_token_1234",
        "id": "1234",
        "urllist": "url_list_name",
        "data": "data_file_name",
        "user_agent": "008",
        "depth": 0,
        "max_depth": 10,
        "urls_crawled": 0,
        "max_urls": 10000000,
        "status": "STARTED",
        "date_created": "2015-1-1 12:00:00",
        "date_started": "2015-1-1 13:00:00",
        "date_completed": "2015-1-1 14:00:00",
        "results": []
    }

End Point

POST https://api_token:@api.80legs.com/v2/crawls/{CRAWL_NAME}

Creates a crawl with the parameters specified in the body of the POST message. The crawl settings are returned as a JSON object.

Common Codes

HTTP Code Description
204 The crawl was created successfully.
400 The parameters are not valid JSON.
401 The API token is not authenticated to create crawls or the crawl falls outside of allowed parameters. This includes over the maximum Depth value, over the maximum URL value, or not enough crawl credits.
415 Content type is not set / set incorrectly.
422 There was an issue with the parameters. Either some required parameters are missing or the parameters are set to incorrect or nonexistent values (e.g. the app specified has not been uploaded).
503 The API is currently down.

Content Type

content-type : application/json

Required Parameters

Name Description
app The name of the app to be used to process the page contents. This app must have been uploaded before using the app resource and must be associated with this user account.
urllist The name of the URL list to be used to start the crawl. This URL list must have been uploaded before using the urllists resource and must be associated with this user account.
max_depth The max depth you want the crawl to reach.
max_urls The max number of URLs you want the crawl to process.

Optional Parameters

Name Description
data The name of the data file to be passed to the crawler app. This data file must have been uploaded before using the data resource and must be associated with this user account.
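A small builder can make the split between required and optional parameters explicit before the request is sent. This is a sketch rather than part of the API: `build_crawl_body` is a hypothetical helper, and it only checks that required values are present, not that the named app or URL list has been uploaded.

```python
import json

# Required parameter names, taken from the table above.
REQUIRED_PARAMS = ("app", "urllist", "max_depth", "max_urls")


def build_crawl_body(app, urllist, max_depth, max_urls, data=None) -> str:
    """Return the JSON request body for creating a crawl."""
    payload = {
        "app": app,
        "urllist": urllist,
        "max_depth": max_depth,
        "max_urls": max_urls,
    }
    for name in REQUIRED_PARAMS:
        if payload[name] is None or payload[name] == "":
            raise ValueError(f"missing required parameter: {name}")
    # "data" is optional, so it is only included when provided.
    if data is not None:
        payload["data"] = data
    return json.dumps(payload)
```

The returned string can be used as the request body together with the `application/json` content type, for example `requests.post(url, data=build_crawl_body(...), headers={'content-type': 'application/json'})`.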

Get crawl status

curl -X GET https://your_user_token:@api.80legs.com/v2/crawls/name_of_crawl
require 'rest-client'

puts RestClient.get 'https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>'
import requests

r = requests.get("https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>")
print(r.content)

Example Returned JSON Object

    {
        "name": "crawl_name",
        "app": "app_name",
        "user": "user_token_1234",
        "id": "1234",
        "urllist": "url_list_name",
        "data": "data_file_name",
        "user_agent": "008",
        "depth": 0,
        "max_depth": 10,
        "urls_crawled": 0,
        "maxUrls": 10000000,
        "status": "STARTED",
        "date_created": "2015-1-1 12:00:00",
        "date_started": "2015-1-1 13:00:00",
        "date_completed": "2015-1-1 14:00:00",
        "results": ["http://s3.amazonaws.com/results1"]
    }

End Point

GET https://api_token:@api.80legs.com/v2/crawls/{CRAWL_NAME}

Returns the settings and status of the crawl. This will include links to the results when they are posted. If the crawl doesn’t exist, this will return an empty array.

Common Codes

HTTP Code Description
200 The crawl data was returned successfully.
401 The user is not authenticated to view this crawl.
503 The API is currently down.
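Since result links appear on the crawl object only once the crawl has finished, a client will typically poll this endpoint. The sketch below shows one way to do that. It takes the fetch step as a callable so it stays independent of any HTTP library; the finished statuses, polling interval, and the `wait_for_crawl` name are assumptions, not documented behavior.

```python
import time

# Assumption: a crawl is finished once it is COMPLETED or CANCELED.
FINISHED_STATUSES = {"COMPLETED", "CANCELED"}


def wait_for_crawl(fetch_crawl, interval_seconds=60.0, max_polls=100, sleep=time.sleep):
    """Call fetch_crawl() until the crawl finishes, then return the crawl object.

    fetch_crawl is any zero-argument callable returning the decoded JSON
    from the crawl status endpoint.
    """
    for attempt in range(max_polls):
        crawl = fetch_crawl()
        # The endpoint returns an empty array when the crawl does not exist.
        if not crawl:
            raise LookupError("crawl not found")
        if crawl.get("status") in FINISHED_STATUSES:
            return crawl
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError("crawl did not finish within the polling budget")
```

With `requests`, the callable could be `lambda: requests.get(status_url, auth=(api_token, '')).json()`, where `status_url` is the crawl URL shown above.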

Cancel a crawl

curl -X DELETE "https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>"
import requests

requests.delete("https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>")
require 'rest-client'

RestClient.delete "https://<api_token>:@api.80legs.com/v2/crawls/<crawl_name>"

End Point

DELETE https://api_token:@api.80legs.com/v2/crawls/{CRAWL_NAME}

Cancels the crawl with the given name.

Common Codes

HTTP Code Description
204 The crawl was successfully canceled.
401 The user is not authenticated to cancel this crawl.
404 The crawl does not exist.
503 The API is currently down.

Results

View results for a crawl

import requests

r = requests.get("https://<api_token>:@api.80legs.com/v2/results/<crawl_name>")
print(r.status_code)
print(r.headers)
print(r.content)
curl --request GET "https://<api_token>:@api.80legs.com/v2/results/<crawl_name>"
require 'rest-client'

api_token = "abcdefghi123457"
crawl_name = "Amazing_Crawl"
results = RestClient.get("https://#{api_token}:@api.80legs.com/v2/results/#{crawl_name}")

Example Returned JSON Object

["http://s3.amazonaws.com/results1"]

End Point

GET https://api_token:@api.80legs.com/v2/results/{CRAWL_NAME}

Returns the results of the crawl specified by CRAWL_NAME. This will return a 404 if no results have been posted.

Common Codes

HTTP Code Description
200 The result links were returned successfully.
401 The user is not authenticated to view these results.
404 No results have been posted.
503 The API is currently down.

Errors

HTTP Status Code Summary

HTTP Code Status Meaning
200 OK Request was successful
204 No Content Request was successful, but no data was returned (this is expected behavior)
400 Bad Request Often a missing or incorrect parameter
401 Unauthorized Invalid/Missing API Token
404 Not Found Using an invalid API end point, or the user supplied path is incorrect
422 Unprocessable Entity All the parameters were correct but the request was rejected on the back end, contact support with request information
503 Service Unavailable The API is currently unavailable due to maintenance; try again later.
5xx Server Error There was an error within the Datafiniti servers
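Client code can turn these status codes into readable messages with a small lookup. The sketch below paraphrases the table and uses 503 for the unavailable case, matching the per-endpoint tables in this document; `describe_status` is a hypothetical helper and the wording of each message is illustrative.

```python
# Meanings paraphrased from the status code summary.
STATUS_MEANINGS = {
    200: "OK: request was successful",
    204: "No Content: request was successful, no data returned",
    400: "Bad Request: missing or incorrect parameter",
    401: "Unauthorized: invalid or missing API token",
    404: "Not Found: invalid endpoint or incorrect path",
    422: "Unprocessable Entity: request rejected on the back end",
    503: "Service Unavailable: API is down, try again later",
}


def describe_status(code: int) -> str:
    """Return a short description for an HTTP status code from the API."""
    if code in STATUS_MEANINGS:
        return STATUS_MEANINGS[code]
    # Any other 5xx code is treated as a generic server error.
    if 500 <= code < 600:
        return "Server Error: error within the API servers"
    return f"Unexpected status code {code}"


def is_success(code: int) -> bool:
    """True for the two success codes used by the API (200 and 204)."""
    return code in (200, 204)
```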