Please refer to the Clay Wiki for an overview of what Clay can do.

Installation

Technical Requirements

Accessing the Docker Container

After gaining access to Clay, run the following command in your terminal and enter your Quay credentials.

# May need to be run with sudo on Linux
docker login quay.io

Running the Docker Container

By default, the Clay Docker container runs locally on port 8081.

docker run -itp 8081:8081 -v /tmp quay.io/datalogue/clay
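
Once the container is up, a quick sanity check is to request the GUI endpoint described in the Quick Start section below (a sketch, assuming that endpoint responds over plain HTTP on port 8081):

# Should print an HTTP status line if Clay is listening on 8081
curl -i http://localhost:8081/ui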

Quick Start

There are a variety of ways to start using Clay for your data transformation needs.

Clay exposes a RESTful API to which you POST the Source, Transformation, and Destination information for your data prep pipeline.

Below are some of the most popular ways to interact with Clay, but you can use whichever API tool you prefer 🖥️

You can also follow our tutorial to help you get started.

Clay GUI

The Clay Docker Container has a graphical user interface that can be accessed at http://localhost:8081/ui.

Paw/Postman Collections

  • Paw: open a sample set of Clay data pipeline requests as a Paw Project.
  • Postman: import the sample Postman Collection automatically by clicking the link.

cURL

curl -X POST "http://localhost:8081/run" -H "Content-Type: application/json" -d "$body"

See the Usage Patterns section below for how to structure the $body of your POST request.
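
For longer request bodies, a common alternative is to keep the JSON in a file and have cURL read it from disk (the file name pipeline.json here is just an example):

# The @ prefix tells cURL to read the request body from a file
curl -X POST "http://localhost:8081/run" -H "Content-Type: application/json" -d @pipeline.json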

Usage Patterns

Below you will find the specification for how to structure your request.

The sections that follow describe each parameter:

Header

Content-Type: application/json

Request Body

{
  "source": {
    "_type": "$sourceKind",
    "field": "value"
  },
  "pipelines": [
    {
      "transformations": [
        {
          "_type": "$transformationKind",
          "field": "value"
        }
      ],
      "pipelines": [],
      "target": {
        "_type": "$targetKind",
        "field": "value"
      }
    }
  ]
}
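
For illustration, here is a complete request body assembled entirely from the examples in the Source & Target and Transformations sections below: it reads a Socrata dataset, runs Classify over it, and writes the result to S3. The credential values are the same placeholders used below; the output key name is our own example:

{
  "source": {
    "_type": "Socrata",
    "token": "GPGuyRELzwEXtRJbJ-EXAMPLE",
    "domain": "data.cityofchicago.org",
    "id": "ypez-j3yg"
  },
  "pipelines": [
    {
      "transformations": [
        { "_type": "Classify" }
      ],
      "pipelines": [],
      "target": {
        "_type": "S3",
        "clientId": "AKIAIOSFODNN7EXAMPLE",
        "clientSecret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1",
        "bucket": "test",
        "key": "output.csv"
      }
    }
  ]
}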

Source & Target

The following data stores, whether acting as a source or a target destination, require the information listed below:

Running Locally

Docker has a known issue where containers cannot connect to data stores running locally on the host.

The best workaround for Mac users is to specify docker.for.mac.localhost wherever localhost would otherwise be used; otherwise the request resolves to the Docker container's own localhost. Another alternative is to run your database in a Docker container as well.
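
For example, under that workaround a PostgreSQL connection block (using the Jdbc fields described below, and assuming the database is on the host's default port 5432) would point at the host like this:

{
  "_type": "Jdbc",
  "user": "test",
  "password": "1",
  "url": "jdbc:postgresql://docker.for.mac.localhost:5432/database",
  "schema": "public",
  "rootTable": "simple"
}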

Amazon S3

S3 may be used as both a Source and a Destination.

The following credentialing information is required:

  • Access Key ID
  • Secret Access Key
  • Bucket Region
  • Bucket Name
  • File Name
{
  "_type": "S3",
  "clientId": "AKIAIOSFODNN7EXAMPLE",
  "clientSecret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "region": "us-east-1",
  "bucket": "test",
  "key": "input.csv"
}

Read the Amazon documentation for more information on how to obtain your Access Key ID and Secret Access Key.

OpenData by Socrata

Socrata may only be used as a Source.

The following credentialing information is required:

  • Token
  • Host Domain
  • Data Resource Id
{
  "_type": "Socrata",
  "token": "GPGuyRELzwEXtRJbJ-EXAMPLE",
  "domain": "data.cityofchicago.org",
  "id": "ypez-j3yg"  
}

Read the Socrata Open Data API documentation for more information on how to obtain your Application Token.

PostgreSQL

PostgreSQL databases may be used as both a Source and a Destination.

The following credentialing information is required:

  • Username
  • Password
  • URL
  • Schema
  • Root Table name

Read more about specifying your PostgreSQL database URL here.

If your PostgreSQL database has no security credentials, simply leave those fields blank.

{
  "_type": "Jdbc",
  "user": "test",
  "password": "1",
  "url": "jdbc:postgresql://host:port/database",
  "schema": "public",
  "rootTable": "simple"
}

MongoDB

MongoDB may be used as both a Source and a Destination.

The following credentialing information is required:

  • Username
  • Password
  • URL
  • Database name
  • Collection name

Read more about specifying your MongoDB URI here.

If your MongoDB instance has no security credentials, simply leave those fields blank.

{
  "_type": "Mongo",
  "user": "test",
  "password": "1",
  "url": "mongodb://host:port",
  "database": "test",
  "collection": "simple"
}

Transformations

There are five types of transformations available through Clay.

Classify

Classify uses Datalogue's deep learning algorithms to identify Dates and Addresses in your data.

{
  "_type": "Classify"
}

Standardize Dates

Standardizes dates into a YYYY/MM/DD format.

{
  "_type": "Standardize"
}

Parse Addresses

Parses addresses found by the Classifier into address, city, state, zip, and country.

{
  "_type": "Parse"
}
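
Since Parse depends on addresses the Classifier has already found, a natural pipeline runs Classify before Parse. A sketch of such a transformations array inside a pipeline, assuming transformations are applied in the order listed:

"transformations": [
  { "_type": "Classify" },
  { "_type": "Parse" }
]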

Filter Columns

ColumnSelection: Specify the top-level columns you want to keep in the dataset.

{
  "_type": "ColumnSelection",
  "columns": ["COLUMN NAME"]
}

Limit Data Input

Specify the number of observations you want from the dataset.

{
  "_type": "ElementCountSelection",
  "count": 3
}
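
Transformations can also be combined. For example, to keep a single column and then take only the first three records, a transformations array could look like this (again assuming transformations apply in the order listed):

"transformations": [
  {
    "_type": "ColumnSelection",
    "columns": ["COLUMN NAME"]
  },
  {
    "_type": "ElementCountSelection",
    "count": 3
  }
]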

Still have questions? Check out our tutorial to help you get started.