The Goal

Sometimes you just want to see the data, but the data that you are looking at wasn't designed to be read (leaving you both confused and frustrated).

Clay helps you solve this.

Clay lets you transform data into different formats in different locations, and helps you better understand what is in that data. Clay finds addresses and dates within your data, tags those fields, and regularizes them, making that data not only easier to "see", but also easier to use.

Here we'll walk you through an example of transforming the contents of a PostgreSQL database into a .CSV file located on Amazon S3.

Technical Prerequisites

Please ensure that you meet the technical requirements in the Technical Usage Documentation and have full access to the Clay Docker container.

Specifically, you'll need to have:

  • The Clay Docker container (get access here)
  • PostgreSQL credentials and a sample Postgres database table (see the sketch below if you need to create one)
  • Amazon S3 credentials
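
If you don't already have a sample table to point Clay at, here is a minimal sketch of creating one with psql. The database name, connection details, and rows are purely illustrative; the column names simply mirror the ones used later in this tutorial.

# Hypothetical sample table -- adjust the connection details to your own setup.
psql -h localhost -U postgres -d tutorialdb <<'SQL'
CREATE TABLE tutorialtable1 (
    name    text,
    dob     text,
    address text
);
INSERT INTO tutorialtable1 (name, dob, address)
VALUES ('Jane Doe', 'June 23, 1912', '123 Main Street, Springfield, IL 62701, USA');
SQL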

Let's get into it!

Run Clay Docker Container

Let's get Clay running! By default, the Clay Docker container runs locally on port 8081.

docker run -itp 8081:8081 -v /tmp quay.io/datalogue/clay
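
To confirm the container came up before moving on, a quick check from another terminal (assuming Docker is running locally and nothing else has claimed port 8081):

# The container should appear here with port 8081 mapped.
docker ps --filter "ancestor=quay.io/datalogue/clay"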

Accessing the Interface

Use your browser to visit http://localhost:8081/ui and access our super simple graphical interface.
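
The same check works from the command line, if you prefer it to a browser:

# Prints the HTTP status code; anything other than a connection error means the interface is up.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081/ui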

Get: Source Data Store - Postgres

Now that we're up and running, the first step in setting up your data pipeline is to specify your source data store; in this tutorial we assume a locally running Postgres database. You can read more about your PostgreSQL connection URL here.

The formatting of this section of the request will look like the following:

  "source": {
    "_type": "Jdbc",
    "user": "postgres",
    "password": "1",
    "url": "jdbc:postgresql://host:port/database",
    "schema": "public",
    "rootTable": "tutorialtable1"
  }
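
Before dropping these values into the source block, it can be worth confirming that they actually connect. A minimal check with psql, assuming a local Postgres on the default port 5432 (which corresponds to a JDBC URL like jdbc:postgresql://localhost:5432/tutorialdb; the database name is illustrative):

# Uses the same credentials, schema, and table that the "source" block points at.
psql "postgresql://postgres:1@localhost:5432/tutorialdb" \
     -c 'SELECT * FROM public.tutorialtable1 LIMIT 5;'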

Know: Classify

The Classify step inspects your data and tags the fields it recognizes, such as dates and addresses, so that the transformations in the next section know which data points to act on.

The formatting of this section of the request will look like the following:

{
  "_type": "Classify"
}

Transform: Standardize dates, parse addresses, and filter relevant columns

Prep your data for output with the powerful transformations below.

  • Standardize dates

    By selecting Standardize dates, the data points classified as Dates by the Classify section will be automatically formatted into the standard ISO form for dates, YYYYMMDD (so a value like "June 23, 1912" would come out as 19120623).

The formatting of this section of the request will look like the following:

{
  "_type": "Standardize"
}
  • Parse addresses

    By selecting Parse addresses, the data points classified as Addresses by the Classify section will be automatically segmented into five new columns: (DTL) Street Address, (DTL) City, (DTL) State, (DTL) Zip, and (DTL) Country.

    The formatting of this section of the request will look like the following:

{
  "_type": "Parse"
}
  • Filter columns

    Enter the column names of the data source that you would like to include in your output. (If you are unsure what the columns are called, see the sketch after the request snippet below.)

The formatting of this section of the request will look like the following:

      "transformations": [
        {
          "_type": "ColumnSelection",
          "columns": ["name", "dob", "address"]
        }
      ]
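
If you are unsure which column names the source table exposes, you can list them directly from Postgres before filling in ColumnSelection (connection details are the same illustrative ones used earlier):

# Describe the source table to see which column names are available to select.
psql "postgresql://postgres:1@localhost:5432/tutorialdb" -c '\d public.tutorialtable1'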

Deliver Target Destination: Amazon S3

Finally, specify where and how you want your data to be delivered by entering the specifications for your destination store type.

The formatting of this section of the request will look like the following:

      "target": {
        "_type": "S3",
        "clientId": "XXXX",
        "clientSecret": "YYYY",
        "region": "us-east-1",
        "bucket": "bucket_name",
        "key": "output.csv"
      }
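
To sanity-check the bucket and credentials before running the pipeline, the AWS CLI can list the target bucket using the same key pair (the values below are the placeholders from the request):

# Verify that the access key pair can reach the target bucket.
AWS_ACCESS_KEY_ID=XXXX AWS_SECRET_ACCESS_KEY=YYYY \
    aws s3 ls s3://bucket_name --region us-east-1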

Sculpt!

You've now filled in everything, and all that's left is to sculpt your data!

Your final request will look like this:

Request:

{
  "source": {
    "_type": "Jdbc",
    "user": "postgres",
    "password": "1",
    "url": "jdbc:postgresql://host:port/database",
    "schema": "public",
    "rootTable": "tutorialtable1"
  },
  "pipelines": [
    {
      "transformations": [
        {
          "_type": "Classify"
        },
        {
          "_type": "Parse"
        },
        {
          "_type": "Standardize"
        },
        {
          "_type": "ColumnSelection",
          "columns": ["name", "dob", "address"]
        }
      ],
      "pipelines": [],
      "target": {
        "_type": "S3",
        "clientId": "XXXX",
        "clientSecret": "YYYY",
        "region": "us-east-1",
        "bucket": "bucket_name",
        "key": "output.csv"
      }
    }
  ]
}

To try this using cURL:

## Postgres -> S3
curl -X "POST" "http://localhost:8081/run" \
     -H "Content-Type: application/json" \
     -d $'{
  "pipelines": [
    {
      "target": {
        "_type": "S3",
        "clientSecret": "XXXX",
        "clientId": "YYYY",
        "region": "us-east-1",
        "bucket": "dtl0",
        "key": "output.csv"
      },
      "transformations": [
        {
          "_type": "Classify"
        },
        {
          "_type": "Parse"
        },
        {
          "_type": "Standardize"
        },
        {
          "_type": "ColumnSelection",
          "columns": [
            "name",
            "dob",
            "address"
          ]
        }
      ],
      "pipelines": []
    }
  ],
  "source": {
    "_type": "Jdbc",
    "user": "postgres",
    "password": "1",
    "rootTable": "tutorialtable1",
    "url": "jdbc:postgresql://host:port/database",
    "schema": "public"
  }
}'
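
Once the run finishes, the delivered file should be sitting at the key you specified. One way to pull it down and take a look, assuming the AWS CLI is configured with the same credentials (bucket and key taken from the cURL request above):

# Download the delivered CSV from S3 and preview the first few rows.
aws s3 cp s3://dtl0/output.csv ./output.csv --region us-east-1
head output.csv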

🎉 And that's it! 🎉

Within six steps you were able to unlock, translate, enhance, and transform your data for your coworkers to use.

You can use Clay to do the tedious data preparation for you so you can get to your analytics faster.

Have any questions or feedback? We'd love to hear from you at contact@datalogue.io