Clay automates the three major steps of data preparation:
1) Get your data: accessing the format and location,
2) Know your data: understanding what is in your data, and
3) Transform your data: making it usable for analysis.
To gain access to Clay, sign up here.
Or read more below to see how you can use Clay to sculpt your raw data into enhanced, usable data.
Clay is an intelligent data pipeline system. With Clay, you can stream data and convert it automatically between a wide range of formats, sources, and destinations. Alongside conversion, Clay uses deep learning algorithms to identify dates and addresses in your data. It can then transform your data by standardizing dates, segmenting addresses, filtering columns, and limiting data input, so that your data is ready for analysis.
Clay lets you access, translate, and transport data stored in a variety of locations and formats.
For example, Clay can read a CSV dataset stored in Amazon S3, translate it, and store it in a PostgreSQL database.
The following data formats are supported:
- CSV (.csv)
- JSON (.txt, .json)
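To make the idea of format conversion concrete, here is a minimal sketch of a CSV-to-JSON translation using only the Python standard library. It is not Clay's implementation; the function name `csv_to_json` and the column names in the sample (`name`, `address`, `born`) are assumptions chosen for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the header names.
    return json.dumps([dict(row) for row in reader], indent=2)


# Hypothetical sample data; the quoted field keeps its embedded comma.
sample = 'name,address,born\nJohn Adams,1600 pennsylvania avenue,"october 30, 1735"\n'
print(csv_to_json(sample))
```

All values come out as strings here; inferring types is a separate step that this sketch does not attempt.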
Clay can access and deliver data stored in any of the following locations. Read on for the specifics of each supported data store.
Access all your files hosted on Amazon S3 securely.
Socrata is the most trusted and widely used government data store. Load datasets directly hosted on OpenData by Socrata.
Connect to your PostgreSQL database as a source or a destination.
Clay supports the transformation of MongoDB NoSQL database collections.
Clay accesses Datalogue's proprietary, ontology-mapping, deep learning algorithms to find and classify the following categories in your data:
Given an example dataset:
| John Adams | 1600 pennsylvania avenue | october 30, 1735 |
Clay will classify it as follows:
| N | (DTL) Classification - N | A | (DTL) Classification - A | D | (DTL) Classification - D |
| --- | --- | --- | --- | --- | --- |
| John Adams | full_name* | 1600 pennsylvania avenue | address | october 30, 1735 | date |
*Clay offers only a limited classification, covering dates and addresses. To learn more about the other ontology classifications provided by Datalogue, including Pharma, Finance, and custom ontologies, contact us.
Clay supports four types of transformations on your data:
- Standardize dates
- Parse addresses
- Filter columns
- Limit data intake
Translate all dates into a standardized format (YYYY/MM/DD).
- "October 30, 1735" ➡️ 1735/10/30
- "4/16/2017" ➡️ 2017/04/16
- "8 juillet 1994" ➡️ 1994/07/08
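The three conversions above can be reproduced with a small rule-based sketch. Clay itself uses deep learning to recognize dates, so this is only an illustration of the target format, not Clay's method. The function name `standardize_date`, the month table, and the assumption that numeric dates are written month first (M/D/YYYY) are all hypothetical choices made for this example.

```python
import re
from datetime import date

# English and French month names (illustrative, not an exhaustive list of languages).
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "august": 8, "september": 9, "october": 10, "november": 11,
    "december": 12,
    "janvier": 1, "février": 2, "mars": 3, "avril": 4, "mai": 5, "juin": 6,
    "juillet": 7, "août": 8, "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12,
}


def standardize_date(text: str) -> str:
    """Return the date as YYYY/MM/DD, or raise ValueError if it is not recognized."""
    s = text.strip().lower()
    numeric = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", s)          # 4/16/2017
    month_first = re.fullmatch(r"([^\W\d]+)\s+(\d{1,2}),?\s+(\d{4})", s)  # October 30, 1735
    day_first = re.fullmatch(r"(\d{1,2})\s+([^\W\d]+)\s+(\d{4})", s)     # 8 juillet 1994
    if numeric:
        # Assumption: numeric dates are month first (US style).
        month, day, year = (int(g) for g in numeric.groups())
    elif month_first:
        month = MONTHS.get(month_first.group(1))
        day, year = int(month_first.group(2)), int(month_first.group(3))
    elif day_first:
        month = MONTHS.get(day_first.group(2))
        day, year = int(day_first.group(1)), int(day_first.group(3))
    else:
        raise ValueError(f"unrecognized date: {text!r}")
    if month is None:
        raise ValueError(f"unknown month name in: {text!r}")
    parsed = date(year, month, day)  # raises ValueError for impossible dates
    return f"{parsed.year:04d}/{parsed.month:02d}/{parsed.day:02d}"


for raw in ("October 30, 1735", "4/16/2017", "8 juillet 1994"):
    print(raw, "->", standardize_date(raw))
```

A fixed lookup table like this only covers the formats it was written for, which is the gap a learned recognizer is meant to close.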
Segment addresses into the following parts:
- Street Address
- City
- State
- Zip Code
| Original Address | (DTL) Street Address | (DTL) City | (DTL) State | (DTL) Zip Code |
| --- | --- | --- | --- | --- |
| 625 Avenue of the Americas, New York, NY 10011 USA | 625 Avenue of the Americas | New York | NY | 10011 |
Select only the columns of interest in datasets for a more relevant output.
Limit the number of data rows taken in from sources for faster processing.
If you have ideas or feedback, feel free to reach out.
Ready to get started? To gain access to Clay, sign up here.
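Column filtering and row limiting can be pictured together with a short standard-library sketch. This is not Clay's interface; the function name `select_and_limit`, its parameters, and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def select_and_limit(csv_text: str, columns: list, max_rows: int) -> list:
    """Keep only the named columns and read at most max_rows data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # islice stops reading after max_rows rows, so later rows are never parsed.
    return [{col: row[col] for col in columns} for row in islice(reader, max_rows)]


# Hypothetical sample data for illustration.
sample = "id,name,city\n1,Ann,Paris\n2,Bob,Lyon\n3,Cy,Nice\n"
print(select_and_limit(sample, ["id", "city"], 2))
```

Because the limit is applied while reading rather than after loading everything, the cost of the read scales with `max_rows`, which matches the stated goal of faster processing.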
Learn more about extending Clay's capabilities for unlocking your enterprise's data at Datalogue.