The White Whale team posing for a socially distanced photo in Crescent Heights


Joining Datasets Seamlessly with DeepSea

With the latest release of DeepSea, it is now possible to join or merge data from within our platform, with no coding needed. This will allow data scientists and anyone interested in getting more out of their data to gain insights by bringing together multiple data sources into a single view. DeepSea can take in data from a variety of sources such as files, scheduled database pulls, model runs, and surveys – once that data has been connected to DeepSea, it’s as simple as a few clicks to join any two datasets together.

For example, one could join the results of a model run to historical data or new data, to view how the predictions match up with the data. Or, for example, you could join the results of a survey to existing company data to match responses to information on the relevant client, customer, product, or date. Because it’s easy to upload any dataset available online into DeepSea, you could also merge your business data with public data on things like weather, stock prices, neighbourhoods, etc. 

Joining data within DeepSea also allows your data analysis to easily be improved over time: as data changes or new data is available, data can be merged in with the existing setup and then used to create new visualizations or insights. And, if one or both of the original datasets are updated, the joined table will show all of the latest data automatically.

On the technical side, the joins in DeepSea work the same way that developers and data scientists already know from SQL or Pandas: there are two datasets, joined on certain columns, in either a full join, inner join, or left join. DeepSea’s interface walks users through this so that anyone can create a simple join, but also contains the technical terms that people experienced with joining data know. 

A simple example that illustrates how joining works is joining plane flight data with weather data. For example, an airline or airport might be interested in how the weather is affecting their operations, and weather data is often freely available. After importing both the flight data and the weather data into DeepSea, we have the following 2 datasets:

A dataset detailing UK flights in DeepSea.
A dataset detailing UK airport weather in DeepSea.

The shared columns between these datasets are the date, and the location of the airport where the flight is departing.

A joined dataset can be created to bring all of this data into a single table. From there, we can build dashboards & charts with it, pass it into a model, or export it as if it were a single table. To do this, we only need to specify a few things, and DeepSea will handle the rest. The first choice is which two datasets to join. Then, we just need to choose the columns that specify one unique row in our merged data (which must also be present in both datasets). 

DeepSea's interface for joining datasets together.

In this example, we choose the column for date and location from each table, and pair them together.

We chose Left Join, rather than the default option of Full Join, because we want to keep only the weather data that matches one of our airports. We can also change the join settings at any point later on.

Here is the resulting dataset, and some visualizations from it.

The left-joined dataset detailing UK flights and weather data in DeepSea.
A DeepSea dashboard illustrating UK flights and weather insights.

We are excited to release this new functionality, and look forward to seeing what it empowers DeepSea users to build!

Read More