Tabular data & Time series analysis

Deep Neural Networks are both reliable and effective for making predictions on tabular data. Previously, some practitioners regarded Random Forests as the best technique for tabular data analysis 99% of the time.

Currently, the techniques widely regarded as the best performers for tabular regression and classification tasks are Random Forests, Gradient Boosting Machines and K-Nearest Neighbours, while older techniques such as Support Vector Machines, which suffer from the curse of dimensionality, are finally being used less.

There are many tabular data analysis tasks that a deep neural network model can be trained to perform:

  • Fraud detection

  • Sales forecasting

  • Product failure prediction

  • Pricing

  • Credit risk

  • Customer retention/churn

  • Recommendation systems

  • Ad optimisation

  • Anti-money laundering

  • Resume screening

  • Sales prioritisation

  • Call centre routing

  • Store layout

  • Store location optimisation

  • Staff scheduling

What we normally do at DSE when working with tabular data can be briefly described as follows:

Feature engineering

Most current thinking in machine learning is to use feature engineering to preprocess the data and remove features, sometimes based on assumptions the practitioner makes about which features are present in the data. People trained in classical statistics are used to removing parameters in this way.

Feature engineering is still needed when using deep neural networks for tabular data, albeit much less of it, and the feature engineering that remains needs much less maintenance. Ideally, when analysing tabular data with Neural Networks, features aren't removed: all of the data can be kept and augmented.
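
As an illustration of keeping a feature and augmenting it rather than removing it, the sketch below expands a date column into several derived columns while leaving the original date in place. It is a minimal sketch in plain Python, assuming rows are stored as dictionaries; the column names (`store_id`, `sale_date`, `units`), the helper name `expand_date` and the choice of derived features are hypothetical and not a description of DSE's own pipeline.

```python
from datetime import date


def expand_date(row: dict, col: str) -> dict:
    """Return a copy of `row` with extra features derived from the date in `col`.

    Nothing is removed: the original date column is kept and new columns are added.
    """
    d: date = row[col]
    out = dict(row)  # copy so the input row is left unchanged
    out[f"{col}_year"] = d.year
    out[f"{col}_month"] = d.month
    out[f"{col}_day"] = d.day
    out[f"{col}_dayofweek"] = d.weekday()  # Monday == 0, Sunday == 6
    out[f"{col}_dayofyear"] = d.timetuple().tm_yday
    out[f"{col}_is_weekend"] = d.weekday() >= 5
    return out


# Hypothetical sales record.
row = {"store_id": 7, "sale_date": date(2024, 3, 2), "units": 12}
expanded = expand_date(row, "sale_date")
```

For 2 March 2024 this adds year 2024, month 3, day-of-week 5 (a Saturday), day-of-year 62 and a weekend flag set to True, while `sale_date` itself is kept alongside the new columns.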

Some features may need careful review to check whether they could lead to discrimination; see the ethics section further below in this article.

Categorical and continuous variables

The data will have categorical and continuous variables. Continuous variables are numbers such as age or weight; in principle they can take infinitely many values between any two values. Categorical variables are those that take a value from a discrete set, for example marital status or breed of dog.
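
A common first step is deciding which columns to treat as continuous and which as categorical. The sketch below uses a simple heuristic, which is an assumption rather than a fixed rule: a column is continuous if all of its non-missing values are numbers and it has more than `max_categories` distinct values; otherwise it is categorical. The helper name `split_columns`, the threshold and the example table are hypothetical.

```python
from typing import Dict, List, Tuple


def split_columns(
    rows: List[Dict[str, object]], max_categories: int = 20
) -> Tuple[List[str], List[str]]:
    """Heuristically split column names into (continuous, categorical).

    A column is continuous if every non-missing value is a number (booleans
    excluded) and it has more than `max_categories` distinct values;
    everything else is treated as categorical.
    """
    continuous, categorical = [], []
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
        if numeric and len(set(values)) > max_categories:
            continuous.append(col)
        else:
            categorical.append(col)
    return continuous, categorical


rows = [
    {"age": 34, "weight": 70.5, "marital_status": "married", "dog_breed": "beagle"},
    {"age": 51, "weight": 82.0, "marital_status": "single", "dog_breed": "collie"},
    {"age": 27, "weight": 64.3, "marital_status": "single", "dog_breed": "beagle"},
    {"age": 45, "weight": None, "marital_status": "divorced", "dog_breed": "poodle"},
]
# A tiny threshold so that this toy table counts as having "many" distinct numbers.
continuous, categorical = split_columns(rows, max_categories=2)
```

Here `age` and `weight` come out as continuous and `marital_status` and `dog_breed` as categorical. On real data the result should be reviewed: a numeric code with many distinct values, such as a product ID, would be wrongly classed as continuous by this heuristic.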

Continuous data can be fed directly into the Neural Network as numbers, in the same way that pixel values are fed into a Deep Neural Network for images.

Feature preprocessing

Training a deep neural network will not do all of the required feature engineering on its own, although it will find non-linearities and interactions between the features.

Where transforms would be applied on the fly to image data, tabular data is instead processed by preprocessors, which run once over the whole dataset in advance of training.
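
The sketch below illustrates the difference in plain Python: every preprocessing step runs once over the whole table, and the training loop then only reads the already-preprocessed rows, however many epochs it runs. The names `run_preprocessors`, `batches` and `normalise_city` are hypothetical, and the single step is only a placeholder.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]

call_log: List[str] = []  # records each time a preprocessing step runs


def run_preprocessors(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply every preprocessing step to the whole table, once, before training."""
    for step in steps:
        rows = step(rows)
    return rows


def batches(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive mini-batches from an already-preprocessed table."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def normalise_city(rows: List[Row]) -> List[Row]:
    """Placeholder step: tidy the spelling of a categorical text column."""
    call_log.append("normalise_city")
    return [{**r, "city": str(r["city"]).strip().lower()} for r in rows]


raw = [{"city": " London"}, {"city": "PARIS "}, {"city": "london"}]
prepared = run_preprocessors(raw, [normalise_city])  # preprocessing happens here, once

seen = 0
for epoch in range(3):  # the training loop only reads the preprocessed rows
    for batch in batches(prepared, size=2):
        seen += len(batch)  # a real loop would feed `batch` to the model here
```

After three epochs over three rows, `normalise_city` has still run only once, whereas an image-style transform would be applied again to every batch.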

This preprocessing should include filling in missing data. For continuous data, missing values can be replaced with the median of that feature in the data set. It is also important for the Neural Network to know that the value was missing in that row, as this in itself could be valuable information, so a new feature can be added to indicate that the feature was missing in that row. This prevents the missing value from skewing the predictions while keeping the model aware that the row is missing data for that feature.
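
A minimal sketch of median filling with a missing-value indicator, assuming rows are stored as dictionaries and `None` marks a missing value. The function name `fill_missing_with_median`, the `_na` suffix for the indicator column and the example incomes are hypothetical choices.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def fill_missing_with_median(rows: List[Row], col: str) -> Tuple[List[dict], float]:
    """Fill missing (None) values in `col` with the median of the known values.

    Adds a boolean `<col>_na` column recording which rows were missing, and
    returns the median so the same value can be reused on other data.
    """
    fill_value = median([r[col] for r in rows if r[col] is not None])
    out = []
    for r in rows:
        missing = r[col] is None
        out.append({**r, col: fill_value if missing else r[col], f"{col}_na": missing})
    return out, fill_value


rows = [{"income": 30_000}, {"income": None}, {"income": 52_000}, {"income": 41_000}]
filled, income_median = fill_missing_with_median(rows, "income")
```

The median of the known incomes (30,000, 41,000 and 52,000) is 41,000, so the second row becomes `income = 41000` with `income_na = True`, while the other rows keep their values with `income_na = False`.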

Continuous variables can be normalised by subtracting the feature's mean and dividing by the feature's standard deviation, which gives each feature a mean of 0 and a standard deviation of 1 (the values are not confined to the range 0 to 1). This makes it easier for the neural network to train.
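
A minimal sketch of this normalisation (often called standardisation or z-scoring) in plain Python. It uses the population standard deviation from the standard library's `statistics.pstdev`; the function name `standardise`, the example ages and the fallback for a constant column are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def standardise(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # constant column: avoid dividing by zero
    return [(v - mu) / sigma for v in values]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]
z = standardise(ages)
```

For these ages the mean is 40 and the results run from about -1.41 to 1.41: centred on 0 with unit standard deviation, and clearly not limited to the range 0 to 1.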

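The preprocessing that is applied to the training set must be applied to the validation and test sets in exactly the same way, reusing the statistics (such as medians, means and standard deviations) computed on the training set.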
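
One way to guarantee this is to separate learning the preprocessing statistics from applying them. The sketch below, a hypothetical `ContinuousPreprocessor` class for a single continuous column, learns the median, mean and standard deviation from the training rows only and then applies exactly those values to any other split. It is an illustration under these assumptions, not a particular library's API.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


class ContinuousPreprocessor:
    """Fill missing values and normalise one continuous column.

    Statistics are learned from the training set only (`fit`) and then
    reused unchanged for the validation and test sets (`transform`).
    """

    def __init__(self, col: str):
        self.col = col

    def fit(self, rows: List[Row]) -> "ContinuousPreprocessor":
        present = [r[self.col] for r in rows if r[self.col] is not None]
        self.fill_value = median(present)
        filled = [self.fill_value if r[self.col] is None else r[self.col] for r in rows]
        self.mu = mean(filled)
        self.sigma = pstdev(filled) or 1.0  # guard against a constant column
        return self

    def transform(self, rows: List[Row]) -> List[dict]:
        out = []
        for r in rows:
            missing = r[self.col] is None
            value = self.fill_value if missing else r[self.col]
            out.append({**r, self.col: (value - self.mu) / self.sigma, f"{self.col}_na": missing})
        return out


train = [{"age": 20}, {"age": 30}, {"age": None}, {"age": 40}]
valid = [{"age": None}, {"age": 50}]

prep = ContinuousPreprocessor("age").fit(train)  # statistics from training data only
train_x = prep.transform(train)
valid_x = prep.transform(valid)  # same fill value, mean and std as training
```

The missing validation age is filled with the training median (30) and so normalises to 0.0, and the validation value 50 is scaled with the training mean and standard deviation rather than with statistics recomputed on the validation set.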

Embeddings for categorical variables

For each categorical variable, a trainable matrix of weights can be created, with one row for each category/class of that variable. These matrices are known as embeddings. Multiplying the one-hot encoded vector that represents the category/class in a data row by the embedding matrix selects that category's row, and the result is used as an input to the Neural Network. During training these weights are learned, so each category/class within each categorical variable ends up with its own learned vector (with a single-column embedding, this amounts to a learned bias per category/class).
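
The sketch below shows in plain Python why multiplying a one-hot vector by an embedding matrix simply selects one row of the matrix, which is why deep learning frameworks (for example PyTorch's `torch.nn.Embedding`) implement embeddings as an index lookup. The category list, the embedding size of 3, the random initialisation and the continuous value `age_z` are illustrative assumptions, and no training is shown.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_embedding(num_categories: int, dim: int, seed: int = 0) -> Matrix:
    """Create a (num_categories x dim) weight matrix with small random values.

    In a real network these weights are trainable parameters that gradient
    descent would update; here they are only initialised.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_categories)]


def one_hot(index: int, size: int) -> List[float]:
    """One-hot encode a category index as a vector of length `size`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(vec: List[float], matrix: Matrix) -> List[float]:
    """Multiply a (1 x n) row vector by an (n x d) matrix."""
    dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(dim)]


# Hypothetical categorical variable with four categories/classes.
marital_status = ["single", "married", "divorced", "widowed"]
emb = make_embedding(len(marital_status), dim=3)

idx = marital_status.index("married")
via_matmul = vector_times_matrix(one_hot(idx, len(marital_status)), emb)
via_lookup = emb[idx]  # the same numbers, obtained by indexing the row directly

age_z = 0.42  # hypothetical already-normalised continuous feature
model_input = via_lookup + [age_z]  # embedding output concatenated with continuous inputs
```

`via_matmul` and `via_lookup` hold the same three numbers, and `model_input` is the kind of vector (embedding outputs followed by normalised continuous values) that would be fed into the first layer of the network.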

Gartner says that most organizations evolve through five levels of maturity in their journey with data. 

How can Data Science Enterprise help your organization Level Up?

Based on this maturity model, DSE can help businesses not just understand their individual gaps and strengths but also assess their organizational maturity in data and evaluate their capabilities across five dimensions. We often do this through questionnaires or interviews with key technology and business stakeholders.

The main focus lies on the following five dimensions:

Vision – The clarity and focus needed to set goals for data science initiatives in the long term. The extent to which these goals align with larger organizational business strategies.

Planning – Translation of data science goals into execution plans and a robust short and long-term roadmap. How to carefully pick the individual initiatives for impact and plan them out with milestones.

Execution – Implementation of the planned data science initiatives by assembling the right data science teams, tools, and processes. Access to pertinent, good quality data that is sourced, transformed and stored effectively. Ability to identify actionable insights by applying the right level of analytics. Enabling consumption of insights through data storytelling.

Value Realization – Adoption of data science initiatives across the organization. Planning for actionability across milestones with robust measurement of ROI.

Data Culture – Scaling of data initiatives across the organization. Promoting data literacy across all teams to enable users to make decisions using data.

Would you like more information?

Simply get in touch!

