Slowly Changing Dimensions in Teradata

Slowly changing dimensions or SCD are the dimensions that change over time. It called “slowly” because the change is expected to come rarely or rather I shall say not frequently. Let’s understand this with one real life example. You went to a coffee shop and ordered some coffee and snacks. The cashier asked for your mobile number and name. Assuming that you are the first time customer an entry will be made in CUSTOMER table in the data warehouse. It…

Continue Reading

6 ways to load data file into Teradata table

Loading Data files into Teradata table is one of the most common action required in TD datawarehouse environment. In this post, we will discuss about 6 ways of loading data file into teradata table. Also we will talk about pros/cons of each method. We will use below table for all the examples mentioned below: create multiset table usa_states ( state_id byteint, state_name varchar(100), state_abbr char(2), state_capital varchar(100), state_larg_city varchar(100), state_population int ) primary index (state_id); We will load following data…

Continue Reading

Teradata Interval Field Overflow

Interval is the datatype in Teradata which represents duration between 2 time components. We can add INTERVAL to DATE/TIMESTAMP column values to get to new DATE/TIMESTAMP. Also whenever we subtract two timestamp in Teradata, output is of INTERVAL type. Just like any other data type INTERVAL too has permissible range and if the value exceeds it we get “Interval Field Overflow” error. Now you know the real reason of Interval Field Overflow error , let us see few examples which…

Continue Reading

Connect to Teradata from Spark

I was working on a project recently which involved data migration from Teradata to Hadoop. Most of the data migration was done using sqoop. However there was a requirement to do a quick lookup into Teradata table and do some processing in Spark. For this, I wanted to use Spark as it involves comparing data in Teradata table with HIVE table. So I connected Teradata via JDBC and created a dataframe from Teradata table. Next I created a dataframe from…

Continue Reading

Teradata -GROUPING SET with Example

In Data-warehouse, Grouping means aggregating values to calculate some meaningful measures. In EDW, we have fact tables which holds measures and calculate some aggregates like SUM, MAX, MIN, AVG etc. In most of the cases, we will use GROUP BY clause however in some cases you may want to calculate measures over different axis. It can be single column or multiple columns combined. This is very useful when you are creating data cubes where each dimension will show you different…

Continue Reading

How to create Apache Zeppelin-Teradata Interpreter and use charts?

Apache Zeppelin is a multi-purpose software that can be used for data visualization, data ingestion, data discovery and data analytics. You can check the below mentioned URL to know more about it and to download the software. https://zeppelin.apache.org/ In download section, I’ll suggest to download the “Binary package with all interpreters” which provides number of interpreters that can be used to connect to various sources. In today’s tutorial, we will see: How to create interpreter to be used for Teradata…

Continue Reading

3706: Syntax error: ORDER BY is not allowed in subqueries.

As per SQL standards we cannot use ORDER BY clause in subqueries. It also does not make sense to use ORDER BY inside a derived table.Let’s see an SQL example to identify the problem and how to overcome it. This is a perfectly fine query: SELECT E_EMPNO,E_EMPNAME,E_MOB FROM EMPLOYEE ORDER BY 1,2; Now if we create a derived table using this query it will fail. SELECT COUNT(1) FROM ( SELECT E_EMPNO,E_EMPNAME,E_MOB FROM EMPLOYEE ORDER BY 1,2 )TBL1; Error–> 3706: Syntax…

Continue Reading

Calculate hourly resource consumption using DBQL in Teradata

Recently Raghav reached out to us on our Facebook page asking for a SQL query in Teradata to identify the highest AMPCPU, I/O and SPOOL usage for each hour during a given time period. He was actually working on some activity to find out the consumption of resources on Teradata box on hourly basis. The purpose was to find out time range when Teradata is most heavily loaded and when it is relatively free. He was looking for information in…

Continue Reading

What,Why and How of Encryption in Teradata – Basics and Examples with Syntax

What is encryption ? Why we need encryption ? How do we implement encryption ? One must clearly know answer to these 3 questions before actually implementing encryption in any datawarehouse environment. Encryption is the process by which we change the actual meaningful value to some other meaningless value so that any person who reads it cannot understand it. For example if I say “VGTCFCVC” it won’t make any sense to you however if I share actual value “TERADATA” then…

Continue Reading

How to load multiple files into one single Table using TPT in Teradata?

We recently received a comment on our previous TPT post,(Click here to read it) requesting for a sample TPT script to load multiple data files into one single Teradata Table using Load operator.The TPT script is simple and is very much similar to our TPT script we shared in earlier posts. The main component which will change is the PRODUCER operator. Here we’ll use 2 new variables : VARCHAR FileList = ‘YES’, VARCHAR FileName = @File1 FileList : represent here…

Continue Reading
ROWS-DISTRIBUTION-IN-TERADATA

HASHROW, HASHBUCKET, HASHAMP in Teradata

How Teradata distribute rows ? It is very important to understand how Teradata actually distribute rows among AMPs. We all know that PRIMARY INDEX is used for data distribution. All the AMPs in the Teradata system maintains a portion of each table and this forms the basis of ‘parallel architecture and share nothing‘ architecture of Teradata. We will see how exactly it happens and what all hash functions are used to determine the respective AMP which will hold the table.…

Continue Reading