Teradata is a relational database management system. It is widely used by large organizations to store and manage large amounts of data. Teradata is designed to be highly scalable, reliable, and fast. It is also designed to be secure and support high data availability.
Teradata enables businesses to store, access, and analyze data quickly and easily. It can be used for data warehousing and business intelligence applications. Teradata is used to store customer information, sales data, financial data, and other types of data.
To get started with Teradata, it is important to understand the basic components of the system, such as the database, tables, views, and indexes. It is also important to learn about the SQL language, which is used to interact with the system.
Once you have a basic understanding of the system, you can begin to explore the features and capabilities of Teradata. For example, you can learn how to create and manage databases, tables, views, and indexes. You can also learn how to query data and create reports.
In addition, you can learn how to use the Teradata tools and utilities to automate tasks and increase efficiency. Finally, you can learn how to use the Teradata Security and Auditing features to ensure that data is secure and compliant with regulations.
Audience
This tutorial is designed for those who want to learn the basics of the Teradata database and its associated technologies. This tutorial is suitable for both beginners and professionals. It covers topics such as database architecture, SQL, query optimization, data analysis, and performance tuning. This tutorial is also useful for database administrators, software developers, and technical consultants who want to gain a comprehensive understanding of the Teradata database.
Prerequisites
This Teradata Tutorial assumes that you have a basic understanding of relational databases and the Structured Query Language (SQL). You should also have an understanding of data warehousing concepts and the different types of data warehouse architectures. Prior hands-on experience with the Teradata database itself is helpful but not required. If you do not have these skills, please review our Introduction to SQL and Introduction to Data Warehousing tutorials before proceeding with this Teradata Tutorial.
Teradata – Introduction
Teradata is a relational database management system (RDBMS) developed by Teradata Corporation. It is designed to manage large-scale data warehousing operations using a massively parallel processing (MPP) architecture. Teradata is one of the most popular and widely-used RDBMSs in the world. It is used by many of the world’s largest companies to store and analyze massive amounts of data. Teradata is well-suited for use in data warehouses, business intelligence (BI) projects, and data analytics applications. It offers a wide range of features, including support for complex queries, distributed query processing, scalability, and high availability. Teradata is also known for its extensibility and flexibility, allowing customers to customize its features to meet their particular needs.
History of Teradata
Teradata was founded in 1979 through a collaboration between researchers at the California Institute of Technology and the advanced technology group at Citibank. The company's first product was an early version of the Teradata Database, a relational database management system (RDBMS) delivered as a dedicated database machine that attached to mainframe computers. The Teradata Database allowed users to quickly and efficiently store, query, and analyze large amounts of data.
In the 1980s, Teradata began to expand its product offerings to include other data management solutions, such as data warehousing, data mining, and analytics.
In the 1990s, the company began to focus on developing its core database technology, which resulted in the release of the Teradata Database V2R5 in 1998. This version of the database was the first to be able to run on multiple platforms, including Windows, Unix, and Linux.
In the 2000s, Teradata began to focus on developing its analytics capabilities. This included the release of the Teradata Warehouse Miner, which allowed users to analyze large amounts of data in an efficient and cost-effective manner.
In recent years, Teradata has continued to expand its product offerings, including the release of the Teradata Aster Database and the Teradata Data Warehouse Appliance. These products have allowed companies to better manage Big Data and gain valuable insights from it.
Features of Teradata
1. Scalability: Teradata is highly scalable and can easily accommodate increasing data volumes and growing user demands. It can be scaled up or down as required and is capable of processing data in an efficient manner.
2. User-friendly: Teradata is designed to be user-friendly and provides a high level of support to users. It is easy to use and provides a wide range of features and functions to help users manage and analyze their data.
3. High Availability: Teradata ensures high availability and is capable of delivering a high level of performance even when the system is under heavy load. It is designed to meet the needs of enterprise customers who require continuous availability of their data.
4. Security: Teradata provides a secure and reliable platform for data storage and processing. It is designed to protect customer data from unauthorized access and malicious activities.
5. Advanced Analytics: Teradata offers advanced analytics and predictive capabilities for users. It is capable of delivering insights and recommendations based on the data stored in its databases.
Teradata – Installation
Installing Teradata is a complex process that requires specialized expertise. It is recommended that you engage a Teradata Professional Services consultant to assist with the installation.
The installation process includes the following steps:
1. Download and Pre-installation Setup: The first step is to download the Teradata software, unzip the file, and run the Pre-installation Setup. This will check the system and network requirements, install any necessary prerequisites, and configure the system for Teradata.
2. Install Teradata Software: The next step is to install all the Teradata components and configure the system to meet your specific requirements. This includes installing the Teradata server, client tools, and other components.
3. Create Data Warehouse: After all the components are installed, the next step is to create the data warehouse. This includes creating the database and tables, loading the data, and configuring the security settings.
4. Test and Deploy: Once the data warehouse is created, it is important to test it to ensure that it is functioning properly. This includes running queries, checking performance, and verifying security settings. Once the data warehouse is tested and approved, it can be deployed to production.
5. Maintenance: The last step is to set up a maintenance plan for the Teradata system. This includes patching, performance monitoring, and other tasks to ensure that the system is running optimally.
Teradata – Architecture
Teradata is a massively parallel processing (MPP) database management system (DBMS) developed by Teradata Corporation. It is widely used for enterprise data warehousing and analytics applications.
The Teradata architecture consists of three main components: the Parsing Engine (PE), the BYNET (the message-passing layer), and the Access Module Processors (AMPs).
The Teradata Database itself is composed of multiple nodes that are interconnected via the BYNET. Each node contains one or more Parsing Engines and multiple AMPs, which are responsible for data storage and processing.
The Teradata Parsing Engine (PE) is a software component that receives SQL requests, checks syntax and user privileges, builds an optimized execution plan, and distributes the plan steps to the different AMPs over the BYNET. The AMPs then process the steps and return the results to the PE.
Finally, the Teradata Access Module Processors (AMPs) are the virtual processors that store and process the data in the Teradata Database. Each AMP owns a portion of every table's rows on its associated disk storage and is responsible for the actual data manipulation and retrieval.
Components of Teradata
1. Teradata Database: This is the core component of the Teradata system and provides a highly scalable relational database management system (RDBMS) with advanced features such as parallelism, scalability, and high availability.
2. Teradata QueryGrid: This component enables the connection of multiple databases of different types, allowing users to query and join data from multiple data sources at once.
3. Teradata Viewpoint: This component provides a web-based user interface for managing the Teradata system and monitoring performance.
4. Teradata Data Mover: This component allows for the movement of data between Teradata systems, as well as between other databases.
5. Teradata Studio: This component is an integrated development environment for SQL developers and data analysts to build, debug, and deploy SQL applications.
6. Teradata Tools and Utilities: This component provides a suite of tools and utilities for managing and optimizing the Teradata system.
Storage Architecture
Teradata uses a shared-nothing architecture for its storage. In this architecture, each node in the system is completely independent and operates on its own set of disks. This architecture allows for parallel processing of queries, as each node is operating independently and is able to process its own portion of the query. It also allows for scalability, as the system can be increased in size by simply adding more nodes. Additionally, it provides fault tolerance, as the system will continue to operate even if one or more of the nodes fail. Finally, it ensures data integrity, as each node is responsible for its own data and will not be affected by the failures of other nodes.
Teradata implements this shared-nothing design with a Massively Parallel Processing (MPP) architecture that divides processing power across multiple nodes. Each node has its own processors, memory, and disk storage; data is distributed across the nodes, and the nodes communicate over the BYNET to process data in parallel. Because no node shares resources with any other, the system is highly scalable and can process large amounts of data efficiently. In addition, Teradata's locking mechanisms ensure that data can be accessed and updated reliably, even in a multi-user environment.
Retrieval Architecture
The Teradata architecture consists of a shared-nothing MPP (massively parallel processing) architecture which uses multiple nodes to process multiple requests simultaneously. The system is composed of a set of nodes connected via a high-speed interconnect. Each node contains multiple processors, disk storage, and memory, and is responsible for processing its own portion of a query. The nodes communicate with each other using a message-passing protocol.
The nodes are organized into a tree-structured hierarchy, with the root node at the top. This structure allows for efficient communication between the nodes, as each node is only responsible for communicating with its immediate neighbors.
Data is stored on disk in a distributed fashion, with each node containing a portion of the data. As queries are processed, the data is retrieved from the various nodes and sent to the requesting node. The data is then combined and returned to the requesting node.
The system is optimized for high-speed data retrieval by using a combination of techniques such as parallel query execution, data replication, and indexing. Indexing helps to reduce the amount of data that needs to be read from disk, while replication helps to ensure that data is available in multiple nodes in case of a node failure.
Teradata – Relational Concepts
1. Relations: A relation, also known as a table, is an organized set of data in a database. It consists of columns and rows, where each column is a field, and each row is a record.
2. Primary Keys: A primary key is a column, or set of columns, that uniquely identifies each row of data in a table. A primary key must contain unique values and must not contain null values.
3. Foreign Keys: A foreign key is a column, or set of columns, that is used to establish a link between two tables. A foreign key in one table points to a primary key in another table.
4. Indexes: An index is a database structure that is used to speed up the retrieval of data from a table. Indexes can be clustered or non-clustered, and can be created on one or more columns in a table.
5. Views: A view is a virtual table that is based on the result set of a query. Views can be used to hide data from users, limit data access, and simplify complex queries. Views can also be used to join multiple tables together.
6. Transactions: A transaction is a set of operations that must either all succeed or all fail. Transactions are used to ensure data integrity and to maintain the consistency of a database. They can be used to ensure that multiple data operations occur in an atomic manner.
Database
A database is an organized collection of data, generally stored and accessed electronically from a computer system. Databases are typically used to store information that can be rapidly retrieved and updated. Common uses of databases include storing customer information, product catalogs, employee records and financial data.
Tables
A relational database management system (RDBMS) is a type of database that stores and manages data stored in separate tables. Each table stores data related to a specific subject, and each row in the table represents an individual record. Tables in a relational database are composed of columns and rows.
The columns in a table are referred to as fields and each field has a data type associated with it, such as text, numeric, date, or Boolean. The rows in a table are referred to as records and each record stores a single instance of data. Each record in a table must have a unique identifier, such as a primary key, that allows the record to be identified and retrieved.
Relationships between tables can be established by creating foreign keys. A foreign key is a field in one table that references the primary key of another table. This relationship allows the data in one table to be linked to the data in the other.
Tables can also be linked through views. A view is a virtual table that is created by combining data from two or more tables. Views allow users to easily access data from multiple tables without having to create a complex query.
Columns
Data stored in an RDBMS is organized into columns and rows. Columns are the fields in a table and each column contains a specific type of data, such as numbers, text, dates, or binary data. Rows are the records or instances of data within the table. Each row contains a unique combination of values for the columns, which can be used to identify that particular row.
Row
A row is a horizontal entity in a relational database that contains one or more columns, which store data values. In a database table, each row represents a unique record and contains information related to a single entity, such as a person, product, or event.
Primary Key
The primary key is a unique identifier for each record that is used to identify and differentiate the records from each other. It is usually a single field or combination of fields that uniquely identify the records. A primary key can be a single field such as an ID number, or it can be multiple fields such as a combination of first name, last name, and birthdate.
Foreign Key
A foreign key is a column or set of columns in a relational database table that is used to link to or reference a row of data in another table. It is a type of constraint used to maintain the referential integrity of data between two related tables. A foreign key is a field in one table that is linked to the primary key of another table. The purpose of the foreign key is to ensure data in the related tables is consistent and correctly linked.
Teradata – Data Types
Teradata supports the following data types:
• Numeric: BYTEINT, SMALLINT, INTEGER, BIGINT, DECIMAL, NUMERIC, FLOAT
• Character: CHAR, VARCHAR, CLOB
• Byte: BYTE, VARBYTE, BLOB
• Date/Time: DATE, TIME, TIMESTAMP
• Special Types: INTERVAL, PERIOD, Geospatial
• Collection Types: ARRAY, VARRAY
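For example, a simple CREATE TABLE statement might combine several of these types (the employee table and its columns here are hypothetical):
CREATE TABLE employee (
    employee_id INTEGER,
    first_name  VARCHAR(50),
    last_name   VARCHAR(50),
    birth_date  DATE,
    salary      DECIMAL(10,2),
    hired_at    TIMESTAMP(0)
)
PRIMARY INDEX (employee_id);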
Teradata – Tables
Teradata tables are database objects that contain all of the data in a Teradata database. Tables are organized into columns and rows, with each column containing a specific type of data. A table is the basic unit of data storage in a Teradata database and is the foundation for all data manipulation and retrieval operations. Tables can be created, modified, and deleted in Teradata.
Tables are the fundamental building blocks of a Teradata database. Each table must have a unique name that identifies it within the database. Tables are made up of columns and rows. Columns contain the attributes of the data in the table, and rows contain the actual data values. The columns have a defined order in the table definition, but the rows themselves are not stored in a user-visible order; in Teradata they are distributed across the AMPs based on the table's primary index.
Tables can be linked to one another using primary and foreign keys. Primary keys are used to uniquely identify records in a table, while foreign keys link one table to another. This allows for the creation of complex relationships between data.
Tables can also be joined together in a variety of ways. Joins are used to combine data from multiple tables into a single result set. Joins can be used to combine information from different tables, or even from different databases.
Tables can be used to store data in a structured format, or as unstructured data such as text. Unstructured data is often stored in a text field in a table. The text field can be used to store large amounts of data, such as web pages, while the structured data is used to store information in a more organized manner.
Teradata – Data Manipulation
1. Insert – Used to add new rows to the table.
2. Update – Used to modify the existing records in the table.
3. Delete – Used to delete records from the table.
4. Select – Used to retrieve records from the table.
5. Merge – Used to insert new rows into a table or update existing rows, depending on whether a matching row already exists (an "upsert").
6. Delete All – Teradata does not provide a separate TRUNCATE statement; an unqualified DELETE FROM a table is used to delete all the records from it.
7. Create Table – Used to create a new table.
8. Alter Table – Used to modify the structure of an existing table.
9. Drop Table – Used to delete a table from the database.
10. Union – Used to combine the results of two or more SELECT statements.
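For example, the following statements sketch typical data manipulation against a hypothetical employee table:
INSERT INTO employee (employee_id, first_name, last_name)
VALUES (101, 'Jane', 'Doe');
UPDATE employee
SET last_name = 'Smith'
WHERE employee_id = 101;
DELETE FROM employee
WHERE employee_id = 101;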
Teradata – SELECT Statement
The SELECT statement is used to retrieve data from a database.
Syntax:
SELECT column1, column2, …
FROM table_name
WHERE condition;
Example:
SELECT first_name, last_name
FROM employee
WHERE department = 'IT';
Teradata – Logical and Conditional Operators
Logical Operators
• AND – The AND operator is used to evaluate two or more expressions that must all be true for the entire expression to be true.
• OR – The OR operator is used to evaluate two or more expressions where only one expression must be true for the entire expression to be true.
• NOT – The NOT operator is used to reverse the value of an expression. If the expression is true, NOT will make it false; if the expression is false, NOT will make it true.
Conditional Operators
• = – The equal operator is used to compare two values for equality.
• <> – The not equal operator is used to compare two values for inequality.
• > – The greater than operator is used to compare two values to determine if one is greater than the other.
• < – The less than operator is used to compare two values to determine if one is less than the other.
• >= – The greater than or equal operator is used to compare two values to determine if one is greater than or equal to the other.
• <= – The less than or equal operator is used to compare two values to determine if one is less than or equal to the other.
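These operators are typically combined in a WHERE clause. A small sketch, assuming a hypothetical employee table with department, salary, hire_date, and job_title columns:
SELECT first_name, last_name, salary
FROM employee
WHERE department = 'IT'
AND (salary >= 50000 OR hire_date < DATE '2020-01-01')
AND NOT (job_title = 'Intern');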
Teradata – SET Operators
SET Operators are used to combine the results of two or more SELECT statements into a single result.
There are four types of SET Operators in Teradata:
1. UNION: combines the results of two or more SELECT statements into a single result, eliminating duplicates.
2. INTERSECT: returns only the rows that are common to the results of two or more SELECT statements.
3. EXCEPT: returns only the rows from the first SELECT statement that are not in the results of the second SELECT statement.
4. MINUS: equivalent to EXCEPT; Teradata supports both keywords.
SET Operators can be used to combine the results of multiple SELECT statements into a single result set. They can also be used to compare the results of two SELECT statements and return only the matching rows.
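For example, assuming hypothetical current_employees, former_employees, and project_members tables with compatible columns:
SELECT employee_id FROM current_employees
UNION
SELECT employee_id FROM former_employees;
SELECT employee_id FROM current_employees
INTERSECT
SELECT employee_id FROM project_members;
SELECT employee_id FROM current_employees
EXCEPT
SELECT employee_id FROM project_members;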
Teradata – String Manipulation
Teradata provides various string manipulation functions to work with string data types such as VARCHAR and CHAR. These functions are used to perform operations such as finding the length of a string, replacing characters, concatenating strings, extracting substring, and other operations.
Some of the commonly used string manipulation functions in Teradata are:
– SUBSTRING: This function is used to extract a substring from the given string.
– POSITION: This function is used to find the position of a substring within a given string.
– CHAR_LENGTH: This function is used to find the length of a given string.
– CONCAT: This function is used to combine two strings into a single string; the || operator can also be used for concatenation.
– TRIM: This function is used to remove leading and trailing spaces from a given string.
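A short sketch of these functions in use, assuming a hypothetical employee table with email, first_name, and last_name columns:
SELECT SUBSTRING(email FROM 1 FOR 10) AS email_prefix,
POSITION('@' IN email) AS at_position,
CHAR_LENGTH(email) AS email_length,
TRIM(first_name) || ' ' || TRIM(last_name) AS full_name
FROM employee;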
Teradata – Date/Time Functions
1. CURRENT_TIMESTAMP: Returns the current date and time from the system clock.
2. CURRENT_DATE: Returns the current date from the system clock.
3. EXTRACT (date_part FROM date_expression): Returns the specified date part of the date expression. Date parts include year, month, day, hour, minute, second, etc.
4. ADD_MONTHS: Adds a specified number of months to a date expression.
5. DATE: In Teradata session mode, the DATE keyword returns the current date and is equivalent to CURRENT_DATE.
6. FORMAT: A format phrase (typically used with CAST) that controls how a date value is displayed.
7. CAST: Converts an expression of one data type to another.
8. LAST_DAY: Returns the last day of the month for the specified date.
9. TIME: Returns the current time.
10. Date arithmetic: Teradata does not provide a DATEDIFF function; the difference between two dates is obtained by subtracting one DATE value from another, or by using INTERVAL arithmetic for timestamps.
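A quick sketch of some of these functions (Teradata allows a SELECT without a FROM clause for expressions like these):
SELECT CURRENT_DATE AS today,
ADD_MONTHS(CURRENT_DATE, 3) AS three_months_later,
EXTRACT(YEAR FROM CURRENT_DATE) AS current_year,
LAST_DAY(CURRENT_DATE) AS month_end,
CURRENT_DATE - DATE '2024-01-01' AS days_elapsed;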
Teradata – Built-in Functions
Teradata is a relational database management system (RDBMS) that is used for data warehousing and analytics. It provides a set of built-in functions that allow users to more easily process and analyze data stored in their databases. These built-in functions include:
• Aggregate Functions: These functions allow users to summarize the data stored in their databases. Examples include COUNT, SUM, AVG, MIN, and MAX.
• String Functions: These functions allow users to manipulate strings stored in their databases. Examples include SUBSTRING, REPLACE, and UPPER/LOWER.
• Date and Time Functions: These functions allow users to manipulate dates and times stored in their databases. Examples include ADD_MONTHS, EXTRACT, and CURRENT_TIMESTAMP.
• Mathematical Functions: These functions allow users to perform calculations on the data stored in their databases. Examples include ROUND, SIGN, and MOD.
• Conversion Functions: These functions allow users to convert data from one data type to another. Examples include CAST and TO_CHAR.
• Analytic Functions: These functions allow users to perform more complex analysis on the data stored in their databases. Examples include RANK, FIRST/LAST_VALUE, and LAG/LEAD.
Teradata – Aggregate Functions
Aggregate functions are a type of SQL functions that allow users to perform calculations on the data in a database. They are commonly used for performing mathematical calculations such as sum, count, average, and other statistical operations. These aggregate functions are used to summarize and analyze the data in a table or view.
Aggregate functions are used to reduce the number of rows in a table. They can be used to calculate the total, count, average, minimum and maximum values in a table. Aggregate functions can also be used to combine values from different rows into a single value. For example, the SUM() function can be used to calculate the total value of all the records in a table.
The most commonly used aggregate functions are COUNT(), SUM(), MAX(), MIN(), and AVG(). The COUNT() function is used to count the number of records in a table. The SUM() function is used to calculate the total value of all the records in a table. The MAX() function is used to find the maximum value in a table. The MIN() function is used to find the minimum value in a table. Finally, the AVG() function is used to calculate the average value of all the records in a table.
In addition to the above mentioned aggregate functions, Teradata also provides additional aggregate functions such as STDDEV(), VARIANCE(), and PERCENTILE_CONT(). The STDDEV() function is used to calculate the standard deviation of the values in a table. The VARIANCE() function is used to calculate the variance of the values in a table. The PERCENTILE_CONT() function is used to calculate the percentile of a given value in a table.
Overall, aggregate functions are useful for performing mathematical calculations and summarizing data in a table or view. They can be used to calculate the total, count, average, minimum and maximum values in a table. They can also be used to combine values from different rows into a single value. Therefore, aggregate functions are essential for analyzing and summarizing data in Teradata.
1. AVG: Returns the average of the values in a given column.
2. COUNT: Returns the number of rows in a given table.
3. MAX: Returns the maximum value in a given column.
4. MIN: Returns the minimum value in a given column.
5. SUM: Returns the sum of all values in a given column.
6. STDDEV: Returns the standard deviation of the values in a given column.
7. VARIANCE: Returns the variance of the values in a given column.
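Aggregate functions are usually combined with GROUP BY. A sketch, assuming a hypothetical employee table with department and salary columns:
SELECT department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
SUM(salary) AS total_salary
FROM employee
GROUP BY department;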
Teradata – CASE and COALESCE
CASE:
The CASE expression is used to evaluate one or more conditions and return a different value depending on the first condition that is true. The syntax is as follows:
CASE
WHEN condition THEN result
WHEN condition THEN result
ELSE result
END
COALESCE:
The COALESCE expression returns the first non-null value from a list of expressions. The syntax is as follows:
COALESCE(expression1, expression2, expression3, ...)
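A combined sketch, assuming a hypothetical employee table with salary, mobile_phone, and office_phone columns:
SELECT employee_id,
CASE
WHEN salary >= 100000 THEN 'High'
WHEN salary >= 50000 THEN 'Medium'
ELSE 'Low'
END AS salary_band,
COALESCE(mobile_phone, office_phone, 'No phone on file') AS contact_number
FROM employee;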
Teradata – Primary Index
The primary index in Teradata determines how the rows of a table are distributed across the AMPs and provides the fastest path for accessing data in the table. It is defined when the table is created and can be either a single-column index or a composite index made up of multiple columns, and it may be unique or non-unique. Typically, a unique value such as a customer ID number is chosen as the primary index so that the rows are spread evenly across the system.
Unique Primary Index (UPI)
A Unique Primary Index (UPI) is a primary index in which no two rows of the table may have the same index value. A UPI is composed of one or more columns whose combined value uniquely identifies each row in the table. Because the values are unique, the hashing algorithm distributes the rows evenly across the AMPs, and any row can be located with a single-AMP operation. UPIs are often used to enforce data integrity and consistency.
Non Unique Primary Index (NUPI)
A Non Unique Primary Index (NUPI) is a type of primary index in which the same value can be used multiple times as the key for different records. Unlike Unique Primary Index (UPI), NUPI can contain duplicate values as the key for different records. This type of index is useful in situations where records need to be grouped together based on a certain field, such as customer names, product categories, etc. NUPIs are commonly used in relational databases and data warehouse systems.
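The primary index is specified in the CREATE TABLE statement. A minimal sketch with hypothetical tables: the first uses a UPI, while the second uses a NUPI so that all orders for the same customer hash to the same AMP:
CREATE TABLE customer (
    customer_id   INTEGER,
    customer_name VARCHAR(100)
)
UNIQUE PRIMARY INDEX (customer_id);
CREATE TABLE orders (
    order_id    INTEGER,
    customer_id INTEGER,
    order_date  DATE
)
PRIMARY INDEX (customer_id);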
Teradata – Joins
Teradata provides a variety of join types that can be used to combine data from two or more tables into a single set of results.
Inner Join: An inner join is the most commonly used join type in Teradata. It combines data from two tables based on a common value, such as a primary key.
Left Join: A left (outer) join returns all rows from the table on the "left" side of the join together with the matching rows from the right table; where there is no match, the columns from the right table are returned as NULL.
Right Join: A right (outer) join returns all rows from the table on the "right" side of the join together with the matching rows from the left table; where there is no match, the columns from the left table are returned as NULL.
Full Outer Join: A full outer join combines data from two tables, retrieving all rows from both tables.
Cross Join: A cross join combines all records from one table with all records from another table, resulting in a Cartesian product.
Semi Join: A semi join combines data from two tables, but only retrieves rows from the “left” table that have corresponding values in the “right” table.
Anti Join: An anti join is the opposite of a semi join. It combines data from two tables, but only retrieves rows from the “left” table that do not have corresponding values in the “right” table.
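For example, the two most common joins, written against hypothetical customer and orders tables:
SELECT c.customer_name, o.order_id, o.order_date
FROM customer c
INNER JOIN orders o
ON c.customer_id = o.customer_id;
SELECT c.customer_name, o.order_id
FROM customer c
LEFT OUTER JOIN orders o
ON c.customer_id = o.customer_id;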
Teradata – SubQueries
A subquery is a query that is nested inside another query. Subqueries are typically used in the WHERE clause of the outer query to return a set of rows to be used by the outer query. In Teradata, subqueries can be used with SELECT, INSERT, UPDATE, and DELETE statements.
For example, a subquery could be used in a SELECT statement to return a list of employees and their salaries from a table called Employee_Salaries. The outer query could look like this:
SELECT employee_name, salary
FROM Employee_Salaries
WHERE salary IN (SELECT salary
FROM Employee_Salaries
WHERE job_title = 'Manager');
This query returns the names and salaries of all employees whose salary matches the salary of at least one employee with the job title of 'Manager'.
Teradata – Table Types
Teradata uses the following types of tables:
1. Permanent Tables:
These are the default type of table in Teradata and the most commonly used. Permanent tables hold data that persists across sessions and system restarts, and their definitions are stored in the data dictionary until the tables are explicitly dropped.
2. Global Temporary Tables:
Global Temporary Tables (GTTs) have a permanent definition stored in the data dictionary, but the data itself is temporary: each session that references a GTT materializes its own private instance, and the rows are discarded when the session ends (or when the transaction ends, depending on the ON COMMIT option). GTTs are useful when many sessions need the same temporary table structure but the data is not shared between sessions.
3. Volatile Tables:
Volatile tables are similar to GTTs, with the difference that neither the table definition nor the data is stored in the data dictionary; both exist only within the current user session. As soon as the user session is terminated, the volatile tables and their data are deleted.
4. Derived Tables:
Derived tables are created by a query expression in the FROM clause of a SQL statement. They exist only for the duration of that query and are held in spool space; they provide a convenient way to use intermediate results within a single statement.
5. Join Indexes:
Join Indexes are used to improve query performance. They are created by joining two or more tables and storing the query results in a separate table.
A derived table is defined within the FROM clause of a SELECT statement in the same way a real table or view would be referenced, and it disappears as soon as the statement finishes. Derived tables are useful for breaking a complex query into simpler steps without creating any permanent objects.
A volatile table is created with the CREATE VOLATILE TABLE statement and is held in the spool space of the session that created it. It is well suited to intermediate results that change frequently and are needed only by the current session, since the table and its data vanish automatically when the session ends.
A global temporary table is created with the CREATE GLOBAL TEMPORARY TABLE statement and uses the temporary space of the database. Its definition is shared by all users, but each session works with its own private copy of the data, which is retained until the session ends (or until the end of the transaction, depending on the ON COMMIT option).
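A minimal sketch of creating a volatile table and a global temporary table (table and column names are hypothetical):
CREATE VOLATILE TABLE session_results (
    customer_id  INTEGER,
    total_amount DECIMAL(12,2)
)
PRIMARY INDEX (customer_id)
ON COMMIT PRESERVE ROWS;
CREATE GLOBAL TEMPORARY TABLE gt_staging (
    order_id   INTEGER,
    order_date DATE
)
PRIMARY INDEX (order_id)
ON COMMIT PRESERVE ROWS;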
Teradata – Space Concepts
Teradata is a relational database management system that provides a platform for data warehousing and analytics applications. It utilizes a combination of space concepts to store and manage data.
The most basic space concept is a table. A table is a collection of columns and rows that store information. Each column in the table has a name, data type, and length.
In addition to tables, Teradata also supports partitioning, which divides the rows of a table into partitions based on a partitioning expression, such as ranges of a date column. Partitioning allows queries that filter on the partitioning column to read only the relevant partitions instead of the entire table.
Teradata also incorporates Teradata Virtual Storage, which virtualizes the underlying disk storage and automatically places frequently accessed data on the fastest devices. This makes more efficient use of the available storage and keeps hot data quickly accessible.
Finally, Teradata also implements a space management system to ensure that all data is stored efficiently. This system helps to optimize the use of disk space and improve performance.
The Teradata Database provides three types of spaces to store data: permanent, temporary, and spool.
Permanent Space: Permanent space is the maximum amount of storage available to a user or database for permanent tables, indexes, and other database objects. The limit is defined when the database or user is created, but the space is not pre-allocated; it is consumed only as data is stored, and the limit can be changed later with a MODIFY DATABASE or MODIFY USER statement.
Temporary Space: Temporary space is the maximum amount of space available for the materialized instances of global temporary tables. It is allocated when a global temporary table is populated and released when the data is discarded at the end of the transaction or session.
Spool Space: Spool space is unused permanent space that the Teradata Database uses to hold the intermediate and final result rows of a query. Every query uses spool space while it runs; the space is released once the query completes and the results are returned to the client.
Overall, the three types of space in Teradata are permanent, temporary, and spool. Permanent space holds permanent objects, temporary space holds the session-local data of global temporary tables, and spool space holds the intermediate results of queries.
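Space limits are assigned when a database or user is created. A sketch with hypothetical names and sizes (values are in bytes):
CREATE DATABASE sales_db FROM dbc AS
PERM = 10000000000,
SPOOL = 2000000000,
TEMPORARY = 1000000000;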
Teradata – Secondary Index
Teradata Secondary Index (or Secondary Indexes) is a type of index used in Teradata databases. It is used to speed up the retrieval of data from the database. Unlike primary indexes, which are used to identify rows in a table, secondary indexes are used to identify subsets of data within a table. They are used to improve query performance by providing an alternative path to retrieve data. Secondary indexes can be created on any column or combination of columns in a table, and they can be used to support both equality and range queries.
Unique Secondary Index (USI) is an indexing method used in Teradata that makes sure that the values in the indexed column are always unique. It is generally used to speed up the retrieval of data from large tables and to make sure that duplicate values are not present in the indexed column. A USI is appropriate for columns that contain unique values such as Social Security numbers, employee IDs, and email addresses.
A Non-Unique Secondary Index (NUSI) is an indexing method used in Teradata that allows duplicate values in the indexed column. It is used to improve the performance of queries that use the column in the WHERE clause. NUSI is used for columns that contain non-unique values such as customer names, product names, and order numbers.
One of the main differences between USI and NUSI is that USI is used to enforce data integrity while NUSI is used to improve query performance. USI ensures that each value in the indexed column is unique and that no duplicate values exist. NUSI is used to improve the performance of queries that use the column in the WHERE clause, but it does not guarantee that all values are unique.
Both USI and NUSI can be defined on a single column or on a combination of columns in a composite index; the difference is whether the indexed value must be unique across the table.
In summary, USI is used to enforce data integrity by making sure that the values in the indexed column (or combination of columns) are always unique, and it allows a row to be retrieved with a two-AMP operation. NUSI is used to improve the performance of queries that use the indexed column in the WHERE clause, but it does not guarantee that the values are unique.
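Secondary indexes are created with the CREATE INDEX statement. A minimal sketch against a hypothetical customer table:
-- Unique secondary index (USI) on a column whose values must be unique
CREATE UNIQUE INDEX (email_address) ON customer;
-- Non-unique secondary index (NUSI) on a column frequently used in WHERE clauses
CREATE INDEX cust_name_idx (customer_name) ON customer;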
Teradata – Statistics
Teradata Statistics are used to determine the most efficient execution plans for a query. Statistics provide information about the data distribution and other characteristics of the data in the tables and columns used in a query. This information is used by the Teradata Optimizer to select the most efficient query plan and optimize query performance. Statistics can be collected manually or automatically. Automatically-collected statistics are updated whenever data is loaded into the table or modified. Manual statistics can be used to refresh existing statistics or to collect statistics on a table or column that has not been modified.
Teradata – Compression
Teradata supports data compression to help reduce storage requirements and improve query performance. Compression can be configured at the table or column level, depending on the needs of the organization, and it reduces data size by storing redundant information more compactly. Teradata supports three main compression techniques:
1. Multi-Value Compression (MVC): frequently occurring values in a column are listed in the table definition and stored only once; best used for columns with many repeated values.
2. Algorithmic Compression (ALC): a compression and decompression algorithm is applied to the values in a column; useful for character data that does not compress well with MVC.
3. Block-Level Compression (BLC): entire data blocks are compressed on disk, reducing storage at the cost of some extra CPU when the blocks are read.
Compression can have a large impact on query performance and storage requirements, so it should be carefully considered when planning a Teradata implementation.
Limitations
1. Compression ratios can vary significantly depending on the data type and values.
2. Teradata does not support the use of compression for data that exceeds the size of a single data block.
3. Compression is not supported on volatile tables, global temporary tables, and join indexes.
4. Compression can only be used on a single column at a time, meaning that multiple columns with similar data types and values cannot be compressed together.
5. Compression can only be used on a limited set of data types, including CHAR, VARCHAR, BYTE, VARBYTE, and INTEGER.
Multi-Value Compression (MVC)
Multi-Value Compression (MVC) is a data compression technique used to reduce the size of data stored in Teradata tables. A list of frequently occurring values is specified for a column; each such value is stored only once in the table header, and the rows themselves carry only a small set of presence bits indicating which value applies. MVC can be used with a variety of data types, such as integers, decimals, dates, and character strings, and it can be applied to many columns in the same table. This can result in significant storage savings, and it can also improve query performance because fewer data blocks have to be read.
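A minimal sketch of declaring MVC on a column, assuming a hypothetical employee table whose city column contains a few frequently repeated values:
CREATE TABLE employee (
    emp_id   INTEGER,
    emp_name VARCHAR(100),
    city     VARCHAR(30) COMPRESS ('London', 'Paris', 'New York')
)
PRIMARY INDEX (emp_id);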
Teradata – Explain
The EXPLAIN statement in Teradata returns the execution plan that the optimizer has chosen for a SQL statement, without actually executing it. Any SQL request can be prefixed with the EXPLAIN keyword; the output describes, step by step, how the data will be accessed (for example, by unique primary index retrieval, secondary index access, or a full-table scan), the join methods used, and an estimate of the number of rows and the processing time involved. EXPLAIN is one of the most useful tools for understanding and tuning query performance in Teradata.
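For example, prefixing a query with EXPLAIN returns its plan without running it (the employee table here is hypothetical):
EXPLAIN SELECT first_name, last_name
FROM employee
WHERE employee_id = 101;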
Unique Primary Index
A Primary Index (PI) in Teradata is a unique identifier that is used to access data in a table. It is used to quickly locate a row of data in a table, and it is typically the most frequently used index in the database. It is important to define a Primary Index when creating a table in Teradata, as it helps in optimizing query performance. The Primary Index is always a unique value and can be either a single column or a combination of columns.
Unique Secondary Index
A secondary index in Teradata is an index that is created on a table after the primary index. It is used to improve the performance of queries that do not use the primary index. Secondary indexes can be unique or non-unique, and they can be created on one or more columns in a table. Examples of unique secondary indexes in Teradata include:
1. Unique Secondary Index on a Single Column: This type of index is created on a single column and it ensures that no two rows in the table have the same value in the indexed column.
2. Unique Secondary Index on Multiple Columns: This type of index is created on multiple columns and it ensures that no two rows in the table have the same combination of values in the indexed columns.
3. Unique Secondary Index on Derived Columns: This type of index is created on a derived column and it ensures that no two rows in the table have the same value in the derived column.
Teradata – Hashing Algorithm
In Teradata, a hashing algorithm is used to determine where each row of a table is stored. The value of the primary index column(s) is passed through the hash function to produce a 32-bit row hash; the row hash is mapped through a hash map of hash buckets to a specific AMP, which stores the row. Because the same input value always produces the same row hash, the system can locate a row directly from its primary index value without scanning the table. Teradata provides the HASHROW, HASHBUCKET, HASHAMP, and HASHBAKAMP functions to examine how values hash and how evenly data is distributed across the AMPs.
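These functions can be used to check how evenly a table's rows are spread across the AMPs. A sketch, assuming a hypothetical customer table whose primary index is customer_id:
SELECT HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_number,
COUNT(*) AS row_count
FROM customer
GROUP BY 1
ORDER BY 2 DESC;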
Teradata – JOIN Index
A join index is an index-organized table that is created for the purpose of improving query performance by allowing the optimizer to access the data directly from the index, rather than having to access the data from the underlying table. A join index can be used to improve the performance of queries that involve a join between two or more tables. It can also be used to improve the performance of queries that involve a join between two or more columns within a single table. Join indexes can be used in conjunction with a variety of other performance-enhancing techniques, such as materialized views, partitioning, and clustering.
Single Table Join Index (STJI): An STJI is a type of database index that is used to improve the performance of a database query that joins multiple tables. The index uses a single table as the basis for its query optimization, allowing the database engine to quickly identify related rows in the other tables.
Multi Table Join Index (MTJI): An MTJI is similar to an STJI, but it uses more than one table as its base. It optimizes the query by identifying related rows in the multiple tables, allowing the database engine to quickly join the data.
Aggregate Join Index (AJI): An AJI is a type of MTJI that aggregates data from multiple tables into a single table before the query is executed. This reduces the number of queries that need to be executed and speeds up the query process.
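A minimal sketch of a multi-table join index over hypothetical customer and orders tables:
CREATE JOIN INDEX cust_order_ji AS
SELECT c.customer_id, c.customer_name, o.order_id, o.order_total
FROM customer c
INNER JOIN orders o
ON c.customer_id = o.customer_id
PRIMARY INDEX (customer_id);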
Teradata – Views
A view in Teradata is a virtual table that contains the result of a query. It has columns and rows just like a real table. The fields in a view are fields from one or more real tables in the database. A view is actually a composition of a table in the form of a predefined SQL query. Views can be used to join and simplify multiple tables into a single virtual table. Views can also be used to restrict access to the data in the underlying tables.
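For example, a view can expose only selected columns and rows of a hypothetical employee table:
CREATE VIEW v_employee AS
SELECT employee_id, first_name, last_name, department
FROM employee
WHERE department <> 'Executive';
SELECT * FROM v_employee;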
Teradata – Macros
Teradata macros are reusable pieces of code in Teradata SQL, which can be saved and used to automate the execution of common tasks. They are very useful for reducing the time and effort involved in coding and executing the same query multiple times. Macros can contain parameters, which allow the user to pass in values that can be used to customize the query. Macros can also be used to create custom functions or stored procedures.
Create Macros
Macros in Teradata are named, parameterized sets of SQL statements that are stored in the data dictionary and executed with a single request; unlike stored procedures, they contain only SQL, and they can return a result set.
To create a macro in Teradata:
1. Log into the Teradata environment.
2. Create the macro by entering the following query:
CREATE MACRO macro_name (argument1 datatype, argument2 datatype) AS
(
-- Insert the SQL statements for your macro here
);
3. Replace macro_name with the name of your macro, and argument1 and argument2 with the names of the arguments.
4. Replace datatype with the data type of each argument, such as INTEGER or VARCHAR(20).
5. Insert the SQL statements for your macro between the parentheses, ending each statement with a semicolon and referencing the arguments with a leading colon (for example, :argument1).
6. Execute the query to create the macro.
7. To use the macro, enter the following query:
EXECUTE macro_name (argument1_value, argument2_value);
8. Replace macro_name with the name of your macro, argument1_value and argument2_value with the values of the arguments.
Parameterized Macros
Parameterized macros accept one or more parameters that are supplied at execution time, allowing a single macro to be reused with different values. The parameters are declared in the CREATE MACRO statement with a name and a data type, and they are referenced inside the macro body with a leading colon, for example :param1.
Executing Parameterized Macros
To execute a parameterized macro in Teradata, you need to first create the macro using the CREATE MACRO statement. Once the macro is created, you can then execute it using the EXECUTE (or EXEC) statement, passing in the necessary parameters as arguments. For example:
CREATE MACRO my_macro (parameter1 INTEGER, parameter2 CHAR(20)) AS
(
SELECT * FROM my_table
WHERE column1 = :parameter1
AND column2 = :parameter2;
);
EXEC my_macro(10, 'abc');
Teradata – Stored Procedure
Teradata stored procedures are a powerful feature of the Teradata Database which allow users to write SQL statements and other procedural logic that can be run on the database server. They are designed to allow users to create their own database applications or functions to be used in SQL statements or queries. Stored procedures are written in the Teradata Stored Procedure Language (SPL), a procedural language based on the ANSI SQL/PSM standard. A stored procedure is a compiled program that is executed by the Teradata Database with a CALL statement. It can accept input parameters and return output parameters or result sets to the calling application. Stored procedures offer a powerful way to extend the functionality of the database by providing custom application logic in the database layer.
Advantages
1. Improved Performance: Stored procedures can improve performance by pre-compiling queries and caching query execution plans. This reduces the time it takes to execute frequently used SQL queries.
2. Improved Security: Stored procedures can help improve security by restricting access to sensitive data or operations. By encapsulating sensitive data or operations within a stored procedure, you can control which users or roles can execute the stored procedure.
3. Reusability: Stored procedures can be reused across multiple applications, reducing the need for unnecessary coding and testing. This can save time and resources during development.
4. Transparency: Stored procedures can help ensure a level of transparency by ensuring all operations (including data manipulation) are performed within the database, rather than in the application code. This can help reduce the risk of data manipulation errors.
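A minimal sketch of a stored procedure, assuming a hypothetical employee table; inside embedded SQL, parameters are referenced with a leading colon:
CREATE PROCEDURE add_employee (IN p_emp_id INTEGER, IN p_name VARCHAR(100))
BEGIN
INSERT INTO employee (employee_id, emp_name)
VALUES (:p_emp_id, :p_name);
END;
CALL add_employee(101, 'Jane Doe');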
Teradata – JOIN strategies
1. Nested Loop Join: This join strategy is used when there is a need to join two tables that are relatively small. It can be used to join two tables in which one of them is indexed.
2. Hash Join: This join strategy is used when two large tables need to be joined. It can be used to join two tables with the same data type.
3. Merge Join: This join strategy is used when two large tables need to be joined, but the tables are sorted on the join column.
4. Cartesian Join: This join strategy is used when two tables need to be joined without any conditions. This join will return every combination of rows in both tables.
5. Semi Join: This join strategy is used when there is a need to join two tables, but only rows from one of the tables are returned.
6. Left Outer Join: This join strategy is used when there is a need to join two tables and return all rows from the left table, regardless of whether they match the rows in the right table.
Teradata – Partitioned Primary Index
A Partitioned Primary Index (PPI) is a type of primary index used in Teradata to speed up queries. With a PPI, the rows of a table are still distributed across the AMPs by the hash of the primary index, but within each AMP they are grouped into partitions according to a partitioning expression, typically ranges of a date column. Queries that filter on the partitioning column can then read only the relevant partitions instead of scanning the whole table, thereby reducing the time required to execute the query.
Advantages
1. Better Performance: Partitioned Primary Indexes (PPI) provide faster access to data when compared to traditional Primary Indexes (PI). This is because when data is partitioned, each partition consists of smaller subsets of data, which can be accessed more quickly than larger sets of data.
2. Partition Elimination: When a query filters on the partitioning column, the optimizer reads only the relevant partitions instead of the entire table, which reduces the amount of I/O required.
3. Reduced Overhead: Partitioning data reduces the amount of overhead required to manage and store the data, thus increasing the efficiency of the system.
4. Improved Query Performance: Partitioned Primary Indexes enable the system to use parallelism to process multiple queries simultaneously, resulting in improved query performance.
5. Easier Maintenance: Whole partitions, such as old date ranges, can be dropped or archived quickly without scanning the rest of the table, which simplifies purging and archiving of historical data.
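A minimal sketch of a table with a partitioned primary index, assuming a hypothetical sales table partitioned by month of sale_date:
CREATE TABLE sales (
    sale_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(10,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
sale_date BETWEEN DATE '2023-01-01' AND DATE '2024-12-31'
EACH INTERVAL '1' MONTH
);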
Teradata – OLAP Functions
Teradata provides a range of OLAP (online analytical processing) functions that can be used to analyze data in a variety of ways. These functions include:
• CUBE: This function allows data to be aggregated by multiple dimensions, such as product, region, and time.
• ROLLUP: This function can be used to create a hierarchical view of data, allowing users to see the data in different levels or groupings.
• RANK: This function can be used to assign a rank to each row of data, based on a specified column or set of columns.
• DENSE_RANK: Similar to RANK, this function assigns a rank to each row of data, but it leaves no gaps in the ranking sequence when there are ties (rows that have the same value).
• NTILE: This function can be used to assign a group to each row of data, based on a specified column or set of columns.
• LAG and LEAD: These functions can be used to compare the values of rows in different positions in the result set.
• FIRST_VALUE and LAST_VALUE: These functions can be used to return the first or last value in an ordered window of rows, respectively.
• AVG, MIN, MAX, and SUM: These functions can be used to calculate aggregate values from a result set.
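Ordered analytical functions use an OVER clause. This sketch ranks employees by salary within each department of a hypothetical employee table:
SELECT department,
employee_id,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employee;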
Teradata – Data Protection
Teradata provides data protection through a variety of methods. These include physical security measures, logical security measures, data encryption, and data masking.
Physical security measures help protect data from physical tampering or theft. This includes access control, physical access monitoring and control, and video surveillance.
Logical security measures help protect data from unauthorized access, malicious activity, and other security threats. This includes authentication and authorization, user access control, and data access monitoring.
Data encryption helps protect data in transit and at rest by encoding it so that it is unreadable by any unauthorized parties. This includes using encryption algorithms and key management systems.
Data masking helps protect sensitive data by obscuring it with a placeholder, such as a string of random numbers and letters. This helps ensure that the original data cannot be revealed even if the masking is breached.
Transient Journal
The Transient Journal in Teradata is a system-maintained journal that automatically keeps a before-image of every row changed by a transaction. If the transaction fails, the before-images are used to roll the data back to its original state; once the transaction commits, the corresponding journal entries are discarded. The Transient Journal works automatically, requires no user intervention, and is the mechanism that guarantees transaction-level data integrity.
Fallback
Fallback is a table-level data protection feature in Teradata. When a table is created with the FALLBACK option, a copy of every row is stored on a different AMP in the same cluster as the AMP holding the primary copy. If an AMP becomes unavailable, the system automatically uses the fallback copies, so the data remains fully accessible without any disruption to the user; when the failed AMP recovers, its data is brought back up to date. Because fallback doubles the storage required for a table, it is typically used for critical tables where continuous availability outweighs the extra storage cost.
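Fallback is requested per table in the CREATE TABLE statement. A minimal sketch with a hypothetical payroll table:
CREATE TABLE payroll, FALLBACK (
    employee_id INTEGER,
    pay_period  DATE,
    net_pay     DECIMAL(10,2)
)
PRIMARY INDEX (employee_id);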
Down AMP Recovery Journal
The Down AMP Recovery Journal in Teradata is a feature that helps to increase database availability and reliability. It is started automatically when an AMP goes down and the affected tables are protected with Fallback. While the AMP is unavailable, the other AMPs in the cluster record in the journal any changes made to rows whose copies reside on the down AMP. When the failed AMP comes back online, the journal is used to bring its data up to date, after which the journal is discarded. This allows the system to continue processing updates during an AMP failure without losing any changes.
Cliques
A clique in Teradata is a group of nodes that share access to a common set of disk arrays. Because every node in the clique can reach the same disks, the virtual processors (AMPs) of a failed node can migrate to the other nodes in the clique and continue to access their data. Cliques therefore protect the system against node failures and help keep data available.
Hot Standby Node
In Teradata, a Hot Standby Node is a spare node in a clique that does not take part in normal production work. If a production node fails, the AMPs from the failed node migrate to the Hot Standby Node, which takes over its workload so that performance is maintained; when the failed node is repaired, it becomes the new hot standby. This helps to minimize downtime and performance degradation in the event of a node failure.
RAID
RAID stands for Redundant Array of Independent Disks. It is a technology used to increase data reliability and performance by creating multiple copies of data and storing them on different disks. RAID is used in Teradata to improve the performance and reliability of data storage. RAID can be implemented in various ways, such as RAID 0 (striping), RAID 1 (mirroring), RAID 5 (striping with parity) and RAID 10 (mirroring and striping). Each RAID level has its own advantages and disadvantages, and the best one for a particular system depends on the specific requirements.
Teradata – User Management
Teradata provides a comprehensive user management system which enables administrators to manage user accounts, permissions, and roles. User accounts are created and maintained with SQL statements such as CREATE USER, MODIFY USER, GRANT, and REVOKE, or through administrative tools such as Teradata Viewpoint. The Teradata Database also provides a set of administrative views in the DBC database, which can be used to view user accounts and associated information. Additionally, users can be granted or denied access to specific objects within the database based on their roles and permissions.
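A minimal sketch of creating a user and granting access (the names, password, and sizes are hypothetical; space values are in bytes):
CREATE USER analyst01 FROM dbc AS
PASSWORD = analyst01pwd,
PERM = 100000000,
SPOOL = 500000000;
GRANT SELECT ON sales_db TO analyst01;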
Teradata – Performance Tuning
1. Monitor queries with the EXPLAIN modifier to identify and fix bottlenecks (see the sketch after this list)
2. Adjust the configuration settings of the Teradata system to optimize performance
3. Utilize the Teradata query optimizer to improve query performance
4. Use stored procedures, views and macros to encapsulate frequently executed SQL and reduce parsing and network overhead
5. Use partitioning, indexing, and materialized views to improve query performance
6. Utilize Teradata’s parallel query and workload management tools to distribute loads
7. Tune the data distribution of tables by choosing appropriate primary indexes, so that rows are spread evenly across Teradata's AMPs
8. Leverage the power of Teradata’s Advanced SQL Engine to improve query performance
9. Utilize the Teradata Performance Monitor to monitor and diagnose performance issues
10. Implement data archiving and purging strategies to reduce the size of data sets and improve query performance
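As a starting point for query monitoring, the EXPLAIN modifier shows the optimizer's execution plan for a query without running it, including the join strategy, estimated row counts, and any redistribution steps. A minimal sketch, with illustrative table names:
EXPLAIN
SELECT e.EmployeeNo, d.DepartmentName
FROM   Employee   e
JOIN   Department d
ON     e.DepartmentNo = d.DepartmentNo;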
Collect Statistics
1. Collect statistics on table columns: run the COLLECT STATISTICS command on each table's important columns, especially those used in joins, WHERE clauses, and GROUP BY clauses (as shown in the sketch after this list). Statistics give the optimizer the demographic information it needs to choose a good execution plan.
2. Collect statistics on indexes: run COLLECT STATISTICS on the primary index and any secondary indexes. This helps the optimizer decide when an index access path is cheaper than a full-table scan.
3. Collect statistics on join indexes: a join index is a physical database object, and statistics on it help the optimizer decide when the join index can be used to satisfy a query.
4. Refresh statistics regularly: statistics become stale as data changes, so re-collect them after large loads, deletes, or updates. Running COLLECT STATISTICS on a table without naming any columns refreshes all statistics previously defined on it.
5. Review existing statistics: use HELP STATISTICS to see which statistics exist on a table and when they were last collected, so that missing or out-of-date statistics can be identified.
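A minimal sketch of collecting and reviewing statistics; the table and column names are illustrative only.
COLLECT STATISTICS ON Employee COLUMN (EmployeeNo);
COLLECT STATISTICS ON Employee COLUMN (DepartmentNo);
COLLECT STATISTICS ON Employee INDEX (EmployeeNo);

/* Re-collect all previously defined statistics after a large data load */
COLLECT STATISTICS ON Employee;

/* List the statistics defined on the table and when they were collected */
HELP STATISTICS Employee;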
Teradata – FastLoad
Teradata FastLoad is a parallel loading utility used to load large volumes of data into empty Teradata tables. It improves load performance by using multiple sessions and by transferring data in large blocks rather than a row at a time. FastLoad reads data from flat files, including delimited and fixed-length formats. It supports only the INSERT operation, loads a single empty target table per job, and the target table cannot have secondary indexes, join indexes, or referential integrity constraints; loading into populated tables is handled by MultiLoad instead. FastLoad loads data in two phases: in the acquisition phase, the data is read from the input file and sent to the AMPs in blocks; in the application phase, each AMP sorts its rows by row hash and writes them into the target table.
How FastLoad Works
FastLoad is a data loading utility for Teradata databases, designed to load large volumes of data into an empty table quickly and efficiently. It works by establishing multiple sessions with the database and streaming the input data to the AMPs in large blocks instead of row by row. In the acquisition phase the client reads and parses the external file and distributes the records to the AMPs; in the application phase each AMP sorts its rows by row hash and writes them into the target table. Because it bypasses normal row-at-a-time processing, FastLoad can only insert into a single empty table; it cannot update or delete existing rows, and maintenance operations on populated tables are performed with MultiLoad.
Executing a FastLoad Script
A FastLoad script is used to quickly load large amounts of data into a Teradata database. A FastLoad job has two main parts: the script and the data. The script is a plain text file containing the control statements and the INSERT statement, which together identify the target table, the input data file, the record layout, and the error tables. The data itself is kept in a separate flat file that the script points to.
To execute a FastLoad script, save it as a text file and submit it to the FastLoad command-line utility. The syntax for the command is:
fastload < script_name.txt
Once the script has been submitted, FastLoad connects to the Teradata Database, executes the statements in the script, and loads the data into the specified table. The loading process can take anywhere from minutes to hours, depending on the volume of data being loaded.
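A minimal sketch of a FastLoad script, assuming a comma-delimited input file employee.txt and an empty staging table; the logon string, names, and column layout are illustrative only.
LOGON 192.168.1.102/dbc,dbc;
DATABASE tduser;

BEGIN LOADING tduser.Employee_Stg
   ERRORFILES Employee_ET, Employee_UV
   CHECKPOINT 10;

SET RECORD VARTEXT ",";

DEFINE in_EmployeeNo (VARCHAR(10)),
       in_FirstName  (VARCHAR(30)),
       in_LastName   (VARCHAR(30))
FILE = employee.txt;

INSERT INTO Employee_Stg (EmployeeNo, FirstName, LastName)
VALUES (:in_EmployeeNo, :in_FirstName, :in_LastName);

END LOADING;
LOGOFF;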
FastLoad Terms
The statements commonly used in a FastLoad script include the following.
1. LOGON – connects to the Teradata system with the given credentials and establishes the load sessions.
2. DATABASE – sets the default database for the job.
3. BEGIN LOADING – identifies the empty target table and names the two error tables (ERRORFILES); the first error table captures rows rejected for conversion or constraint errors, and the second captures rows that violate the unique primary index. The optional CHECKPOINT clause controls how often progress is recorded so that a failed job can be restarted.
4. SET RECORD – describes the format of the input records, for example VARTEXT for delimited files.
5. DEFINE – describes the layout of the input fields and names the input FILE.
6. INSERT – the SQL statement that maps the input fields to the columns of the target table.
7. END LOADING – ends the acquisition phase and starts the application phase, in which the rows are sorted and written to the target table.
8. LOGOFF – ends the sessions and terminates the utility.
Teradata – MultiLoad
Teradata MultiLoad is a high-speed, batch-mode utility used to load and maintain large volumes of data in Teradata tables. Unlike FastLoad, it works on populated tables and can target up to five tables in a single job. It supports INSERT, UPDATE, DELETE, and UPSERT operations and can perform several of these operations, on several tables, in a single pass over the input data. MultiLoad is designed to preserve data integrity and minimize the impact on the rest of the system while it runs, and it provides error tables, checkpoints, and restart logic so that a failed job can be resumed rather than rerun from the beginning.
Limitation
MultiLoad has some limitations:
1. Unique secondary indexes are not supported on the target tables (non-unique secondary indexes are allowed).
2. Referential integrity (foreign key constraints) is not supported on the target tables.
3. Triggers are not supported and must be disabled on the target tables before the load.
4. Join indexes and hash indexes are not supported on the target tables.
5. A single MultiLoad job can reference at most five target tables.
6. SELECT statements cannot be used within a MultiLoad job.
7. The target tables are locked during the application phase, so concurrent access is restricted while the changes are being applied.
How MultiLoad Works
MultiLoad imports data from a client file (or access module) and applies it to existing Teradata tables in a series of phases. A MultiLoad job runs in five phases: the preliminary phase, in which the script is validated and the work, error, and log tables are set up; the DML transaction phase, in which the DML statements are sent to the AMPs; the acquisition phase, in which the input data is transferred to the AMPs in blocks and held in work tables; the application phase, in which the changes are applied to the target tables a data block at a time; and the cleanup phase, in which locks are released and the work tables are dropped. Because changes are applied at the block level rather than row by row, MultiLoad is considerably faster than traditional row-at-a-time batch processing, and its log table, error tables, and checkpoints allow a failed job to be restarted with data integrity preserved.
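A minimal sketch of a MultiLoad script that inserts rows from a comma-delimited file into an existing table; the logon string, names, and field layout are illustrative only.
.LOGTABLE tduser.Employee_Log;
.LOGON 192.168.1.102/dbc,dbc;

.BEGIN MLOAD TABLES Employee_Stg;

.LAYOUT Employee_Layout;
.FIELD in_EmployeeNo * VARCHAR(10);
.FIELD in_FirstName  * VARCHAR(30);
.FIELD in_LastName   * VARCHAR(30);

.DML LABEL EmployeeDML;
INSERT INTO Employee_Stg (EmployeeNo, FirstName, LastName)
VALUES (:in_EmployeeNo, :in_FirstName, :in_LastName);

.IMPORT INFILE employee.txt
   FORMAT VARTEXT ','
   LAYOUT Employee_Layout
   APPLY EmployeeDML;

.END MLOAD;
.LOGOFF;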
Teradata – FastExport
Teradata FastExport is a high-speed export utility used to extract large volumes of data from Teradata tables. It is a client-based utility that runs in the user's address space, retrieves data from the Teradata Database, and then formats and writes it to an output file. Rather than returning rows one at a time, it transfers data in large blocks over multiple parallel sessions, which makes it well suited to extracting millions of rows quickly and efficiently. The data to be exported is defined by an ordinary SELECT statement inside the FastExport script.
Executing a FastExport Script
1. Write the FastExport script in a plain text file.
2. Begin the script with a .LOGTABLE statement naming a restart log table and a .LOGON statement giving the Teradata system and credentials.
3. Open the export task with .BEGIN EXPORT, optionally specifying the number of SESSIONS to use.
4. Name the output file and its format with .EXPORT OUTFILE.
5. Provide the SELECT statement that defines the rows and columns to be exported.
6. Close the job with .END EXPORT and .LOGOFF.
7. Submit the script from the operating-system command line with the FastExport utility, for example: fexp < script_name.txt
8. Monitor the console output while the job runs, and review the output and log for errors when it finishes.
9. If the job completed without errors, the exported data will have been written to the file named in the .EXPORT OUTFILE statement.
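A minimal sketch of a FastExport script; the logon string, table, and file names are illustrative only.
.LOGTABLE tduser.Employee_ExpLog;
.LOGON 192.168.1.102/dbc,dbc;

.BEGIN EXPORT SESSIONS 4;

.EXPORT OUTFILE employee_out.txt
   MODE RECORD FORMAT TEXT;

SELECT CAST(EmployeeNo AS CHAR(10)),
       CAST(FirstName  AS CHAR(30)),
       CAST(LastName   AS CHAR(30))
FROM   tduser.Employee;

.END EXPORT;
.LOGOFF;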
FastExport Terms
1. LOGTABLE – names a restart log table in which FastExport stores checkpoint information so that an interrupted job can be restarted from where it stopped.
2. LOGON – specifies the Teradata system and the credentials used to establish the export sessions.
3. BEGIN EXPORT – starts the export task; the SESSIONS option controls how many sessions are used in parallel.
4. EXPORT OUTFILE – names the client file that the exported data is written to, together with its MODE and FORMAT options.
5. SELECT – the SQL statement that defines the rows and columns to be exported.
6. END EXPORT and LOGOFF – END EXPORT signals that the SELECT has been submitted and the export can run; LOGOFF ends the sessions and terminates the utility.
Teradata – BTEQ
BTEQ is a Teradata command-line utility used to access and manipulate data on a Teradata Database. It is a native query and reporting tool that allows users to submit SQL, receive results, and format reports. BTEQ can be used to write scripts that automate database management tasks, and to import and export data between Teradata and client files such as text files and spreadsheets. It can be used to create, modify, and drop objects in the Teradata Database, including macros, which are named, parameterized sets of SQL statements stored in the database, as well as stored procedures. BTEQ scripts are also commonly used in scheduled jobs to run recurring tasks and generate reports.
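A minimal sketch of a BTEQ script that runs a query, checks the error code, and writes the result to a report file; the logon string, names, and file name are illustrative only.
.LOGON 192.168.1.102/dbc,dbc;

DATABASE tduser;

.EXPORT REPORT FILE = employee_report.txt;

SELECT EmployeeNo, FirstName, LastName
FROM   Employee
ORDER BY EmployeeNo;

.IF ERRORCODE <> 0 THEN .EXIT 1;

.EXPORT RESET;
.LOGOFF;
.QUIT;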
BTEQ Terms
1. BTEQ – BTEQ stands for Basic Teradata Query. It is a general-purpose, command-driven utility used to submit SQL to a Teradata Database, control sessions, and format the results into reports.
2. SQL – SQL stands for Structured Query Language and is a programming language used for managing data stored in relational databases.
3. DDL – DDL stands for Data Definition Language and is the set of commands used to create, alter, and drop database objects such as tables, indexes, views, and stored procedures.
4. DML – DML stands for Data Manipulation Language and is a set of commands used to insert, update, and delete data from a database.
5. DCL – DCL stands for Data Control Language and is a set of commands used to control access to data stored in a database.
6. TDCH – TDCH stands for Teradata Connector for Hadoop and is a connector that enables users to move data between Hadoop and the Teradata Database in both directions.