The ETL process is the heart of the technical side of data
warehousing. Conduct some independent research on the ETL process. Write a 1-2 page APA-formatted
paper with citations and references that analyzes why the ETL process is
important for data warehousing efforts. Within your paper, discuss the three steps of the ETL process and briefly describe the four categories of ETL technologies. Please
provide examples of ETL technologies. I am including the chapter from our book on ETL and its reference in case you want to use them, but for the examples of ETL technologies, please find those online and include those references as well. Be sure to include why ETL is important for data warehousing efforts. Thanks!
Extraction, Transformation, and Load
At the heart of the technical side of the data warehousing process is extraction,
transformation, and load (ETL). ETL technologies, which have existed for some time,
are instrumental in the process and use of data warehouses. The ETL process is an integral
component in any data-centric project. IT managers are often faced with challenges because
the ETL process typically consumes 70 percent of the time in a data-centric project.
The ETL process consists of extraction (i.e., reading data from one or more databases),
transformation (i.e., converting the extracted data from its previous form into the form in
which it needs to be so that it can be placed into a data warehouse or simply another
database), and load (i.e., putting the data into the data warehouse). Transformation occurs
by using rules or lookup tables or by combining the data with other data. The three database
functions are integrated into one tool to pull data out of one or more databases and place
them into another, consolidated database or a data warehouse.
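The three steps described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ETL tool works. It makes three assumptions: a small CSV export is the source, a hypothetical region-code lookup table is the transformation rule, and an in-memory SQLite database stands in for the data warehouse. The names `SOURCE_CSV`, `REGION_LOOKUP`, and `fact_orders` are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """order_id,region_code,amount
1,N,120.50
2,S,80.00
3,N,45.25
"""

# Hypothetical lookup table applied during transformation.
REGION_LOOKUP = {"N": "North", "S": "South"}


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and decode region codes via the lookup table."""
    return [
        (int(r["order_id"]),
         REGION_LOOKUP.get(r["region_code"], "Unknown"),
         float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the transformed records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(SOURCE_CSV)))
    query = "SELECT region, SUM(amount) FROM fact_orders GROUP BY region ORDER BY region"
    for row in warehouse.execute(query):
        print(row)  # ('North', 165.75) then ('South', 80.0)
```

Each step lives in its own function, which mirrors the extract/transform/load split. A packaged ETL tool would wrap the same steps in a GUI, scheduling, and metadata management.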
ETL tools also transport data between sources and targets, document how data elements
(e.g., metadata) change as they move between source and target, exchange metadata with
other applications as needed, and administer all runtime processes and operations (e.g.,
scheduling, error management, audit logs, statistics). ETL is extremely important for data
integration as well as for data warehousing. The purpose of the ETL process is to load the
warehouse with integrated and cleansed data. The data used in ETL processes can come
from any source: a mainframe application, an ERP application, a CRM tool, a flat file, an
Excel spreadsheet, or even a message queue. In Figure 2.9, we outline the ETL process.
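The runtime duties listed above (error management, audit logs, and statistics) can be illustrated with a small Python wrapper. This is a hedged sketch under three simple assumptions: records are dictionaries, bad records are set aside rather than stopping the run, and Python's standard `logging` module stands in for an audit log. `RunStats` and `run_job` are hypothetical names, and scheduling is left out entirely.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
audit_log = logging.getLogger("etl.audit")


@dataclass
class RunStats:
    """Runtime statistics for one ETL run (hypothetical structure)."""
    read: int = 0
    loaded: int = 0
    rejected: list = field(default_factory=list)


def run_job(records, transform, sink):
    """Transform each record, append good ones to sink, and set aside failures."""
    stats = RunStats()
    for rec in records:
        stats.read += 1
        try:
            sink.append(transform(rec))
            stats.loaded += 1
        except (KeyError, ValueError) as exc:
            # Error management: record the failure instead of aborting the run.
            stats.rejected.append((rec, str(exc)))
            audit_log.warning("rejected %r: %s", rec, exc)
    # Audit trail: summary statistics for the finished run.
    audit_log.info("run finished: read=%d loaded=%d rejected=%d",
                   stats.read, stats.loaded, len(stats.rejected))
    return stats


if __name__ == "__main__":
    target = []
    source = [{"amount": "10"}, {"amount": "oops"}, {}]
    stats = run_job(source, lambda r: float(r["amount"]), target)
    print(stats.read, stats.loaded, len(stats.rejected))  # 3 1 2
```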
The process of migrating data to a data warehouse involves the extraction of data from all
relevant sources. Data sources may consist of files extracted from OLTP databases,
spreadsheets, personal databases (e.g., Microsoft Access), or external files. Typically, all the
input files are written to a set of staging tables, which are designed to facilitate the load
process. A data warehouse contains numerous business rules that define such things as how
the data will be used, summarization rules, standardization of encoded attributes, and
calculation rules. Any data quality issues pertaining to the source files need to be corrected
before the data are loaded into the data warehouse. One of the benefits of a well-designed
data warehouse is that these rules can be stored in a metadata repository and applied to the
data warehouse centrally. This differs from an OLTP approach, which typically has data and
business rules scattered throughout the system. The process of loading data into a data
warehouse can be performed either through data transformation tools that provide a GUI to
aid in the development and maintenance of business rules or through more traditional
methods, such as developing programs or utilities to load the data warehouse, using
programming languages such as PL/SQL, C++, Java, or .NET Framework languages. This
decision is not easy for organizations. Several issues affect whether an organization will
purchase data transformation tools or build the transformation process itself:
• Data transformation tools are expensive.
• Data transformation tools may have a long learning curve.
• It is difficult to measure how the IT organization is doing until it has learned to use the
data transformation tools.
FIGURE 2.9 The ETL Process.
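Two ideas from the passage can be sketched with Python's built-in `sqlite3` module: landing input in staging tables, and then applying business rules from a central repository. In this minimal example, a plain dictionary stands in for the metadata repository, and each rule is stored as a SQL expression keyed by its target column. The table names `stg_sales` and `dw_sales`, the column names, the function `build_warehouse`, and the rules themselves are all hypothetical.

```python
import sqlite3

# Hypothetical rule repository: business rules stored as data
# (target column -> SQL expression over staging columns).
RULES = {
    # Standardization rule: trim and upper-case customer names.
    "customer": "UPPER(TRIM(cust_name))",
    # Decoding rule for an encoded attribute.
    "gender": "CASE gender_code WHEN 'M' THEN 'Male' "
              "WHEN 'F' THEN 'Female' ELSE 'Unknown' END",
    # Calculation rule.
    "net_amount": "ROUND(gross - discount, 2)",
}


def build_warehouse(raw_rows):
    conn = sqlite3.connect(":memory:")
    # 1. Land the raw input in a staging table that mirrors the source layout.
    conn.execute(
        "CREATE TABLE stg_sales (cust_name TEXT, gender_code TEXT, "
        "gross REAL, discount REAL)"
    )
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?)", raw_rows)
    # 2. Apply every rule from the repository in one central step.
    select_list = ", ".join(f"{expr} AS {col}" for col, expr in RULES.items())
    conn.execute(f"CREATE TABLE dw_sales AS SELECT {select_list} FROM stg_sales")
    return conn


if __name__ == "__main__":
    conn = build_warehouse([("  acme corp ", "F", 100.0, 12.5), ("Beta", "X", 50.0, 0.0)])
    print(conn.execute("SELECT * FROM dw_sales ORDER BY customer").fetchall())
    # [('ACME CORP', 'Female', 87.5), ('BETA', 'Unknown', 50.0)]
```

The rules live in one place as data. Changing a standardization or calculation rule therefore means editing a single entry rather than hunting through application code, which is the contrast with the OLTP approach drawn in the text. The rule expressions are interpolated directly into SQL, so this pattern is only appropriate when the repository itself is trusted.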
In the long run, a transformation-tool approach should simplify the maintenance of an
organization’s data warehouse. Transformation tools can also be effective in detecting anomalies in the data and
scrubbing (i.e., removing) them. OLAP and data mining tools rely on
how well the data are transformed.
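Detecting and scrubbing anomalies can be sketched as a function that separates clean rows from rejected ones. The checks here (duplicate keys, unknown state codes, missing or negative quantities) are assumptions chosen for illustration. So are the field names `id`, `state`, and `qty` and the function name `scrub`. Real data quality rules would come from the business rules the warehouse defines.

```python
def scrub(rows, valid_states):
    """Detect anomalies; return (clean_rows, rejected_rows_with_reasons)."""
    clean, rejected = [], []
    seen_ids = set()  # ids of rows already accepted as clean
    for row in rows:
        # Correctable anomaly: normalize the state code's case and whitespace.
        state = str(row.get("state") or "").strip().upper()
        problems = []
        if row.get("id") in seen_ids:
            problems.append("duplicate id")
        if state not in valid_states:
            problems.append("unknown state code")
        qty = row.get("qty")
        if not isinstance(qty, int) or qty < 0:
            problems.append("missing or negative quantity")
        if problems:
            rejected.append((row, problems))
        else:
            seen_ids.add(row["id"])
            clean.append({**row, "state": state})
    return clean, rejected


if __name__ == "__main__":
    rows = [
        {"id": 1, "state": " tx ", "qty": 5},
        {"id": 1, "state": "TX", "qty": 2},   # duplicate id
        {"id": 2, "state": "ZZ", "qty": 3},   # unknown state
        {"id": 3, "state": "CA", "qty": -1},  # negative quantity
    ]
    clean, rejected = scrub(rows, {"TX", "CA"})
    print(len(clean), len(rejected))  # 1 3
```

Rejected rows keep their reasons so that someone can correct the source data. A row whose only problem is inconsistent formatting, such as `' tx '`, is corrected rather than rejected.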
As an example of effective ETL, Motorola, Inc., uses ETL to feed its data warehouses.
Motorola collects information from 30 different procurement systems and sends it to its
global SCM data warehouse for analysis of aggregate company spending (see …).
Solomon (2005) classified ETL technologies into four categories: sophisticated, enabler,
simple, and rudimentary. It is generally acknowledged that tools in the sophisticated
category will result in the ETL process being better documented and more accurately
managed as the data warehouse project evolves.
Even though it is possible for programmers to develop software for ETL, it is simpler to use
an existing ETL tool. The following are some of the important criteria in selecting an ETL
tool (see Brown, 2004):
• Ability to read from and write to an unlimited number of data source architectures
• Automatic capturing and delivery of metadata
• A history of conforming to open standards
• An easy-to-use interface for the developer and the functional user
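The metadata-capture criterion can be illustrated with a Python decorator that records lineage information every time a transformation step runs. This is only a sketch of the idea, and it uses an in-memory list as the metadata store. The names `captures_metadata`, `dedupe_emails`, `METADATA`, `crm.contacts`, and `dw.dim_customer` are all hypothetical.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory metadata repository; a real tool would persist this.
METADATA = []


def captures_metadata(source, target):
    """Decorator factory: record lineage metadata each time a step runs."""
    def wrap(step):
        @functools.wraps(step)
        def run(rows):
            out = step(rows)
            METADATA.append({
                "step": step.__name__,
                "source": source,
                "target": target,
                "rows_in": len(rows),
                "rows_out": len(out),
                "run_at": datetime.now(timezone.utc).isoformat(),
            })
            return out
        return run
    return wrap


@captures_metadata(source="crm.contacts", target="dw.dim_customer")
def dedupe_emails(rows):
    """Example step: keep the first row per lower-cased email address."""
    seen, out = set(), []
    for r in rows:
        key = r["email"].lower()
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


if __name__ == "__main__":
    out = dedupe_emails([{"email": "A@x.com"}, {"email": "a@x.com"}, {"email": "b@x.com"}])
    print(len(out), METADATA[-1]["rows_in"], METADATA[-1]["rows_out"])  # 2 3 2
```

The decorator captures the metadata as a side effect of running the step, so developers do not have to document each change by hand.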
Performing extensive ETL may be a sign of poorly managed data and a fundamental lack of
a coherent data management strategy. Karacsony (2006) indicated that there is a direct
correlation between the extent of redundant data and the number of ETL processes. When
data are managed correctly as an enterprise asset, ETL efforts are significantly reduced, and
redundant data are completely eliminated. This leads to huge savings in maintenance and
greater efficiency in new development while also improving data quality. Poorly designed
ETL processes are costly to maintain, change, and update. Consequently, it is crucial to
make the proper choices in terms of the technology and tools to use for developing and
maintaining the ETL process.
A number of packaged ETL tools are available. Database vendors currently offer ETL
capabilities that both enhance and compete with independent ETL tools. SAS acknowledges
the importance of data quality and offers the industry’s first fully integrated solution that
merges ETL and data quality to transform data into strategically valuable assets. Other ETL
software providers include Microsoft, Oracle, IBM, Informatica, Embarcadero, and Tibco.
For additional information on ETL, see Golfarelli and Rizzi (2009), Karacsony (2006), and …
Sharda, R., Delen, D., & Turban, E. (2013). Business intelligence: A managerial perspective on
analytics (3rd ed.).