Business intelligence (BI)

Business intelligence (BI) is a technology-driven process for analysing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions. BI encompasses a variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, develop and run queries against the data, and create reports, dashboards and data visualizations to make the analytical results available to corporate decision makers as well as operational workers.

The potential benefits of business intelligence programs include:

  • Accelerating and improving decision making
  • Optimizing internal business processes
  • Increasing operational efficiency
  • Driving new revenue
  • Gaining competitive advantage over business rivals

BI systems can also help companies identify market trends and spot business problems that need to be addressed.

Business intelligence allows a company to make effective strategic decisions. Using a business intelligence model, a company can plan strategically how to improve its overall operations, make its services easier for customers to use, and stay ahead of its competitors.

By analysing the data and information collected on their customers, companies can see which parts of the business customers use most and where other parts of the company can be improved.

Business intelligence combines a broad set of data analysis applications, including:

1. Ad hoc analysis and querying

Ad hoc analytics is the discipline of analyzing data on an as-needed or requested basis. It has always been challenging, and running ad hoc analytics on big data sets rather than relational databases adds a further layer of complexity because of larger data volumes, faster data velocity, greater data variety and more sophisticated data models.
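
As a rough illustration of ad hoc querying, the R sketch below answers two one-off questions against a small in-memory table, the second one refined after seeing the first. The `orders` data frame, its columns and its values are hypothetical; in practice such questions would be put to a BI tool or database rather than typed into R.

```r
# Hypothetical order data; the columns and values are illustrative only
orders <- data.frame(
  region  = c("North", "South", "North", "East", "South", "East"),
  product = c("Widget", "Widget", "Gadget", "Widget", "Gadget", "Gadget"),
  revenue = c(120, 80, 200, 150, 90, 60)
)

# First question, asked on demand: which Widget orders exceeded 100 in revenue?
big_widget_orders <- subset(orders, product == "Widget" & revenue > 100)
print(big_widget_orders)

# Follow-up question, refined on the spot: total revenue per region
revenue_by_region <- aggregate(revenue ~ region, data = orders, FUN = sum)
print(revenue_by_region)
```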

    BI2

2. Online analytical processing (OLAP)

OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company’s beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate “dimension.” OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub attributes.
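
To make the idea of dimensions concrete, here is a minimal R sketch that builds a small cube with base R's `xtabs()` (product × state × month, each cell holding total revenue) and then slices it the way the beach-ball example describes. The sales records, the second product and the amounts are invented for illustration; a real OLAP server stores and indexes such cubes far more efficiently.

```r
# Hypothetical sales records; products, states, months and amounts are invented
sales <- data.frame(
  product = c("Beach ball", "Beach ball", "Sunscreen", "Beach ball", "Sunscreen", "Sunscreen"),
  state   = c("Florida", "Florida", "Florida", "Texas", "Texas", "Florida"),
  month   = c("July", "September", "July", "July", "September", "September"),
  revenue = c(500, 200, 300, 250, 150, 100)
)

# Build a three-dimensional cube: product x state x month, summing revenue per cell
cube <- xtabs(revenue ~ product + state + month, data = sales)

# Beach ball revenue in Florida, July compared with September
print(cube["Beach ball", "Florida", c("July", "September")])

# Every product sold in Florida over the same two months
print(cube[, "Florida", c("July", "September")])
```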

OLAP can be used for data mining, the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP. Two long-established OLAP products are Essbase (originally from Hyperion Solutions, now part of Oracle) and Oracle Express Server. OLAP products are typically designed for multiple-user environments, with the cost of the software based on the number of users.

BI technology also includes data visualization software for designing charts and other infographics as well as tools for building BI dashboards and performance scoreboards that display visualized data on business metrics and key performance indicators in an easy-to-grasp way.

BI applications can be bought separately from different vendors or as part of a unified BI platform from a single vendor.

3. HADOOP

Hadoop is an open source software framework for storing data and running applications on clusters of commodity hardware.

Hadoop’s roots go back to 2003, in the Apache Nutch project; in 2006 Doug Cutting moved that work into a new subproject, which became the Hadoop we know today. Hadoop was named after Cutting’s son’s toy elephant.

Hadoop scales out to very large amounts of data and very large numbers of jobs and tasks: capacity grows by adding more machines to the cluster.
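
Hadoop processes jobs with the MapReduce model: map tasks work on blocks of input in parallel, a shuffle groups their intermediate results by key, and reduce tasks combine each group. The R sketch below imitates those three phases for a word count on a single machine. It is only an illustration of the model; it uses no Hadoop API, and the two text blocks are made up.

```r
# Input already split into "blocks", as a distributed file system would split a large file
blocks <- list(
  "big data needs big storage",
  "hadoop stores big data"
)

# Map phase: each block independently emits (word, 1) pairs as a named vector
map_block <- function(text) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  setNames(rep(1L, length(words)), words)
}
mapped <- lapply(blocks, map_block)

# Shuffle phase: collect every emitted pair and group the counts by word (the key)
pairs <- unlist(mapped)
grouped <- split(unname(pairs), names(pairs))

# Reduce phase: sum the counts for each word
word_counts <- vapply(grouped, sum, integer(1))
print(word_counts)
```

On a real cluster the map and reduce functions would run on many machines at once, which is what lets Hadoop spread a large job over many tasks.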

Hadoop is also reliable and inexpensive. Because it runs on commodity hardware, companies can avoid pouring money into building their own data centres; for a fraction of the price, they can rent hosted Hadoop services to store their data for them.

Many multinational companies make use of Hadoop, including:

  • Google
  • IBM
  • LinkedIn
  • Facebook
  • AO
  • Adobe
  • Fox
  • Spotify
  • Twitter

Four trends are changing the face of BI:

1. Unstructured data

Unstructured data is a vast, unrealized and untapped natural resource. When I say unrealized, I mean everyone recognizes it’s out there and that it’s a rich vein to be mined, but many executives may be sitting right on top of the gold without even realizing it’s down there.

The ability to extract insights from unstructured data — which is the essence of Big Data — represents opportunities for real business returns. The insights that lie in Big Data are key to competitiveness in today’s economy — offering insights to predict market shifts, understand customer behavior, optimize supply chains, and develop product innovations.

Executives, managers and professionals who are able to make better and faster decisions more often will have the edge in today’s economy.

In a survey Unisphere Research conducted among 264 data managers about a year ago, respondents agreed almost unanimously that unstructured data (defined as business documents, presentations and social media data) is on the rise and ready to engulf their current data management systems.

The trouble is, management does not understand that the challenge is coming, and fails to recognize the significance of unstructured data assets to the business. So there’s lots of work to be done here.

2. Cloud-based BI and analytics

BI can be expensive to purchase, implement and maintain. Cloud may change all that. Cloud is opening up business intelligence and analytics to more users — non-analysts — within organizations. There already is a drive to make BI more ubiquitous, and the cloud will accelerate this move toward simplified access.

To be sure, we’re still only in the early stages of cloud-based BI and analytics. A survey of 200 companies by Saugatuck Technology concludes that only about 13 percent of enterprises worldwide, across all industries and all sizes of enterprise, have cloud-based BI/advanced analytics solutions in place and in use. But this is about to change.

3. Mobile BI and analytics

More enterprises are embracing access to data analytics via mobile apps. Having analytics available in a simple app fashion could be a major boost for efforts to “democratize” analytics in organizations.

The key is to keep things simple and understandable, and mobile apps can go a long way in delivering this. Analytics can be offered through simple, single-purpose mobile apps, whose utility is quickly and easily grasped by business users.

4. Visual analytics

Some also refer to this as “3D data visualization.” Perhaps even 4D would be a better way to describe it, since it enables a look across time — the fourth dimension.

Visual analytics goes beyond 2D charts to provide a deeper understanding of the data. Its visualizations are typically interactive 3D diagrams that let decision-makers see at a glance what is trending.

A striking example of visual analytics is Google’s work-in-progress “100,000 Stars”, an interactive 3D map of our stellar neighbourhood in the Milky Way. It lets you zoom in on our solar system and then zoom over to the closest neighbouring star and its system. Click on specific stars and planets to get a brief description.


Sources

http://www.zdnet.com/article/4-forces-changing-the-face-of-business-intelligence/

http://www.matillion.com/insight/6-real-life-examples-of-successful-business-intelligence-systems/

 

GOOGLE FUSION TABLES

 

This blog post is about Google Fusion Tables and the 2011 Irish census, and how the census data can be visualized using Google Fusion Tables and Microsoft software.

Our lecturer gave us a link to the KML file containing the county coordinates, and the 2011 census population figures were obtained from the Central Statistics Office (CSO.ie).

Even though no new research had to be conducted, the data still had to be cleaned before I could merge the files.

I updated the county column in both files so that they used the same naming convention; once that was done, I used this column to merge the data.
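
The cleaning and merging step can be sketched in R as follows. This is not how Fusion Tables works internally; it only illustrates the idea, assuming hypothetical naming differences (a "Co." prefix, mixed case, stray spaces) and placeholder population values rather than the real CSO figures.

```r
# Hypothetical extracts: the real KML and CSO files have more columns and all 26 counties,
# and these population values are placeholders, not census figures
boundaries <- data.frame(
  county   = c("Co. Dublin", "CORK", "galway "),
  geometry = c("polygon A", "polygon B", "polygon C")
)
population <- data.frame(
  county     = c("Dublin", "Cork", "Galway"),
  population = c(100, 200, 300)
)

# Bring both county columns to one naming convention
clean_county <- function(x) {
  x <- trimws(x)                                              # remove stray spaces
  x <- sub("^co\\.?[[:space:]]+", "", x, ignore.case = TRUE)  # drop a "Co." prefix
  x <- tolower(x)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))        # e.g. "cork" -> "Cork"
}
boundaries$county <- clean_county(boundaries$county)
population$county <- clean_county(population$county)

# Merge the two files on the cleaned county column
merged <- merge(boundaries, population, by = "county")
print(merged)
```

Without the cleaning step, `merge()` would silently drop every county whose spelling differs between the two files, which is why matching names matters.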

 

Step 1 – Log in to Google Drive

Step 2 – Select Google Fusion Tables – Add New File

FT1

This was the first view of the geometry after merging the two files.

After re-focusing the map, I got this.

FT2

To differentiate the counties, I changed the fill colours by going to Tools – Fill Colour and entering the ranges and colours I wanted.
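
The range-to-colour idea behind the fill settings can be sketched with R’s `cut()`, which assigns each value to a range; the range number then picks a colour. The breakpoints, hex colours and population values below are placeholders, not the ones used for the map.

```r
# Placeholder populations (not census figures)
population <- c(Dublin = 1200, Cork = 500, Galway = 250, Leitrim = 30)

# Hypothetical ranges [0,100), [100,400), [400,800), [800,Inf) and one colour per range
breaks  <- c(0, 100, 400, 800, Inf)
colours <- c("#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c")

# cut() returns the index of the range each county falls into
bucket <- cut(population, breaks = breaks, labels = FALSE, right = FALSE)
fill   <- setNames(colours[bucket], names(population))
print(fill)
```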

FT3 

Here is the result.

FT4

Represented in the column chart is the sum of persons in each county.
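
If the census table lists persons in several rows per county (one row per sex is assumed here; the real CSO layout may differ), the column chart’s totals amount to a grouped sum. A minimal R sketch with placeholder counts:

```r
# Placeholder counts, assuming one row per county and sex
census <- data.frame(
  county  = c("Dublin", "Dublin", "Cork", "Cork", "Galway", "Galway"),
  sex     = c("Male", "Female", "Male", "Female", "Male", "Female"),
  persons = c(10, 12, 7, 8, 4, 5)
)

# Sum persons within each county, then draw the column chart
totals <- tapply(census$persons, census$county, sum)
barplot(totals, ylab = "Persons", main = "Sum of persons by county")
```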

FT5

 

FT6

The map aims to reflect the population data for the 26 counties in the Republic of Ireland.

Some of the benefits of data visualization to a business or decision maker are:

  • To collate & understand the data more easily.
  • Determine & spot patterns & trends in business operations.

From the heat map above, it’s clear that Dublin and Cork are well ahead of the other counties in population. A business can use such patterns to its advantage when making important decisions about growth.

Cork and Dublin appear to be the most populous counties, and this kind of visual representation can be very useful for fast-moving businesses that need to identify trends and risks as quickly as possible, whether to establish new businesses, move existing ones or close them.

DATA WAREHOUSING

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration and data consolidation.

According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data. This data helps analysts make informed decisions in an organization.

A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this view of the data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools support interactive and effective analysis of data in a multidimensional space, which leads to data generalization and data mining.

Data mining functions such as association, clustering, classification and prediction can be integrated with OLAP operations to enhance the interactive mining of knowledge at multiple levels of abstraction.

A data warehouse helps business executives to organize, analyse, and use their data for decision making. Data warehouses are widely used in the following fields:

  • Financial services
  • Banking services
  • Consumer goods
  • Retail sectors
  • Controlled manufacturing

An operational database undergoes frequent changes on a daily basis on account of the transactions that take place.

Understanding a Data Warehouse

  • A data warehouse helps executives to organize, understand and use their data to make strategic decisions.
  • Data warehouse systems help integrate a diverse range of application systems.
  • It holds consolidated historical data, which helps the organization analyse its business.
  • A data warehouse is not updated frequently.
  • A data warehouse is a database kept separate from the organization’s operational database.

COMPONENTS OF A DATA WAREHOUSE

DW1

FEATURES OF A DATA WAREHOUSE

  • Subject Oriented – A data warehouse is subject oriented because it provides information around a subject rather than the organization’s ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; rather, it focuses on modelling and analysis of data for decision making.

 

  • Time Variant – The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view.

 

  • Integrated – A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.

  • Non-volatile – Previous data is not erased when new data is added. Because a data warehouse is kept separate from the operational database, frequent changes in the operational database are not reflected in the data warehouse.
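
The time-variant and non-volatile features can be illustrated with a small R sketch: instead of overwriting a customer’s balance the way an operational database would, each load appends a new dated snapshot and keeps the old rows. The table, the `load_snapshot()` helper and the dates are hypothetical.

```r
# Warehouse table after the first load, each row stamped with its snapshot date
warehouse <- data.frame(
  snapshot = as.Date("2016-01-31"),
  customer = c("C1", "C2"),
  balance  = c(100, 250)
)

# Hypothetical loader: stamp the new rows with a date and append them (never update in place)
load_snapshot <- function(warehouse, new_rows, snapshot_date) {
  new_rows$snapshot <- as.Date(snapshot_date)
  rbind(warehouse, new_rows[, names(warehouse)])
}

# The operational system has since changed C1's balance; the warehouse records a new snapshot
warehouse <- load_snapshot(
  warehouse,
  data.frame(customer = c("C1", "C2"), balance = c(175, 250)),
  "2016-02-29"
)

# History is preserved: both of C1's balances remain queryable by date
print(warehouse[warehouse$customer == "C1", ])
```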

 

DIFFERENCES BETWEEN A DATA WAREHOUSE AND OPERATIONAL DATABASES

DATA WAREHOUSE (OLAP) | OPERATIONAL DATABASE (OLTP)
An OLAP query needs only read-only access to stored data. | An operational database query allows both read and modify operations.
Data warehouse queries are often complex and present a general form of the data. | An operational database is built for well-known tasks and workloads, such as searching for particular records and indexing.
Concurrency control and recovery mechanisms are largely unnecessary for data warehousing, which is mostly read-only. | Operational databases support concurrent processing of multiple transactions.
A data warehouse maintains historical data. | An operational database maintains current data.
OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs or database professionals.
It is used to analyse the business. | It is used to run the business.
It is based on the star schema, snowflake schema and fact constellation schema. | It is based on the entity-relationship model.
It focuses on information out. | It is application oriented.
The number of users is in the hundreds. | The number of users is in the thousands.
It is highly flexible. | It provides high performance.
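
The star schema mentioned above can be sketched in R: a central fact table holding foreign keys and a numeric measure, surrounded by dimension tables, plus a typical analytical query that joins them and aggregates. All table and column names are hypothetical; a snowflake schema would further split the dimension tables into normalised sub-tables.

```r
# Dimension tables (hypothetical)
dim_product <- data.frame(product_id = c(1, 2), product = c("Beach ball", "Sunscreen"))
dim_store   <- data.frame(store_id = c(1, 2, 3), region = c("East", "East", "West"))

# Fact table: one row per sale, holding keys into each dimension plus the measure
fact_sales <- data.frame(
  product_id = c(1, 1, 2, 2, 1),
  store_id   = c(1, 3, 2, 3, 2),
  amount     = c(40, 25, 60, 30, 15)
)

# Typical OLAP query: join the facts to their dimensions, then aggregate the measure
joined <- merge(merge(fact_sales, dim_product, by = "product_id"), dim_store, by = "store_id")
summary_table <- aggregate(amount ~ product + region, data = joined, FUN = sum)
print(summary_table)
```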

 

TYPES OF DATA WAREHOUSE APPLICATIONS

Information processing, analytical processing, and data mining are the three types of data warehouse applications that are discussed below:

  • Information Processing – A data warehouse allows the data stored in it to be processed. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts or graphs.

 

  • Analytical Processing – A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.

 

  • Data Mining – Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. These mining results can be presented using visualization tools.
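
The OLAP operations named under Analytical Processing can be approximated on a small data frame in base R. This only illustrates what each operation does to the view of the data; the sales figures are invented, and a real warehouse would run these operations on its OLAP server.

```r
# Hypothetical sales at the finest grain stored: year, quarter, region, product
sales <- data.frame(
  year    = c(2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016),
  quarter = c("Q1", "Q2", "Q1", "Q2", "Q1", "Q2", "Q1", "Q2"),
  region  = c("East", "East", "West", "West", "East", "East", "West", "West"),
  product = c("A", "B", "A", "B", "A", "A", "B", "B"),
  amount  = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Roll-up (drill up): summarise quarters into whole years per region
by_year <- aggregate(amount ~ year + region, data = sales, FUN = sum)

# Drill down: return to quarter-level detail, here for 2016 only
by_quarter_2016 <- aggregate(amount ~ quarter + region,
                             data = subset(sales, year == 2016), FUN = sum)

# Slice: fix a single dimension (region = East)
east_slice <- subset(sales, region == "East")

# Dice: restrict two or more dimensions at once (product A in 2016)
dice <- subset(sales, product == "A" & year == 2016)

# Pivot: rotate the view so that years are rows and regions are columns
pivot <- xtabs(amount ~ year + region, data = sales)
print(pivot)
```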

Decision support technologies help users make use of the data available in a data warehouse quickly and effectively: they gather data, analyse it and support decisions based on the information in the warehouse. The information gathered in a warehouse can be used in any of the following domains:

  • Tuning Production Strategies – Product strategies can be fine-tuned by repositioning products and managing product portfolios, comparing sales quarterly or yearly.

 

  • Customer Analysis – Customer analysis is done by analyzing the customer’s buying preferences, buying time, budget cycles, etc.

 

  • Operations Analysis – Data warehousing also helps in customer relationship management, and making environmental corrections. The information also allows us to analyze business operations.

 

3 TIER DATA WAREHOUSE ARCHITECTURE

DW2

Generally, a data warehouse adopts a three-tier architecture.

  • Top Tier – This tier is the front-end client layer. It holds the query and reporting tools, analysis tools and data mining tools.

 

  • Middle Tier – This consists of the OLAP Server that can be implemented in either of the following ways.

Relational OLAP (ROLAP): an extended relational database management system. ROLAP maps operations on multidimensional data to standard relational operations.

Multidimensional OLAP (MOLAP): directly implements multidimensional data and operations.

  • Bottom Tier – The bottom tier of the architecture is the data warehouse database server, which is almost always a relational database system. Back-end tools and utilities feed data into the bottom tier; they perform the extract, clean, load and refresh functions.
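
The back-end extract, clean and load steps can be sketched in R as below. The inline CSV, the cleaning rules and the `warehouse` table are all hypothetical stand-ins for real source systems and a real warehouse database; refresh would amount to re-running the same steps on a schedule.

```r
# Extract: raw rows as they might arrive from a source system (inline CSV stands in for a file)
raw <- read.csv(text = c("id,county,sales",
                         "1, Dublin ,100",
                         "2,cork,",
                         "3,Galway,300",
                         "3,Galway,300"),
                stringsAsFactors = FALSE)

# Clean: drop exact duplicates, drop rows missing the measure, standardise county names
clean <- unique(raw)
clean <- clean[!is.na(clean$sales), ]
county <- tolower(trimws(clean$county))
clean$county <- paste0(toupper(substring(county, 1, 1)), substring(county, 2))

# Load: append the cleaned batch to the (initially empty) warehouse table
warehouse <- data.frame(id = integer(0), county = character(0), sales = numeric(0))
warehouse <- rbind(warehouse, clean)
print(warehouse)
```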


Sources

04 Data Warehouses.pdf (local file)

http://www.slideshare.net/aswathysnair776/data-mining-and-data-warehousing-32857966

http://www.tutorialspoint.com/dwh/dwh_system_processes.htm

R – Coding Language

What is R?

R is a language and environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

Below is a screenshot from my R Language course completion on codeschool.com.

R Finished

To show some examples of what R can do, I am going to explore ten of the highest-ranked female tennis players of all time. Data can be input into R either by loading a file or by typing it in manually; for this exercise, the data sets are entered manually and then manipulated to show more examples.
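
For comparison, loading data from a file might look like the sketch below. The small CSV is written to a temporary file only so the example is self-contained; with real data you would pass the path of a downloaded file to `read.csv()`.

```r
# Write a tiny CSV to a temporary file, standing in for a downloaded dataset
csv_path <- tempfile(fileext = ".csv")
writeLines(c("names,ranks", "Serena Williams,1", "Martina Hingis,2"), csv_path)

# Load it into a data frame instead of typing the vectors by hand
players_from_file <- read.csv(csv_path, stringsAsFactors = FALSE)
str(players_from_file)
```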

In order to input the data I needed to use the following commands:

ranks <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

names <- c("Serena Williams", "Martina Hingis", "Monica Seles", "Venus Williams", "Margaret Court", "Maria Sharapova", "Chris Evert", "Billie Jean King", "Martina Navratilova", "Steffi Graf")

Nationality <- c("American", "Swiss", "Yugoslav-American", "American", "Australian", "Russian", "American", "American", "Czech-American", "German")

GrandSlamSingles <- c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)

Next, I combined the vectors into a data frame with the data.frame() function:

players <- data.frame(names, ranks, Nationality, GrandSlamSingles)

In order to view the number of GrandSlamSingles against the player’s ranks, the code was run:

plot(GrandSlamSingles, ranks)

R 1

Now let’s run more commands to visualise the data as a bar plot and customise the graph for easier interpretation.

First, I ran the barplot command:

barplot(GrandSlamSingles)

R2

Then I ran abline(h = median(GrandSlamSingles)), which draws a horizontal line across the graph at the median number of titles.

R3

This command creates a bar plot in the colour of my choice; in this case, I chose purple:

barplot(GrandSlamSingles, col = "purple")

R4

I can label the x- and y-axes by running these commands:

title(xlab = "names", col.lab = rgb(0, 0.5, 0))
title(ylab = "GrandSlamSingles", col.lab = rgb(0, 0.5, 0))

R5

With the code below, I can connect the tops of the bars with a red dashed line and add the red title “Highest Singles Grand Slam Winner of All Time”:

lines(barplot(GrandSlamSingles, plot = FALSE), GrandSlamSingles, type = "o", pch = 22, lty = 2, col = "red")  # x positions = bar midpoints
title(main = "Highest Singles Grand Slam Winner of All Time", col.main = "red", font.main = 4)  # 4 = bold italic; 5 is the symbol font

Then I used these commands to plot my data trend.

library(ggplot2)

qplot(names, ranks, color=GrandSlamSingles)

R6

With a little tweaking of the code, more details can be added to the outlook of the graph.

qplot(GrandSlamSingles, ranks, colour = names, main = "All Time 10 Female Tennis Players")

R7

What other ideas/concepts could be represented via R Graphics?

Individual players could be monitored and their abilities measured through their match performance.

We could also track their wins and losses more closely over the coming year.

The same analysis could be carried out for men’s tennis players as well.



Sources

http://www.wtatennis.com/singles-rankings

https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

https://www.datacamp.com/community/tutorials/15-easy-solutions-data-frame-problems-rhmbfv


BIG DATA

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.

The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
