R – Coding Language

What is R?

R is a language and environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

Below is a screenshot from my R Language course completion on codeschool.com.

[Screenshot: R course completion]

To show some of R’s capabilities, I am going to explore ten of the highest-ranked female tennis players. To get data into R, you can either load a file or enter it manually. For this exercise, the data sets are entered manually and have been manipulated to show more examples.
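For comparison, loading data from a file could look roughly like the sketch below. The file and its three sample rows are hypothetical, written to a temporary location only so the example runs on its own; a real file would be read with the same read.csv() call.

```r
# Hypothetical input: write a tiny CSV to a temporary file so the
# example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "names,ranks,GrandSlamSingles",
  "Serena Williams,1,39",
  "Martina Hingis,2,5",
  "Monica Seles,3,9"
), csv_path)

# read.csv() treats the first line as the header by default.
# stringsAsFactors = FALSE keeps the player names as plain strings.
players_from_file <- read.csv(csv_path, stringsAsFactors = FALSE)
print(players_from_file)
```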

To enter the data, I used the following commands:

ranks <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

names <- c("Serena Williams", "Martina Hingis", "Monica Seles", "Venus Williams", "Margaret Court", "Maria Sharapova", "Chris Evert", "Billie Jean King", "Martina Navratilova", "Steffi Graf")

Nationality <- c("American", "English", "German", "American", "Australian", "Russian", "Austrian", "American", "Russian", "Polish")

GrandSlamSingles <- c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)

Next, I combined the vectors into a single data frame with the data.frame() function:

players <- data.frame(names, ranks, Nationality, GrandSlamSingles)
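Once the data frame exists, it is worth checking that it looks as expected. The sketch below rebuilds the same data frame and then inspects, sorts and filters it; the variable by_titles is a name chosen here for illustration.

```r
# Rebuild the data frame from the post so the example runs on its own
ranks <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
names <- c("Serena Williams", "Martina Hingis", "Monica Seles",
           "Venus Williams", "Margaret Court", "Maria Sharapova",
           "Chris Evert", "Billie Jean King", "Martina Navratilova",
           "Steffi Graf")
Nationality <- c("American", "English", "German", "American", "Australian",
                 "Russian", "Austrian", "American", "Russian", "Polish")
GrandSlamSingles <- c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)
players <- data.frame(names, ranks, Nationality, GrandSlamSingles)

str(players)    # column types and the first few values
nrow(players)   # number of rows (players)

# Sort by number of titles, largest first, and show the top three
by_titles <- players[order(-players$GrandSlamSingles), ]
head(by_titles, 3)

# Names of the players with more than 20 titles in this data set
as.character(players$names[players$GrandSlamSingles > 20])
```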

To plot the number of Grand Slam singles titles against the players’ ranks, I ran:

plot(GrandSlamSingles, ranks)

[Figure R1: plot of GrandSlamSingles against ranks]

Now let’s visualise the data as a bar plot and customise the graph so that it is easier to read and interpret.

First, I ran the basic barplot command:

barplot(GrandSlamSingles)

[Figure R2: bar plot of GrandSlamSingles]

Then I ran the abline(h = median(GrandSlamSingles)) command, which draws a horizontal line at the median value on my graph.

[Figure R3: bar plot with median line]
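To see what the median line represents, the value can be computed on its own before drawing it. This is a small sketch using the same title counts; drawing to pdf(NULL) is an assumption made here so that the example runs without opening a window or writing a file.

```r
GrandSlamSingles <- c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)

# The median is the middle of the sorted values; with ten values it is
# the mean of the 5th and 6th: (12 + 18) / 2
med <- median(GrandSlamSingles)

pdf(NULL)                      # null graphics device (assumption, see above)
barplot(GrandSlamSingles)
abline(h = med, lty = 2)       # dashed horizontal line at the median
dev.off()

med  # 15
```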

This command creates a bar plot and lets me change the bar colour to any colour I choose; in this case, I chose purple.

barplot(GrandSlamSingles, col = "purple")

[Figure R4: bar plot with purple bars]

I can label my X and Y axes by running these commands:

title(xlab = "names", col.lab = rgb(0, 0.5, 0))

title(ylab = "GrandSlamSingles", col.lab = rgb(0, 0.5, 0))

[Figure R5: bar plot with axis labels]

With the code below, I can connect the tops of the bars with a red line and add the title “Highest Singles Grand Slam Winner of All Time”, also in red:

lines(GrandSlamSingles, type = "o", pch = 22, lty = 2, col = "red")

title(main = "Highest Singles Grand Slam Winner of All Time", col.main = "red", font.main = 4)  # font 4 is bold italic; 5 would select the symbol font
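One detail worth knowing here: when lines() is given only the values, it places the points at x = 1, 2, 3 and so on, while barplot() centres its bars at slightly different positions. A possible way to line the two up, sketched below, is to keep the bar midpoints that barplot() returns and pass them to lines(). The name mids and the ylim value are choices made for this example.

```r
GrandSlamSingles <- c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)

pdf(NULL)  # null graphics device so the sketch runs without a window

# barplot() invisibly returns the x position of the centre of each bar
mids <- barplot(GrandSlamSingles, col = "purple", ylim = c(0, 45))

# Use those positions as x so every point sits on top of its own bar
lines(mids, GrandSlamSingles, type = "o", pch = 22, lty = 2, col = "red")

dev.off()
```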

Then I used these commands from the ggplot2 package to plot the data:

library(ggplot2)

qplot(names, ranks, color=GrandSlamSingles)

[Figure R6: qplot of names against ranks]

With a little tweaking of the code, more detail can be added to the graph.

qplot(GrandSlamSingles, ranks, col = names, main = "All Time 10 Female Tennis Players")

[Figure R7: qplot of GrandSlamSingles against ranks, coloured by name]
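Since qplot() has been deprecated in ggplot2 from version 3.4.0 onwards, the same kind of graph could also be built with ggplot() directly. The version below is a sketch, assuming ggplot2 is installed; it puts the vectors in a data frame first, and the object name p is chosen for illustration.

```r
library(ggplot2)

players <- data.frame(
  names = c("Serena Williams", "Martina Hingis", "Monica Seles",
            "Venus Williams", "Margaret Court", "Maria Sharapova",
            "Chris Evert", "Billie Jean King", "Martina Navratilova",
            "Steffi Graf"),
  ranks = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  GrandSlamSingles = c(39, 5, 9, 7, 24, 5, 18, 12, 18, 22)
)

# One point per player, coloured by name, with a main title
p <- ggplot(players, aes(x = GrandSlamSingles, y = ranks, colour = names)) +
  geom_point() +
  labs(title = "All Time 10 Female Tennis Players")

print(p)
```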

What other ideas/concepts could be represented via R Graphics?

Individual players can be monitored and their abilities measured against their performance.

We can also track their wins, losses or draws more closely for the coming year.

The same analysis can be carried out for men’s tennis players as well.

Sources

http://www.wtatennis.com/singles-rankings

https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

https://www.datacamp.com/community/tutorials/15-easy-solutions-data-frame-problems-rhmbfv

BIG DATA

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.

The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
