Valentine’s Day Word Cloud using Oracle’s Distribution of R

Brandon Jones
Big Data & BI, Oracle, Technical Tips


We hope you had a great Valentine’s weekend! Whether you did or not, we knew people on Twitter were talking about the big day, so we decided to run a quick analysis.

Check out how we made this word cloud using Oracle’s Distribution of R to pull Valentine’s Day tweets from Twitter.

Once you have Oracle’s Distribution of R running, you must first install the necessary packages.

> install.packages(c("devtools", "rjson", "bit64", "httr", "tm", "wordcloud"))

Oracle R screenshot

> devtools::install_github("geoffjentry/twitteR")

Then, you must load the correct libraries.

> library(devtools)
> library(tm)
> library(wordcloud)
> library(twitteR)

If you do not already have a Twitter app, you will need to create one at apps.twitter.com. When you create a new app, you will receive your personal API key, API secret, access token, and access token secret.

> api_key <- "YOUR API_KEY"
> api_secret <- "YOUR API_SECRET"
> access_token <- "YOUR ACCESS_TOKEN"
> access_token_secret <- "YOUR ACCESS_TOKEN_SECRET"
> setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

Oracle R APIs screenshot
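
If you would rather not paste the four credentials directly into your script, one option is to read them from environment variables. The sketch below is only an illustration: the helper name get_twitter_credentials and the variable names (TWITTER_API_KEY and so on) are our own assumptions, not something twitteR requires.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of hard-coding them. The variable names are assumptions.
get_twitter_credentials <- function() {
  vars <- c(api_key             = "TWITTER_API_KEY",
            api_secret          = "TWITTER_API_SECRET",
            access_token        = "TWITTER_ACCESS_TOKEN",
            access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET")

  # unset = NA lets us tell "not set" apart from a real value
  values <- Sys.getenv(vars, unset = NA)

  # Stop early with a readable message if anything is missing or empty
  not_set <- vars[is.na(values) | values == ""]
  if (length(not_set) > 0) {
    stop("Missing environment variables: ", paste(not_set, collapse = ", "))
  }

  # Return a list keyed by the short names used in this post
  names(values) <- names(vars)
  as.list(values)
}

# Usage (commented out so the sketch runs without real credentials):
# creds <- get_twitter_credentials()
# setup_twitter_oauth(creds$api_key, creds$api_secret,
#                     creds$access_token, creds$access_token_secret)
```

This keeps the secrets out of any script you share, at the cost of having to set the variables before starting R.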

Now, for the fun part. The following command searches Twitter for the keyword of your choice (we chose “ValentinesDay”), where n is the maximum number of tweets you want to pull.

> vday <- searchTwitter("ValentinesDay", n=1500)

Next, we need to get the text and create a corpus.

> vday_text <- sapply(vday, function(x) x$getText())
> vday_text_corpus <- Corpus(VectorSource(vday_text))
> vday_text_corpus <- tm_map(vday_text_corpus, content_transformer(function(x) iconv(x, to='UTF-8', sub='byte')), mc.cores=1)

Almost there! Let’s clean up those tweets.

> vday_text_corpus <- tm_map(vday_text_corpus, removeNumbers)
> vday_text_corpus <- tm_map(vday_text_corpus, removePunctuation)
> vday_text_corpus <- tm_map(vday_text_corpus, stripWhitespace)
> vday_text_corpus <- tm_map(vday_text_corpus, content_transformer(tolower))
> vday_text_corpus <- tm_map(vday_text_corpus, removeWords, stopwords('english'))

Oracle R, cleaning up the tweets
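
To see roughly what each cleaning step does, here is a minimal base R sketch that walks one tweet through the same sequence. The sample tweet and the two-word stop list are made up for illustration, and the gsub calls only approximate the tm transformations; they are not how tm implements them.

```r
# Hypothetical sample tweet, invented for this walk-through
tweet <- "Happy Valentines Day 2016!!!   I LOVE the chocolate & the roses"

step1 <- gsub("[[:digit:]]+", "", tweet)    # roughly removeNumbers
step2 <- gsub("[[:punct:]]+", "", step1)    # roughly removePunctuation
step3 <- gsub("[[:space:]]+", " ", step2)   # roughly stripWhitespace
step4 <- tolower(step3)                     # same idea as content_transformer(tolower)

# Tiny stand-in stop list; stopwords('english') in tm is much longer
stop_words <- c("i", "the")
pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
step5 <- gsub(pattern, "", step4, perl = TRUE)  # roughly removeWords

print(step5)

# Split on runs of spaces to get the words that survive the cleanup
remaining <- strsplit(trimws(step5), " +")[[1]]
print(remaining)
```

Because the stop words are removed after the whitespace has been collapsed, the cleaned text can end up with doubled spaces. That should be harmless for a word cloud, but you could run stripWhitespace once more at the end if you want tidy text.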

Finally, you’ll need to create a color palette and use it to make your word cloud.

In the following command, 4 is the number of colors and “PuRd” is a color palette consisting of purples and reds.

> mypalette <- brewer.pal(4,"PuRd")
> wordcloud(vday_text_corpus, min.freq=20, max.words=100, random.order=TRUE, random.color=TRUE, colors=mypalette)

The result:

Oracle R Valentine's Day word cloud

Some words that came up frequently included:

chocolate, cupcakes, romantic, husband, heart, love, happy, romance, ring, wife, candy, beautiful, and perfect.
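
If you want to check which words clear the min.freq threshold before plotting, you can count them yourself. The sketch below uses base R on a few made-up cleaned tweets; the sample texts and the min_freq value are assumptions for illustration, and the split on whitespace is a simplification of how the corpus is actually tokenized.

```r
# Hypothetical cleaned tweets, invented for illustration only
cleaned <- c("happy valentines day love",
             "love chocolate",
             "chocolate love roses")

# Split on whitespace and count how often each word appears
words <- unlist(strsplit(cleaned, "[[:space:]]+"))
freq  <- sort(table(words), decreasing = TRUE)

# Plays the role of min.freq: words below this count would not be plotted
min_freq <- 2
frequent <- freq[freq >= min_freq]
print(frequent)
```

With real data you would raise min_freq (we used 20 above) until the list is short enough to read comfortably in the cloud.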
