We hope you had a great Valentine’s weekend! Whether you did or not, we knew people on Twitter were talking about the big day, so we thought we’d do some analysis.
Check out how we made this word cloud using Oracle’s Distribution of R to pull Valentine’s Day tweets from Twitter.
Once you have Oracle’s Distribution of R running, you must first install the necessary packages.
> install.packages(c("devtools", "rjson", "bit64", "httr", "tm", "wordcloud"))
> library(devtools)
> install_github("geoffjentry/twitteR")
Then, you must load the correct libraries.
> library(devtools)
> library(tm)
> library(wordcloud)
> library(twitteR)
If you do not have a Twitter app already, you will need to make one at apps.twitter.com. Upon creating a new app, you will receive your personal API key, API secret, access token, and access token secret.
> api_key <- "YOUR API_KEY"
> api_secret <- "YOUR API_SECRET"
> access_token <- "YOUR ACCESS_TOKEN"
> access_token_secret <- "YOUR ACCESS_TOKEN_SECRET"
> setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
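If you plan to share your script, you may not want your keys typed into it. Here is a minimal sketch of one alternative, assuming you have stored the four values in environment variables. The helper `read_twitter_credentials` and the `TWITTER_*` variable names are our own inventions for illustration; twitteR does not require them.

```r
# Hypothetical helper (not part of twitteR): read the four credentials from
# environment variables so they never appear in the script itself.
# The TWITTER_* variable names are our own choice.
read_twitter_credentials <- function() {
  var_names <- c(
    api_key             = "TWITTER_API_KEY",
    api_secret          = "TWITTER_API_SECRET",
    access_token        = "TWITTER_ACCESS_TOKEN",
    access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET"
  )
  values <- Sys.getenv(var_names, unset = NA)

  # Treat unset and empty variables the same way: both are unusable
  not_set <- var_names[is.na(values) | !nzchar(values)]
  if (length(not_set) > 0) {
    stop("Missing environment variables: ", paste(not_set, collapse = ", "))
  }

  names(values) <- names(var_names)
  as.list(values)
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$api_key, creds$api_secret,
#                     creds$access_token, creds$access_token_secret)
```

The function stops with a message naming any variable that is missing, which is usually easier to debug than a failed authentication.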
Now, for the fun part. The following command will search Twitter for the keyword of your choice (we chose “ValentinesDay”), where n is equal to the number of tweets you want to pull.
> vday <- searchTwitter("ValentinesDay", n=1500)
Next, we need to get the text and create a corpus.
> vday_text <- sapply(vday, function(x) x$getText())
> vday_text_corpus <- Corpus(VectorSource(vday_text))
> vday_text_corpus <- tm_map(vday_text_corpus, content_transformer(function(x) iconv(x, to='UTF-8', sub='byte')))
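The `iconv` step is there because tweets can contain bytes that are not valid UTF-8, and those can trip up the later text functions. The small sketch below shows what `sub = "byte"` does to a made-up string; it is an illustration, not one of our tweets. We pass `from = "UTF-8"` explicitly here (an assumption that differs from the call in the pipeline, which relies on your session’s default encoding) so the result does not depend on your locale.

```r
# A made-up string containing the lone byte 0xE9, which is not valid UTF-8
raw_text <- "caf\xe9 date night"

# Bytes that cannot be converted are replaced by their hex code in angle brackets
safe_text <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "byte")

print(validUTF8(raw_text))   # is the input valid UTF-8?
print(validUTF8(safe_text))  # is the converted text valid UTF-8?
print(safe_text)
```

The offending byte is kept as a visible marker instead of causing an error, so the rest of the tweet survives.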
Almost there! Let’s clean up those tweets.
> vday_text_corpus <- tm_map(vday_text_corpus, removeNumbers)
> vday_text_corpus <- tm_map(vday_text_corpus, removePunctuation)
> vday_text_corpus <- tm_map(vday_text_corpus, stripWhitespace)
> vday_text_corpus <- tm_map(vday_text_corpus, content_transformer(tolower))
> vday_text_corpus <- tm_map(vday_text_corpus, removeWords, stopwords('english'))
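If you’d like to see what these steps do before running them on real tweets, you can try them on a couple of sentences. The sketch below uses two made-up tweets (not real data) and the same five transformations in the same order. It builds the corpus with `VCorpus` rather than `Corpus`, which is our assumption for the demo, so that each document can be read back out with `as.character`.

```r
library(tm)

# Made-up sample tweets standing in for real search results
sample_tweets <- c(
  "Happy Valentine's Day!!! 2 dozen roses for my wife",
  "Chocolate   and cupcakes for the BIG day, 100% perfect"
)

demo_corpus <- VCorpus(VectorSource(sample_tweets))

# Same cleaning steps, in the same order, as in the pipeline above
demo_corpus <- tm_map(demo_corpus, removeNumbers)
demo_corpus <- tm_map(demo_corpus, removePunctuation)
demo_corpus <- tm_map(demo_corpus, stripWhitespace)
demo_corpus <- tm_map(demo_corpus, content_transformer(tolower))
demo_corpus <- tm_map(demo_corpus, removeWords, stopwords("english"))

# Pull the cleaned text back out as a plain character vector
cleaned <- unname(sapply(demo_corpus, as.character))
print(cleaned)
```

Because `stripWhitespace` runs before `removeWords`, you may notice small gaps where stop words used to be. That makes no difference to the word cloud, which only counts words.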
Finally, you’ll need to create a color palette and use it to make your word cloud.
In the following command, 4 is the number of colors and “PuRd” is a color palette consisting of purples and reds.
> mypalette <- brewer.pal(4,"PuRd")
> wordcloud(vday_text_corpus, min.freq=20, max.words=100, random.order=TRUE, random.color=TRUE, colors=mypalette)
Some words that came up frequently included:
chocolate, cupcakes, romantic, husband, heart, love, happy, romance, ring, wife, candy, beautiful, and perfect.
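If you’d rather see actual counts than read the big words off the cloud, you can tabulate the cleaned text yourself. Here is a minimal base-R sketch using three made-up, already-cleaned tweets (not our real data):

```r
# Made-up cleaned tweets (lowercase, no punctuation), not real data
cleaned_tweets <- c(
  "happy valentines day love chocolate",
  "love chocolate cupcakes perfect",
  "romantic dinner husband love"
)

# Split each tweet on whitespace, then count how often each word appears
words <- unlist(strsplit(cleaned_tweets, "\\s+"))
word_counts <- sort(table(words), decreasing = TRUE)

# Most frequent words first
print(head(word_counts, 5))
```

In this toy example “love” appears three times and “chocolate” twice, so they come out on top. The same idea applies to the real tweets, just with a much longer list.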