#StackBounty: #tm #topic-modeling #n-gram #udpipe #textrank R extract most common word(s) in a column by group

Bounty: 50

I want to extract the main keywords from the column ‘title’ for each group (the first column).

[Image: data]

Desired result in column ‘desired title’:

[Image: desired]

Reproducible data:

myData <- 
structure(list(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3), title = c("mentoring aug 8th 2018", 
"mentoring aug 9th 2017", "mentoring aug 9th 2018", "mentoring august 31", 
"mentoring blue care", "mentoring cara casual", "mentoring CDP", 
"mentoring cell douglas", "mentoring centurion", "mentoring CESO", 
"mentoring charlotte", "medication safety focus", "medication safety focus month", 
"medication safety for nurses 2017", "medication safety formulations errors", 
"medication safety foundations care", "medication safety general", 
"communication surgical safety", "communication tips", "communication tips for nurses", 
"communication under fire", "communication webinar", "communication welling", 
"communication wellness")), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

I’ve looked into record linkage solutions, but those are mainly for grouping the full titles.
Any suggestions would be great.

EDIT: I concatenated all titles by group, and tokenized them:

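library(dplyr)
library(tidytext)  # provides unnest_tokens()

# Concatenate all titles within each group into a single string
# (myData is the reproducible data frame defined above)
myData <-
  myData %>% 
  group_by(group) %>% 
  mutate(titles = paste0(title, collapse = " ")) %>%
  select(group, titles) %>% 
  distinct() %>% 
  ungroup()

# Split the concatenated titles into one word per row
myTokens <- myData %>% 
  unnest_tokens(word, titles) 

myTokens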



Below is the resulting dataframe:

[Image: tokens]
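
Counting, for each group, the share of titles that contain each token gives a quick view of which words dominate. This is only a minimal sketch on a small subset of the titles above; `titles_df`, `title_id`, `n_group`, `n_titles` and `share` are illustrative names that are not part of the original code.

```r
library(dplyr)
library(tidytext)

# Subset of the question's titles, re-declared so the sketch is self-contained
titles_df <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  title = c("mentoring aug 8th 2018", "mentoring blue care", "mentoring CDP",
            "medication safety focus", "medication safety general",
            "medication safety for nurses 2017",
            "communication tips", "communication webinar",
            "communication under fire"),
  stringsAsFactors = FALSE
)

word_share <- titles_df %>%
  mutate(title_id = row_number()) %>%            # one id per title
  add_count(group, name = "n_group") %>%         # number of titles per group
  unnest_tokens(word, title) %>%                 # one lowercase word per row
  distinct(group, n_group, title_id, word) %>%   # count a word once per title
  count(group, n_group, word, name = "n_titles") %>%
  mutate(share = n_titles / n_group) %>%         # fraction of titles containing the word
  arrange(group, desc(share))

word_share
```

Words with a `share` of 1 occur in every title of their group, so they are natural keyword candidates.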

I’ve been trying the udpipe and textclean packages:

library(udpipe)  # provides keywords_collocation()

a <- keywords_collocation(x = myTokens, term = "word", group = "group", ngram_max = 3, n_min = 1, sep = " ")
a

But I’m not getting a clear answer for each of the 3 groups.
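
For comparison, here is a minimal base R sketch of a simpler idea. It assumes the keyword for a group is the run of leading words shared by all of its titles, which may not match the desired column exactly. `common_prefix` is a hypothetical helper and `titles_df` is a small subset of the titles above; neither comes from a package.

```r
# Subset of the question's titles, re-declared so the sketch is self-contained
titles_df <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  title = c("mentoring aug 8th 2018", "mentoring blue care", "mentoring CDP",
            "medication safety focus", "medication safety general",
            "medication safety for nurses 2017",
            "communication tips", "communication webinar",
            "communication under fire"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: longest sequence of leading words common to all titles
common_prefix <- function(titles) {
  word_lists <- strsplit(tolower(titles), "\\s+")
  shortest <- min(lengths(word_lists))
  prefix <- character(0)
  for (i in seq_len(shortest)) {
    # i-th word of every title
    ith <- vapply(word_lists, function(w) w[i], character(1))
    # stop at the first position where the titles disagree
    if (length(unique(ith)) > 1) break
    prefix <- c(prefix, ith[1])
  }
  paste(prefix, collapse = " ")
}

# Apply the helper to the titles of each group; names are the group ids
keywords <- vapply(split(titles_df$title, titles_df$group),
                   common_prefix, character(1))
keywords
```

This only works when every title in a group starts with the same words; a single differently worded title would shorten or empty the result, so a frequency threshold may be more robust on messier data.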

Any suggestions would be great!

