Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization
Sangho Suh¹, Jaegul Choo¹, Joonseok Lee², Chandan K. Reddy³
¹Korea University, ²Google Research, ³Virginia Tech
*Invited to Sister Conference Best Paper Track as the ICDM'16 Best Student Paper
August 19-25, 2017 @ Melbourne, Australia
Sampled topics from papers containing the keywords 'dimension' or 'reduction'
Proposed Idea
Local topic discovery to extract
more specific, informative topics
For local topic discovery,
1) Iterative Topic Modeling on Residual Matrix
-> Ensemble
2) Boost & Suppress -> Local weighting scheme
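One way to write the two ideas above in symbols is sketched below. The notation is an assumption made for illustration, not taken from the poster: X is the nonnegative keyword-by-document matrix, R^(i) is the residual at stage i, and D_r^(i), D_c^(i) are diagonal matrices holding the row (keyword) and column (document) weights that boost unexplained parts and suppress explained ones.

```latex
\begin{align}
R^{(1)} &= X, \\
\left(W^{(i)}, H^{(i)}\right) &= \arg\min_{W \ge 0,\; H \ge 0}
  \left\| D_r^{(i)} R^{(i)} D_c^{(i)} - W H \right\|_F^2, \\
R^{(i+1)} &= \left[ R^{(i)} - W^{(i)} H^{(i)} \right]_+ ,
\end{align}
```

Here [.]_+ sets negative entries to zero so that each residual stays nonnegative, and the ensemble is the collection of all stage-wise factors W^(i), H^(i).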
Localized Ensemble of
Nonnegative Matrix Factorization (L-EnsNMF)
1) NMF Topic Modeling
-> Find a set of topics
2) Residual Update
-> Identify unexplained parts (e.g., Egyptian cat)
3) Anchor Sampling & Local Weighting
-> Reveal unexplained parts and suppress explained parts
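Steps 1) and 2) above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the NMF solver is plain multiplicative updates (the released MATLAB code may use a different solver), the function names `nmf_multiplicative` and `residual_ensemble` are hypothetical, and step 3) (anchor sampling and local weighting) is left out so the residual loop stays easy to follow.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-9):
    """Rank-k NMF, X ~ W @ H, via multiplicative updates (Frobenius loss).

    Assumption: a simple stand-in solver chosen for brevity.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def residual_ensemble(X, k=2, stages=3, seed=0):
    """Run NMF repeatedly, each time on what earlier stages left unexplained."""
    R = np.asarray(X, dtype=float).copy()
    Ws, Hs = [], []
    for s in range(stages):
        # Step 1: find k topics in the current residual
        W, H = nmf_multiplicative(R, k, seed=seed + s)
        Ws.append(W)
        Hs.append(H)
        # Step 2: subtract the explained part and clip at zero,
        # so the next stage again receives a nonnegative matrix
        R = np.maximum(R - W @ H, 0.0)
    # Ensemble: stack the topics from all stages side by side
    return np.hstack(Ws), np.vstack(Hs), R

if __name__ == "__main__":
    X = np.random.default_rng(1).random((20, 15))
    W_all, H_all, R = residual_ensemble(X, k=2, stages=3)
    print(W_all.shape, H_all.shape, np.linalg.norm(R))
```

Because each residual entry can only shrink toward zero, later stages are pushed toward structure that earlier topics missed, which is the motivation for the ensemble.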
We generated 100 topics (10 keywords each) with each method; only L-EnsNMF extracted local, specific keywords,
e.g., ‘hurrican’, ‘sandi’, ‘ireland’.
Dataset: Twitter (New York City in June 2013)
Ireland football team visited New York City in June 2013
to boost a community hit by Hurricane Sandy in 2012
Select a user-specified document (column) and keyword (row)
Reveal user-specified documents/topics
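A rough sketch of how a chosen document column and keyword row could drive the weighting is given below. It assumes cosine similarity to the selected row and column as the weight, which is an illustrative choice and may differ from the exact scheme in the released code; `cosine_to_anchor` and `local_weighting` are hypothetical names.

```python
import numpy as np

def cosine_to_anchor(M, anchor, eps=1e-12):
    """Cosine similarity between every row of M and the vector `anchor`."""
    norms = np.linalg.norm(M, axis=1) * np.linalg.norm(anchor)
    return (M @ anchor) / np.maximum(norms, eps)

def local_weighting(R, doc_idx, kw_idx):
    """Boost entries related to the selected document/keyword, suppress the rest.

    R       : nonnegative keyword-by-document matrix (rows = keywords)
    doc_idx : index of the selected document (column)
    kw_idx  : index of the selected keyword (row)
    """
    R = np.asarray(R, dtype=float)
    # Keyword weights: how similar each row is to the selected keyword row
    row_w = cosine_to_anchor(R, R[kw_idx])
    # Document weights: how similar each column is to the selected document column
    col_w = cosine_to_anchor(R.T, R[:, doc_idx])
    # Scale rows and columns, i.e. diag(row_w) @ R @ diag(col_w)
    return row_w[:, None] * R * col_w[None, :], row_w, col_w

if __name__ == "__main__":
    R = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0],
                  [2.0, 0.0, 4.0]])
    weighted, row_w, col_w = local_weighting(R, doc_idx=0, kw_idx=0)
    print(weighted)
```

In this toy matrix the second keyword and second document share nothing with the selection, so their weights are zero and they vanish from the weighted matrix, while the related entries are kept at full strength.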
Thank you
Questions?
E-mail: jchoo@korea.ac.kr
Code: https://github.com/sanghosuh/lens_nmf-matlab