April 7 | 吴瑞佳 (Ruijia Wu) | Topic Modeling: Optimal Estimation and Statistical Inference

Time: April 7, 2023, 15:00-16:00

Venue: Science Building, Room A1514

Speaker: 吴瑞佳 (Ruijia Wu), Assistant Professor, Shanghai Jiao Tong University

Host: 项冬冬 (Dongdong Xiang), Professor

Abstract:

With the development of computing technology and the internet, ever larger amounts of textual data are generated and collected every day. Analyzing this vast body of unstructured text and extracting meaningful, actionable information from it is a significant challenge. Many machine learning and natural language processing algorithms have been developed for text classification, clustering, and information retrieval. Driven by applications in a wide range of fields, there is a growing need for computationally efficient statistical methods, backed by theoretical guarantees, for analyzing massive amounts of textual data.

In the first part of the talk, I will present algorithms for unsupervised topic modeling under the probabilistic latent semantic indexing (pLSI) model. Novel, computationally fast algorithms are proposed for estimation and inference of both the word-topic matrix and the topic-document matrix, and their theoretical properties are investigated. In the second part, I will discuss supervised topic modeling, which jointly models a collection of documents and their paired side information. A bias-adjusted algorithm is developed for estimating the regression coefficients in supervised topic modeling under a generalized linear model formulation, and I will also introduce an approach to constructing valid confidence intervals. Applications of the proposed methods reveal meaningful latent topic structures in textual data.
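For readers unfamiliar with the pLSI model mentioned above, the following is a minimal Python sketch (assuming NumPy) that simulates a corpus under the standard pLSI formulation: the expected word-frequency matrix factors as D0 = A W, where A (p words by K topics) is the word-topic matrix and W (K topics by n documents) is the topic-document matrix, and each document's observed word counts are a multinomial draw from the corresponding column of D0. The function name `simulate_plsi`, the Dirichlet draws used to generate A and W, and all sizes are illustrative assumptions. The sketch only shows the data-generating model; it does not implement the speaker's estimation or inference algorithms.

```python
import numpy as np


def simulate_plsi(p, K, n, N, seed=0):
    """Simulate a corpus under the pLSI model (hypothetical illustrative helper).

    A  : p x K word-topic matrix; column k is a distribution over p words.
    W  : K x n topic-document matrix; column i holds document i's topic weights.
    D0 : p x n expected word-frequency matrix, D0 = A @ W.
    D  : p x n observed word frequencies; document i draws N words
         from Multinomial(N, D0[:, i]).
    """
    rng = np.random.default_rng(seed)
    # Dirichlet draws are an assumption, used only to obtain valid probability vectors.
    A = rng.dirichlet(np.ones(p), size=K).T  # shape (p, K), columns sum to 1
    W = rng.dirichlet(np.ones(K), size=n).T  # shape (K, n), columns sum to 1
    D0 = A @ W                               # columns sum to 1, rank at most K
    counts = np.column_stack(
        [rng.multinomial(N, D0[:, i]) for i in range(n)]
    )                                        # shape (p, n) word counts
    D = counts / N                           # observed word frequencies
    return A, W, D0, D


if __name__ == "__main__":
    A, W, D0, D = simulate_plsi(p=500, K=3, n=100, N=1000)
    print("rank of D0:", np.linalg.matrix_rank(D0))
    # The observed D is a noisy, full-rank perturbation of the low-rank D0.
    print("relative error ||D - D0||_F / ||D0||_F:",
          np.linalg.norm(D - D0) / np.linalg.norm(D0))
```

In this setting, estimating the word-topic and topic-document matrices amounts to recovering A and W from the noisy observed matrix D alone; the low-rank structure of D0 is what spectral (SVD-based) approaches to pLSI typically exploit, and the noise level shrinks as the document length N grows.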

About the Speaker:

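吴瑞佳 (Ruijia Wu) is an Assistant Professor in the Department of Data and Business Intelligence, Antai College of Economics and Management, Shanghai Jiao Tong University. Wu received bachelor's and master's degrees from the Department of Mathematics at the University of Oxford and a PhD from the Wharton School, University of Pennsylvania, in 2022. Wu's research interests include statistical machine learning, high-dimensional statistics, and text analysis and its applications.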


Posted by: 张瑛 | Date posted: 2023-03-31
