cv
Basics
Name | Nick Oh |
Label | Researcher |
Email | nick.sh.oh@gmail.com |
Url | https://github.com/socius-org |
Summary | summary... |
Work
- 2023.01 - Present
Principal Researcher
socius
socius is an AI research lab, endorsed by the London School of Economics, that creates open-source models and data to empower social scientists.
* Lead developer of sentibank, a large-scale open database consolidating 15 original and 43 processed sentiment dictionaries spanning diverse genres and domains. This resource empowered social scientists with tailored lexicons to conduct more insightful sentiment analysis.
* Developed RedditHarbor, a toolkit enabling easy large-scale collection of Reddit data. Storing the data in Supabase made it conveniently accessible to researchers. This resource expanded social scientists' data sources for quantitative and qualitative analysis.
* Architected Sentium, an interpretable rule-based sentiment analysis model with state-of-the-art performance on social science text. The model's explainability helps researchers derive impactful insights from sentiment analysis.
- NLP
- 2021.09 - 2023.01
Junior NLP Engineer
Numen Capital
Worked on the in-house Knowsis research team, leveraging NLP and machine learning for alternative data insights.
* Developed NLP pipelines for alternative data insights, including an ESG tweet classifier based on a transformer architecture (97% accuracy), a rule-based sentiment analyser, and a domain-specific dictionary.
* Analysed alternative data signals derived from NLP pipelines using time series models. This analysis led to accurate forecasts of market movements, directly assisting quantitative investment strategies.
- NLP
Education
-
2022.04 - 2023.12 London, UK
MSc
University of London
Data Science and Artificial Intelligence
- Big data analysis
- Data programming in Python
- Statistics and statistical mining
- Machine learning
- Data science research topics
- Mathematics of financial markets
- Artificial intelligence
- Natural language processing
- Neural networks
- Social media and network science
-
2016.09 - 2020.07 London, UK
BSc
London School of Economics and Political Science
Politics and Economics
- Macroeconomic Principles
- Microeconomic Principles I
- Public Choice and Politics
- Research Design in Political Science
- Monetary Economics
- Applied Quantitative Methods for Political Science
- Algorithms and Programming
- Game Theory I
- Politics of Money and Finance in Comparative Perspective
Certificates
Big Data to Decisions: AI and Machine Learning | |
London Business School | 2019-03-11 |
Publications
-
2024 sentibank: A Unified Resource of Sentiment Lexicons and Dictionaries
International AAAI Conference on Web and Social Media
Sentiment analysis is critical across computational social science domains, but faces challenges in interpretability. Rule-based methods relying on expert lexicons enable transparency, yet applying them is hindered by resource fragmentation and lack of validation. This paper introduces sentibank, a large-scale unified database consolidating 15 original sentiment dictionaries and 43 preprocessed dictionaries, spanning 7 genres and 6 domains.
Languages
Korean | |
Fluent |
English | |
Fluent |
Interests
Applied AI/ML | |
Natural Language Processing | |
Domain-specific Sentiment Analysis | |
Interpretable AI/ML System |
Social Data Science | |
Explainable AI in Social Contexts | |
Prediction and forecasting models | |
Social Media Analytics | |
Political Discourse Analysis | |
Text Mining |
Projects
- 2023.12 - Present
RedditHarbor
Open-source Python toolkit designed to simplify the process of collecting and archiving Reddit data for research purposes
- Reddit Data Collection
- Research Data Toolkit
- Research Data Pipeline
- Reddit Data API
- Data Crawler