cv

Basics

Name Nick Oh
Label Researcher
Email nick.sh.oh@gmail.com
Url https://github.com/socius-org
Summary summary...

Work

  • 2023.01 - Present
    Principal Researcher
    socius
    socius is an AI research lab, endorsed by the London School of Economics, that creates open-source models and data to empower social scientists.
    * Led development of sentibank, a large-scale open database consolidating 15 original and 43 processed sentiment dictionaries spanning diverse genres and domains. This resource gives social scientists tailored lexicons for more insightful sentiment analysis.
    * Developed RedditHarbor, a toolkit enabling easy large-scale collection of Reddit data. Storing the data in Supabase makes it conveniently accessible to researchers, expanding social scientists' data sources for quantitative and qualitative analysis.
    * Architected Sentium, an interpretable rule-based sentiment analysis model with state-of-the-art performance on social science text. The model's explainability helps researchers derive impactful insights from sentiment analysis.
    • NLP
  • 2021.09 - 2023.01
    Junior NLP Engineer
    Numen Capital
    Worked on the in-house Knowsis research team, leveraging NLP and machine learning for alternative data insights.
    * Developed NLP pipelines for alternative data insights, including an ESG tweet classifier based on a transformer architecture (97% accuracy), a rule-based sentiment analyser, and a domain-specific dictionary.
    * Analysed alternative data signals derived from the NLP pipelines using time series models. This analysis led to accurate forecasts of market movements, directly assisting quantitative investment strategies.
    • NLP

Education

  • 2022.04 - 2023.12

    London, UK

    MSc
    University of London
    Data Science and Artificial Intelligence
    • Big data analysis
    • Data programming in Python
    • Statistics and statistical mining
    • Machine learning
    • Data science research topics
    • Mathematics of financial markets
    • Artificial intelligence
    • Natural language processing
    • Neural networks
    • Social media and network science
  • 2016.09 - 2020.07

    London, UK

    BSc
    London School of Economics and Political Science
    Politics and Economics
    • Macroeconomic Principles
    • Microeconomic Principles I
    • Public Choice and Politics
    • Research Design in Political Science
    • Monetary Economics
    • Applied Quantitative Methods for Political Science
    • Algorithms and Programming
    • Game Theory I
    • Politics of Money and Finance in Comparative Perspective

Certificates

Big Data to Decisions: AI and Machine Learning
London Business School 2019-03-11

Publications

  • 2024
    sentibank: A Unified Resource of Sentiment Lexicons and Dictionaries
    International AAAI Conference on Web and Social Media
    Sentiment analysis is critical across computational social science domains, but faces challenges in interpretability. Rule-based methods relying on expert lexicons enable transparency, yet applying them is hindered by resource fragmentation and lack of validation. This paper introduces sentibank, a large-scale unified database consolidating 15 original sentiment dictionaries and 43 preprocessed dictionaries, spanning 7 genres and 6 domains.

Languages

Korean
Fluent
English
Fluent

Interests

Applied AI/ML
Natural Language Processing
Domain-specific Sentiment Analysis
Interpretable AI/ML Systems
Social Data Science
Explainable AI in Social Contexts
Prediction and Forecasting Models
Social Media Analytics
Political Discourse Analysis
Text Mining

Projects

  • 2023.12 - Present
    RedditHarbor
    Open-source Python toolkit that simplifies collecting and archiving Reddit data for research purposes
    • Reddit Data Collection
    • Research Data Toolkit
    • Research Data Pipeline
    • Reddit Data API
    • Data Crawler