Achieving fluency in Korean: a beginner’s manual for computers

We have exciting news to share this week! Luminoso’s software has officially mastered Korean, in addition to the eleven other languages it already knows, including Chinese, Arabic, Russian, and of course English. In other words, it can process and analyze unstructured Korean text natively, without first translating it to and from another language such as English. That is a rather impressive feat, and an unusual one in the world of multilingual data analytics.

But how, exactly, is it possible for software to “learn” a language well enough to be able to natively analyze any type of data that’s thrown at it?

There are three steps that our linguistics team here at Luminoso must follow when expanding the software to understand new languages: 1) assessment; 2) implementation; and 3) refinement.

In the first step, we decide whether it’s actually possible for our software to understand a language well enough to run analyses at the quality and accuracy we expect. There’s much more to it than just making sure that it’s a common enough language to be listed as an option on Google Translate. (We’ll save our opinions on the quality of those translations for another day.)

For us to be able to “teach” our software a new language, three resources must be available for that language:

  1. Sufficient data in ConceptNet
  2. Data on word frequency
  3. A parser

ConceptNet

The first resource that must be available is a sufficient amount of data in that language in ConceptNet. We go into the details of what ConceptNet is in other blog posts, but simply put, it is a collection of facts about how the world works (such as “the sun is hot” and “dogs and cats can both be pets”). This knowledge base gives our software some of the background knowledge that a human brings to a conversation. With that background, our software can more quickly work out what new words mean and can map the relationships between different concepts in a data set with much greater accuracy and relevance.
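As a rough illustration of what “facts about how the world works” look like as data, here is a minimal sketch that stores the example facts above as (start, relation, end) edges and looks up which concepts are connected. The relation names follow ConceptNet’s style (`IsA`, `HasProperty`), but the tiny in-memory graph and the helpers `build_index`, `related_concepts`, and `share_neighbor` are hypothetical stand-ins, not ConceptNet’s API or Luminoso’s code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (start concept, relation, end concept)

# Toy edges in the spirit of ConceptNet, taken from the examples in the text.
EDGES: List[Edge] = [
    ("sun", "HasProperty", "hot"),
    ("dog", "IsA", "pet"),
    ("cat", "IsA", "pet"),
]

def build_index(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Map each concept to every concept it shares an edge with (ignoring direction)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for start, _relation, end in edges:
        index[start].add(end)
        index[end].add(start)
    return index

def related_concepts(index: Dict[str, Set[str]], concept: str) -> Set[str]:
    """Direct neighbors of a concept; empty if the concept is unknown."""
    return set(index.get(concept, set()))

def share_neighbor(index: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if two concepts are linked through at least one common concept."""
    return bool(related_concepts(index, a) & related_concepts(index, b))

index = build_index(EDGES)
print(sorted(related_concepts(index, "pet")))  # ['cat', 'dog']
print(share_neighbor(index, "dog", "cat"))     # True
```

Even this toy graph lets the sketch infer that “dog” and “cat” are related without ever seeing them in the same sentence, which is the kind of shortcut the background knowledge provides.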

Data on word frequency

Having data on word frequency in a given language is the next critical resource. This is essentially a summary of which words are used most and least often. With this information, the software can de-prioritize very common words, which usually add little relevant information, and prioritize less common ones. To use another English-language example, knowing word frequency helps the software understand that words like “know,” “want,” and “think” are not very meaningful on their own and should not be presented as important in the data.
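A minimal sketch of frequency-based down-weighting, assuming frequencies are counted from a tiny made-up corpus (real frequency data comes from very large and varied text sources). The score used here, the negative log of a word’s relative frequency, is one common choice for this kind of weighting and is not necessarily the formula Luminoso uses; `word_frequencies` and `informativeness` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical toy corpus standing in for real word-frequency data.
CORPUS = [
    "i think the battery is great",
    "i want to know why the screen froze",
    "i think i want a refund",
    "you know i think it is fine",
]

def word_frequencies(docs: List[str]) -> Dict[str, float]:
    """Relative frequency of each word across the whole corpus."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def informativeness(freqs: Dict[str, float], word: str) -> float:
    """Higher for rarer words: the negative log of relative frequency."""
    return -math.log(freqs[word])

freqs = word_frequencies(CORPUS)
for w in ["think", "want", "know", "battery", "refund"]:
    print(w, round(informativeness(freqs, w), 2))
```

Common words such as “think” end up with the lowest scores, while rarer, more specific words such as “battery” and “refund” score higher and would be surfaced first.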

In addition, every language has common function words that do not contribute meaning but only serve to structure sentences; we designate these as “stop words.” In effect, we tell our software to ignore those words entirely when conducting an analysis. In English, our list of stop words includes conjunctions like “and” and “or,” articles like “the” and “a/an,” and pronouns like “you” and “they,” among a whole host of other very common but not-very-meaningful words. Telling our software to ignore such words helps it home in more quickly on what is truly relevant in a data set. Recognizing these words, however, relies not just on their frequency but also on the output of the crucial third resource: the parser.
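A minimal sketch of stop-word filtering, using only the handful of English stop words named above. The `STOP_WORDS` set and `remove_stop_words` helper are illustrative; a production list would be much longer and specific to each language.

```python
from typing import Iterable, List

# Only the English stop words named in the text; a real list is much longer.
STOP_WORDS = {"and", "or", "the", "a", "an", "you", "they"}

def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Drop function words so later steps only see content-bearing tokens."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = "They said the screen and the battery are great".split()
print(remove_stop_words(tokens))  # ['said', 'screen', 'battery', 'are', 'great']
```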

Parser

A parser is software that marks words with their parts of speech and determines their roots. For example, an English parser would identify the words “runs,” “ran,” and “running” as variations of the same root word, “run,” and mark all three as verbs. When new data is uploaded into Luminoso’s software, parsing is the first thing that must happen before our products can begin to understand what is being discussed. For Korean, as mentioned above, our first step was assessment: we evaluated several Korean parsers to determine which best served our needs, and only then could we begin implementation.
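To show the kind of output a parser produces, here is a minimal sketch that uses a hand-written lookup table in place of real morphological analysis. The `LEXICON` table and `parse` function are hypothetical; an actual parser relies on a full dictionary, inflection rules, and the surrounding context.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table standing in for a real parser's analysis.
LEXICON: Dict[str, Tuple[str, str]] = {
    "runs": ("run", "VERB"),
    "ran": ("run", "VERB"),
    "running": ("run", "VERB"),
}

def parse(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Return (surface form, root, part of speech) for each token."""
    analyzed = []
    for tok in tokens:
        # Unknown words fall back to themselves with an UNKNOWN tag.
        root, pos = LEXICON.get(tok.lower(), (tok.lower(), "UNKNOWN"))
        analyzed.append((tok, root, pos))
    return analyzed

for row in parse(["runs", "ran", "running"]):
    print(row)
```

A lookup table like this cannot resolve ambiguity, such as “run” used as a noun in “a morning run,” which is one reason real parsers take context into account.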

For Korean, implementation meant taking time to explore noun and verb endings to determine whether they are inherently part of their words or removable attachments. We also needed to examine the mistakes the parser made, such as classifying some verbs as suffixes. And like many parsers, it had no information about online slang, such as the standalone consonant and vowel letters (jamo), typed without forming full syllables, that Korean speakers use to indicate laughter (ㅋㅋㅋㅋㅋ) or crying (ㅠㅠㅠ).
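Both problems in this paragraph, deciding which endings to split off and handling slang the parser has never seen, can be sketched with a little string processing. This is a hypothetical preprocessing step, not Luminoso’s implementation: the regular expressions map runs of ㅋ to a laughter token and runs of ㅠ or ㅜ to a crying token, and `split_particle` naively strips a particle from the end of a noun using a short, illustrative particle list.

```python
import re
from typing import List, Tuple

# Runs of standalone jamo used as online slang: ㅋ for laughter, ㅠ/ㅜ for crying.
SLANG_PATTERNS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"ㅋ{2,}"), "<LAUGH>"),
    (re.compile(r"[ㅠㅜ]{2,}"), "<CRY>"),
]

# Hypothetical short list of noun particles treated as removable attachments.
PARTICLES = ["에서", "은", "는", "이", "가", "을", "를", "에"]

def normalize_slang(text: str) -> str:
    """Replace slang runs with placeholder tokens and tidy the spacing."""
    for pattern, tag in SLANG_PATTERNS:
        text = pattern.sub(f" {tag} ", text)
    return " ".join(text.split())

def split_particle(word: str) -> Tuple[str, str]:
    """Split off the longest matching particle, keeping a non-empty stem."""
    for p in sorted(PARTICLES, key=len, reverse=True):
        if word.endswith(p) and len(word) > len(p):
            return word[: -len(p)], p
    return word, ""

print(normalize_slang("배송 빨라요ㅋㅋㅋㅋㅋ"))  # 배송 빨라요 <LAUGH>
print(normalize_slang("환불 안 돼요ㅠㅠㅠ"))     # 환불 안 돼요 <CRY>
print(split_particle("학교에서"))               # ('학교', '에서')
print(split_particle("아이"))                   # ('아', '이'), a wrong split
```

The last line shows why naive suffix stripping is not enough: 아이 (“child”) happens to end in 이, which is also a subject particle, so the function wrongly splits it. Deciding which endings are genuinely removable takes the parser’s dictionary and context.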

When it came to stop words, the common function words we want our analyses to ignore, we needed to adapt the parser so that pronouns and conjunctions are ignored, as we do in other languages. Beyond those, however, we found several kinds of adjectives that the parser subcategorized for us, so we took time to use whatever information it could give us, and we added exceptions for cases where it didn’t tell us enough.
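A minimal sketch of tag-based stop-word filtering with an exception list. It assumes the parser returns (form, tag) pairs using Sejong-style tags (`NP` for pronouns, `MAJ` for conjunctive adverbs, `J…` for particles, `E…` for endings); tag sets differ between parsers, and the exception set and `content_morphemes` helper are hypothetical. The example sentence is 그리고 저는 배송이 빠른 것 같아요 (“And I think the delivery is fast”), where 것 같다 (“it seems that”) mostly hedges the statement.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (form, assumed Sejong-style tag)

STOP_TAGS = {"NP", "MAJ"}        # pronouns and conjunctive adverbs
STOP_TAG_PREFIXES = ("J", "E")   # particles and verbal endings
# Hypothetical exceptions: the tags alone don't mark these as filler.
EXTRA_STOP_MORPHEMES = {("것", "NNB"), ("같", "VA")}

def content_morphemes(morphs: List[Morpheme]) -> List[str]:
    """Keep only morphemes that are neither stop tags nor listed exceptions."""
    kept = []
    for form, tag in morphs:
        if tag in STOP_TAGS or tag.startswith(STOP_TAG_PREFIXES):
            continue
        if (form, tag) in EXTRA_STOP_MORPHEMES:
            continue
        kept.append(form)
    return kept

sentence = [
    ("그리고", "MAJ"), ("저", "NP"), ("는", "JX"), ("배송", "NNG"), ("이", "JKS"),
    ("빠르", "VA"), ("ㄴ", "ETM"), ("것", "NNB"), ("같", "VA"), ("아요", "EF"),
]
print(content_morphemes(sentence))  # ['배송', '빠르']
```

Exception lists need care: dropping 같 everywhere would also discard it where it means “the same,” which is why such rules are worth reviewing with a native speaker.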

Another challenge when preparing any language is its rules for negation, and Korean’s system posed challenges we had not seen in other languages. Korean has a “short form” negation analogous to putting “not” before a verb, and a “long form” that resembles adding “it is not the case that…” to the start of a sentence. The complication is that the short form comes before the verb, while the long form comes after it: 안 좋아요 and 좋지 않아요 both mean “it’s not good.” We wanted to make sure that our system properly accounted for both kinds of negation. You certainly wouldn’t want to conclude that a customer said they were happy when in fact they said they weren’t happy, just with a different grammar than we expected!
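To make the word-order difference concrete, here is a minimal sketch that flags negated predicates in an already-analyzed sentence. It assumes the parser outputs (form, tag) pairs with Sejong-style tags (`MAG` for adverbs, `VV`/`VA` for verbs and adjectives, `EC` for connective endings, `VX` for auxiliaries); real tokenization varies between parsers, and `negated_predicates` is a hypothetical helper rather than Luminoso’s implementation.

```python
from typing import List, Set, Tuple

Morpheme = Tuple[str, str]  # (form, assumed Sejong-style tag)

PREDICATE_TAGS = {"VV", "VA"}   # verbs and adjectives
SHORT_NEGATORS = {"안", "못"}    # short form: adverb placed before the predicate
LONG_NEGATORS = {"않", "못하"}   # long form: auxiliary placed after predicate + 지

def negated_predicates(morphs: List[Morpheme]) -> Set[int]:
    """Return indices of predicates negated by either the short or the long form."""
    negated: Set[int] = set()
    for i, (_form, tag) in enumerate(morphs):
        if tag not in PREDICATE_TAGS:
            continue
        # Short form, e.g. 안 좋아요: negating adverb immediately precedes the predicate.
        if i > 0 and morphs[i - 1][1] == "MAG" and morphs[i - 1][0] in SHORT_NEGATORS:
            negated.add(i)
        # Long form, e.g. 좋지 않아요: predicate, then 지, then the 않/못하 auxiliary.
        if (i + 2 < len(morphs)
                and morphs[i + 1] == ("지", "EC")
                and morphs[i + 2][0] in LONG_NEGATORS
                and morphs[i + 2][1] == "VX"):
            negated.add(i)
    return negated

short_form = [("안", "MAG"), ("좋", "VA"), ("아요", "EF")]
long_form = [("좋", "VA"), ("지", "EC"), ("않", "VX"), ("아요", "EF")]
affirmative = [("좋", "VA"), ("아요", "EF")]

print(negated_predicates(short_form))   # {1}
print(negated_predicates(long_form))    # {0}
print(negated_predicates(affirmative))  # set()
```

Note that 못 and 못하 express inability (“can’t”) rather than plain negation; this sketch treats both as negation, which may or may not suit a given analysis.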

Once we have our resources assessed and implemented, we can begin the process of refinement: adjusting and tweaking the way our software handles a particular language so that it returns results at the highest quality possible. This typically includes reviewing data from many different sources with a native speaker of that language, as well as streamlining our code as necessary to optimize the software’s performance.

When all is said and done, it takes Luminoso four to six months to learn a new language. Unfortunately, in that same time, all I learned was three or four Korean nouns and a handful of particles…
