As we approach the end of an extraordinary year, Luminoso’s world-leading natural language understanding (NLU) and machine learning (ML) experts shared their candid thoughts on emerging AI uses and advancements.
Whether you’re a business leader interested in AI, or a tech enthusiast seeking an inside look from some of the brightest minds on the evolution of AI, I hope you enjoy the discussion with our research team on 2021 AI predictions.
Q: What advancements could we see in the field of natural language understanding over the next year?
Robyn Speer, Chief Science Officer:
The most rapid progress in the field is in the area of natural language generation (NLG). Just a few years ago, my position on NLG was that it was impractical. You wouldn’t rely on it to generate a complete sentence, and its overall coherence was so low that it could only be used as absurdist comedy. If you needed to generate a sentence, the only practical thing to do was to fill in slots in a template.
There have been some dramatic improvements in NLG techniques recently – advanced language models such as GPT-3 produce not only fluent sentences, but fluent multi-paragraph articles given a prompt. NLG is ready to fill many application niches that have been waiting for it.
Fluent NLG comes with new concerns, because it can be used for deception. When we read fluently written text, we assume that a person took the time to write it, and we need to culturally adjust to the idea that it might have been written by an algorithm. Deceptive NLG could be used to push fake reporting and inauthentic opinions. On the bright side, some research groups are already confronting this and creating tools that can detect the probabilistic “signature” of generated text.
So far, even in these advanced NLG systems, the generated text can betray a lack of common sense, and the topic can wander. I expect that in 2021 we’ll see NLG that is more informed by common-sense understanding, based on ongoing research on incorporating knowledge graphs into language models. We’ll also see ways to verify that the generated text conveys the information it is meant to convey, and isn’t getting off topic.
Fluent NLG can be used to improve user experiences, to produce good, readable text in situations where people previously had to tolerate clunky, templated, computer-y text. We’ll also see a growth in creative uses of NLG, such as human-computer collaborative storytelling.
Q: How will natural language understanding be used in the field of COVID-19 research in 2021?
Lance Nathan, Senior Linguistics Developer:
I imagine two things that might happen:
AI researchers will try to use deep learning techniques to do epidemiology, because when a lightbulb goes out in their houses, they apply deep learning techniques to find a solution. This is a bad idea, because epidemiologists know how epidemiology works, and AI researchers do not, and it rarely goes well when an expert in one field says, "Oh, I can just apply my field to their field and it'll work out fine."
AI researchers will try to use deep learning techniques on NLP instead of epidemiology, which is a much better idea, in theory. Unfortunately, your results can only ever be as good as your input, and the kind of data researchers will have is "what are people saying on Twitter", and, well, when I learned the phrase "garbage in, garbage out" in the 1980s, it was already twenty years old.
I hope neither of these happens. I'm not optimistic.
Joanna Lowry-Duda, Machine Learning Research Scientist:
I have a slightly more optimistic view. I believe machine translation (which is not a part of NLU, but a part of natural language processing, or NLP) could be used in any health crisis to provide up-to-date information on local restrictions and health guidelines, especially to speakers of non-official languages. The problem is that available machine translation solutions frequently don’t work in low-resource languages, or are not accurate enough to be used in a way that encourages confidence. I hope that the global response to COVID-19 will result in more attention being given to low-resource languages in the NLP community.
Arbin Timilsina, Machine Learning Research Scientist:
In the long run, I believe, ML will be able to assist human experts in such research. However, we cannot expect that parsing literature on a subject and feeding it to an ML engine will give us the answers we desperately need.
Q: What novel use cases of AI could emerge in 2021?
Speer: “May your 2021 be filled with novel business use cases of AI” sounds like a curse to me. Have you seen the world these use cases would be deployed into? I wish there were something that would convince businesses to use the AI they already have more responsibly.
When I think of non-bleak new use cases I hope to see from AI, I think of its possibilities for art and creativity.
Timilsina: In my opinion, medical AI. As the generation of, and access to, biomedical data have increased in recent years, AI applications in healthcare systems have made tremendous progress in solving clinical problems. Such AI applications can benefit both patients and doctors in areas such as virtual healthcare, screening, disease diagnosis, and drug interactions. However, I feel that any experiment or research with medical AI should be performed in a way that is transparent and reliable.
Q: Do you think we’ll see a trend of businesses doing more to ensure that the AI they’re using isn’t being trained on biased data?
Speer: I really hope business will do more to fight AI bias in all its forms. If only it could be as simple as “not training on biased data”. But where is unbiased data going to come from? Any data that you collect in quantity reflects the biases of the world we live in. I recently made this graph as a simple example, and discussed it in a Twitter thread:
I see four steps to fighting AI bias that happen at different stages of machine learning:
Knowing the biases of our source data and how to account for them
Applying de-biasing techniques, when appropriate, to counteract the ways that biases get baked into intermediate representations
Ensuring that the results of machine learning are used in ways that are fair and transparent
Being responsive and accountable in cases where the system turns out to have flaws or unintended consequences
It’s wishful to think that all businesses will do more on this front. We need awareness and advocacy, but we also need regulations in cases where transparency and accountability are contrary to the design – such as face recognition in the surveillance industry.
Nathan: I'm very much with Robyn when it comes to the question of where businesses will get unbiased data. Her example involves public corpora, where using them to train your AI will introduce gender biases. From what I gather, this can be just as much, if not more, of a problem with proprietary data, often data that's less about general NLP and more about particular use cases.
If a business has been collecting data about its users, but because of systemic biases those users have been homogeneous, "getting unbiased data" won't just be a matter of finding another source. They'll want to use their own data, and they may not know how to correct for its biases themselves. They may decide, to save money or to reduce turnaround time, that they don't need to hire consultants or new employees who have expertise in that kind of correction. And, of course, the same systemic biases that created bias in their data may keep them from noticing that their data is biased at all, and they won't realize they need to account for it. I hope that more people – and more businesses – are becoming aware of their own subconscious prejudices and the importance of not letting them get incorporated into a product or an analysis, but it's not something I'm going to count on.
Timilsina: I think ensuring that the AI they're using isn't being trained on biased data is just one part of the solution. I would love to see a trend where businesses include fairness and inclusion as part of the model's optimization goal, include metrics to examine the performance on such a goal, and check/test the model for unfair biases before deployment.
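As a rough illustration of what "metrics to examine the performance on such a goal" and a pre-deployment check could look like, here is a minimal Python sketch. It is not something the team described. It assumes one particular fairness metric (the demographic parity gap, meaning the difference in positive-prediction rates between groups), and the function names, the toy data, and the 0.1 threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def passes_fairness_check(predictions, groups, max_gap=0.1):
    """Hypothetical pre-deployment gate; max_gap is an assumed threshold."""
    return demographic_parity_gap(predictions, groups) <= max_gap


# Toy data (invented for illustration): binary model outputs and a group label.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(selection_rates(preds, grps))
print(demographic_parity_gap(preds, grps))
print(passes_fairness_check(preds, grps))
```

Demographic parity is only one of several possible fairness definitions, and which one is appropriate depends on the use case. The point of the sketch is simply that such a goal can be measured and checked before a model ships.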
Q: Are there any other predictions about AI that you’d like to make for 2021?
Lowry-Duda: I think the trend toward democratization of AI, which we’ve seen picking up recently, will become ever more prevalent. There’ll be more toolkits, pretrained models, and datasets available for general consumption. I hope that this will result in more understanding, specifically from business users, of what problems ML can solve, and how well ... but also what its limitations are.