Mastering Japanese Vocabulary: How Learning Common Words Leads to Fluency
Last updated: October 28, 2024
So you've decided you want to learn Japanese. That's great!
One of the big things standing between you and fluency is a mountain of vocabulary: Japanese researchers find that the average 20-year-old Japanese adult knows ~50,000 words. What a number! Do you need to learn all those words if you want to be fluent in Japanese? At 10 words a day, every day, that would take about 14 years.
Fortunately, if that statistic made you gulp, there's a good chance that you're taking the wrong approach to learning Japanese.
To explain why, we'll talk about:
- What is a word?
- Do native speakers know every single word in their language?
- How the frequency approach makes learning Japanese vocabulary 97.116% easier
- If 1,500 common words make up 80% of Japanese... what about all the other words?
- Step-by-step guide: How to learn Japanese vocabulary for beginners
- Tips for effective learning without specialized tools
- FAQs: common questions about the frequency approach
What is a word?
(Yes, I'm being serious.)
If you search for "word" in the dictionary of a Macintosh computer, you'll see the following:
Word: A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence and typically shown with a space on either side when written or printed.
Alternatively, if you were to crack open the most recent version of Webster's New International Dictionary (Unabridged), you'd find some ~500,000 listings.
... but if you had time on your hands and were super curious, like Robin Goulden, Paul Nation, and John Read in 1990, you might begin to wonder what actually counts as a "distinct and meaningful" unit of speech. It turns out that dictionaries contain an incredible amount of redundancy:
- Lemmas — When we reduce eat, eats, ate, eaten, and eating into eat and remove names, ~500,000 words turn into ~248,000
- Base words — When we condense drama, dramatic, dramatically, etc into drama and remove homographs (bow as a verb vs noun), ~214,000 words turn into ~54,000
- Word families — When we condense those base words into groups that include all derivations of a given word, ~54,000 words turn into ~28,000 groups.
And this redundancy actually works in a learner's favor! If you understand:
- Athlete, athletic, athletically
Then you also understand the derived forms of all these words:
- Drama, dramatic, dramatically
- Economy, economic, economically
- History, historic, historically
- Problem, problematic, problematically
- System, systematic, systematically
Meaning you learn 5 words but understand 15.
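If you'd like to see that bookkeeping spelled out, here's a minimal sketch. The family groupings are typed in by hand from the examples above (real studies use much more careful criteria), and the function name is just something I made up for illustration.

```python
# Word families copied by hand from the examples in this article.
# This is an illustration, not how researchers actually build families.
WORD_FAMILIES = {
    "drama": ["drama", "dramatic", "dramatically"],
    "economy": ["economy", "economic", "economically"],
    "history": ["history", "historic", "historically"],
    "problem": ["problem", "problematic", "problematically"],
    "system": ["system", "systematic", "systematically"],
}


def count_families_and_forms(families):
    """Return (number of families, total number of word forms)."""
    n_families = len(families)
    n_forms = sum(len(forms) for forms in families.values())
    return n_families, n_forms


n_families, n_forms = count_families_and_forms(WORD_FAMILIES)
print(f"{n_families} families cover {n_forms} word forms")  # 5 families cover 15 word forms
```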
Now, this doesn't always work perfectly—energy and energetic have quite different usages. Additionally, the studies also filtered out compound words like high school, and we're completely missing multi-word expressions. Nevertheless, it's fair to say that memorizing the dictionary suddenly seems like a more manageable task. 28,000 is a lot better than 500,000.
This good news actually gets better:
Do native speakers know every single word in their language?
Before we start talking about Japanese, take a moment and skim this passage from a white paper about Dynamic Frequency Selection that happens to reveal an important truth about language learning:
The actual power level (EIRP) for LPI APs is not defined in absolute dBm, as for the lower bands, but at 5 dBm/MHz, adding 3 dB for every doubling of channel bandwidth, which gives 18 dBm EIRP for a 20 MHz channel, and up to 27 dBm for a 160 MHz channel. The FCC can apply this rule because incumbent links are generally narrow band compared to Wi-Fi channels. It is advantageous to the Wi-Fi network because background noise increases proportionally with bandwidth, so the SNR for a Wi-Fi receiver will be constant for different channel widths, given maximum transmit power levels.
You see, most of you reading this are native English speakers, and you (like me) have basically no idea what this English excerpt is talking about.
So, why have I asked you to read a technical and completely random paragraph about wireless communication and networking in an article about learning Japanese?
You're a native English speaker, but you don't know all of the words in English. This is a huge thing to understand. You don't need the term "incumbent link" to effortlessly navigate life in English. It's only important if you happen to work in the field of wireless communication and networking.
The takeaway here is that not all words are equally useful.
How the frequency approach makes learning Japanese vocabulary 97.116% easier
Word frequencies in pretty much every language roughly follow what's known as Zipf's Law (a close relative of the Pareto principle, better known as the 80/20 rule): the most common word occurs about twice as often as the second most common, three times as often as the third, four times as often as the fourth, and so forth.
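Written as a formula, the idealized version of Zipf's Law looks like the sketch below. Here f(r) is my own notation for how often the word at rank r occurs, and real word counts only follow this approximately.

```latex
f(r) \approx \frac{f(1)}{r}, \qquad r = 1, 2, 3, \dots
```

So the word at rank 2 occurs about half as often as the word at rank 1, the word at rank 3 about a third as often, and so on.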
To better show the implications of this, here's a list of all the words that appear in the Japanese subtitles of Netflix. There are 124,000 unique words (used a total of ~110 million times), and here's where things get interesting:
- To recognize 99% of all the words in Netflix's subtitles, you'd need to know 37,247 words
- To recognize 95%, you'd need to know 12,041 words
- To recognize 90%, you'd need to know 5,243 words
- To recognize 85%, you'd need to know 2,688 words
- To recognize 80%, you'd need to know 1,442 words
And, to me, there are two super interesting observations to be made here:
- 1,442 words represent 80% of all of the words that occur in Netflix's subtitles
- 87,000 words represent less than 1% of all the words that occur in Netflix's subtitles
So, to answer the question posed in the introduction, no: you don't need to learn the 50,000 words that a Japanese adult knows before you can watch Netflix. You only need to learn ~1,500 words to get started.
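If you're curious how numbers like these can be computed, here's a minimal sketch. The `words_needed` function and the tiny corpus are made up for illustration (this is not the Netflix data, and not necessarily how that list was built): you sort words from most to least common and keep adding them until you've covered the share of occurrences you want.

```python
from collections import Counter


def words_needed(counts, target):
    """Smallest number of most-frequent words whose occurrences make up
    at least `target` (a fraction between 0 and 1) of all occurrences."""
    total = sum(counts.values())
    running = 0
    for n_words, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total >= target:
            return n_words
    return len(counts)


# Toy corpus, invented for illustration: 6 unique words, 100 occurrences.
tokens = (["する"] * 50 + ["ない"] * 25 + ["いる"] * 15
          + ["猫"] * 6 + ["魚"] * 3 + ["傘"] * 1)
counts = Counter(tokens)

for target in (0.80, 0.95):
    n = words_needed(counts, target)
    print(f"{target:.0%} coverage needs {n} of {len(counts)} words")

# Arithmetic using the figures quoted in this article.
ADULT_VOCAB = 50_000   # words known by an average adult
CORE_WORDS = 1_442     # words needed for 80% of Netflix's subtitles

years = ADULT_VOCAB / 10 / 365          # at 10 words per day
reduction = 1 - CORE_WORDS / ADULT_VOCAB
print(f"{years:.1f} years")             # 13.7 years
print(f"{reduction:.3%}")               # 97.116%
```

The last line works out to the same 97.116% that appears in this section's heading: learning 1,442 words instead of 50,000 means 97.116% fewer words.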
Two very important caveats
First, know that you can't just learn any 1,500 words. Frequency is a double-edged sword. If you went about this in the worst way possible, you'd end up learning 86,753 vocabulary words before you understood 1% of Netflix's subtitles. That'd be pretty miserable. Since our time and effort are limited, we want to focus on learning the most common words: the ones that are most likely to be used, and thus the most likely to be useful to us.
Second, even if you learn the perfect set of 1,500 words, your job isn't done yet. Here's what it feels like to understand 80% of the words in a text:
“Bingle for help!” you shout. “This loopity is dying!” You put your fingers on her neck. Nothing. Her flid is not weafling. You take out your joople and bingle 119, the emergency number in Japan. There’s no answer! Then you muchy that you have a new befourn assengle. It’s from your gutring, Evie. She hunwres at Tokyo University. You play the assengle. “…if you get this…” Evie says. “…I can’t vickarn now… the important passit is…” Suddenly, she looks around, dingle. “Oh no, they’re here! Cripett… the frib! Wasple them ON THE FRIB!…” BEEP! the assengle parantles. Then you gratoon something behind you…
See what I mean?
You can understand what sort of thing is being said—you're not completely lost—but you don't quite understand what, exactly, is being said, either. Nevertheless, it's a massive improvement over understanding literally nothing.
It wouldn't be an exaggeration to say that this small core vocabulary of ~1,500 frequent words is the foundation you need to start understanding most conversations, shows, and texts.
If 1,500 common words make up 80% of Japanese... what about all the other words?
Remember that paragraph I made you read about wireless networking and communications?
That's all the other words.
There's a core subset of common words—2,000 or so—that occur pretty much everywhere, whether you're watching a slice-of-life anime or reading a brief history of how the appearance of Japanese kanji has changed over time. Then, there's another subset of ~3,000 words that come up regularly enough to notice.
Go much further than that, though, and all the words become much less common. Every type of content you might consume is its own little world, and certain words that are essential to navigate one world may be completely useless to navigate another world.
To show you what I mean, here are the 10 most common Japanese words according to Netflix (if we skip the grammatical particles):
- する: "to do"
- ない: "not/doesn't exist"
- いる: "to exist [animate things]/~ing/need"
- 何: "what"
- ある: "to exist [inanimate objects]"
- 私: "I [standard/polite word]"
- そう: "that way/it seems/like that"
- なる: "become"
- 人: "person"
- 俺: "I [masculine/informal]"
Those first 9 words are über common. They'll probably appear within the first few dozen lines of any frequency list you can find.
But things get interesting with that tenth word. Do me a favor and open this list of the most common words in the Asahi Shinbun newspaper and this list of the most common words on Wikipedia.
Notice something? The word 俺 is:
- The 10th most common word on Netflix
- The 2,522nd most common word on Wikipedia
- Outside of the top 10,000 most common words in the newspaper
What gives?
The words 私 and 俺 both mean "I". While 私 is a more polite term, 俺 is informal and can be seen as rude in some contexts. In real life, most adult men refer to themselves as 俺 in casual conversation.
With this in mind:
- It's no wonder that 俺 appears so commonly on Netflix—it's the primary pronoun used by men in speech
- It's no wonder that 俺 rarely appears in the newspaper—it would be inappropriate for such a formal register
In other words, simply changing the content medium has taken 俺 from being one of Japanese's top ten words and turned it into something you could pretty safely skip.
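Here's a minimal sketch of how you could check this kind of thing yourself. The `rank_of` helper and both token lists are invented for illustration (they aren't real corpus data): the idea is just that a word's rank depends entirely on which pile of text you count.

```python
from collections import Counter


def rank_of(word, tokens):
    """1-based frequency rank of `word` within `tokens`, or None if absent."""
    ranked = [w for w, _ in Counter(tokens).most_common()]
    return ranked.index(word) + 1 if word in ranked else None


# Toy token lists, made up for illustration (not real subtitle or news data).
drama_subtitles = ["する"] * 8 + ["ない"] * 6 + ["俺"] * 5 + ["何"] * 3
news_articles = ["する"] * 9 + ["政府"] * 5 + ["経済"] * 4 + ["ない"] * 2

for name, tokens in [("subtitles", drama_subtitles), ("news", news_articles)]:
    print(name, rank_of("俺", tokens))
# subtitles 3
# news None
```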
The point
The good and bad news about vocabulary is that this kind of thing happens constantly. If you're reading about fiscal policy, there will be a subset of words that occur disproportionately often in articles about fiscal policy... and rarely, if ever, occur elsewhere.
- This is good news if you are interested in fiscal policy. As you read your first several articles, you'll pick up the key topic-specific vocabulary words, and your reading comprehension will increase quite quickly.
- This is troublesome news if you read quite widely. Every time you change medium, genre, topic, or even author, you'll have to overcome a learning curve: new topic-specific vocabulary, new sentence structures, and a new style of writing.
The key takeaway here is that you are much closer to being able to do anything in Japanese than you are to being able to do everything in Japanese. The fastest way to become capable of doing something interesting in Japanese is to determine what "your thing" is, and then make sure that all of your efforts bring you closer to that goal.
Step-by-step guide: How to learn Japanese vocabulary for beginners
1. Learn the first ~1,500 most common words via spaced repetition with Migaku Academy
If you're new to Japanese, we recommend that you initially focus entirely on learning the 1,500 most common Japanese words. This will take Japanese from being a completely foreign language and make it something that appears quite familiar to you: in any random sentence you see, you'll recognize most of the words. You won't be fluent, but Japanese will suddenly seem accessible.
We've actually prepared a flashcard-based course that will:
- Teach you the ~1,800 most common words on Netflix
- Teach you the ~400 most common grammar points
You can see a sample of it in that YouTube video, but it's pretty cool. Every flashcard includes a vocab word and example sentence, an image, and an audio recording from a native Japanese speaker. Best of all, we've very carefully curated the sentences so that each "next" sentence contains only one piece of new info.
If you don't yet know how to read Japanese, we have a prep course for Migaku Academy (called Migaku Japanese Fundamentals) that will teach you everything you need to know to start it.
2. Use Migaku's Chrome extension to work towards 5,000 words
Now that you've got a foundation under you, you're ready to leave textbooks behind and get into real Japanese content. Pick anything you're interested in. Migaku's Chrome Extension will support you in a few main ways:
- Interactive text: Any text on a webpage (subtitles, paragraphs, etc.) becomes interactive. Click on any word to see a dictionary definition, an AI breakdown of how the word fits into this particular sentence, and more.
- Recommended vocabulary: Migaku will tell you if a word is among the most frequently occurring 2,500 or 5,000 Japanese words. If you stumble into such a word in a useful-looking sentence, you can click a "flashcard" button to automatically make a flashcard which includes a screenshot of your show, a snippet of the audio, the target word, and the sentence where the target word came from.
- Automatically make flashcards: Flashcards you make get sent to Migaku Memory (or Anki), where a spaced repetition algorithm will build a personalized learning schedule for you that constantly adjusts based on your performance, ensuring that you remember any word you make a flashcard for.
We recommend focusing mostly on the top 5,000 vocabulary words for now, but of course you can (and should) make flashcards of terms that appear frequently in the content you enjoy, or anything else you think would be good to remember.
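If you've never seen spaced repetition in action, here's a deliberately simple sketch of the general idea. To be clear, this is not Migaku Memory's or Anki's actual algorithm; the `Card` class, the `review` function, and the "double the interval on success, reset to one day on failure" rule are all assumptions I've made to keep the example short.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    word: str
    due: date
    interval_days: int = 1


def review(card, remembered, today):
    """Reschedule `card` after reviewing it on `today`.

    Toy rule (an assumption, not a real scheduler): double the interval
    when the word was remembered, otherwise start again at one day.
    """
    if remembered:
        card.interval_days *= 2
    else:
        card.interval_days = 1
    card.due = today + timedelta(days=card.interval_days)
    return card


card = Card(word="俺", due=date(2024, 10, 28))
review(card, remembered=True, today=date(2024, 10, 28))
print(card.interval_days, card.due)   # 2 2024-10-30
review(card, remembered=False, today=date(2024, 10, 30))
print(card.interval_days, card.due)   # 1 2024-10-31
```

Words you keep remembering drift further and further apart, while words you forget come back quickly. Real schedulers adjust the intervals much more carefully than this.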
3. Follow your interest to learn the rest of the words
5,000 words is a pretty significant milestone. You now recognize ~90% of the words you see, which means that you'll be able to follow most types of Japanese content, including shows, books, and conversations, so long as you know the key words.
As discussed in this research paper (Laufer 2010), rarer words tend to be more information dense. In other words, if you don't understand a sentence, it's often going to be because you're missing that one infrequently-appearing key word. For example, consider a sentence like "I put your bag on the ___": you know 6 of the sentence's 7 words, but lack the most important piece of information, which is where the thing actually is. As you're overcoming the intermediate hurdle, you'll run into a lot of sentences like this.
For this reason, it's extremely important to follow your interests. You are a unique person with specific interests and goals, and this has dramatic implications on what your vocabulary pool will end up looking like. If you're a gearhead into Yamaha motorcycles, 50 terms about mechanical tools and motorcycle parts will do you better than 1,000 random terms from other niches.
This stage is long, but it's also a ton of fun. There's still a lot you don't know, but you now have the skills to start filling in those blanks. Consuming Japanese content becomes an enjoyable activity in itself, so all you're really doing is going along for the ride, picking up useful words and expanding your mental database of Japanese sentences.
As your interests take you from point A to point B, you'll eventually cover all of the words and structures you need to effortlessly do the things that matter to you.
Tips for effective learning without specialized tools
So, I get it. Technology can be a headache, and you might not be at a point where you're ready to drop a subscription on a Japanese learning tool yet. That's OK.
I personally didn't use any fancy tools when I learned Japanese. I'm not especially good with technology, and I really like the feel of paper books; there's just something about sitting on a bench next to a lake and reading, you know? (Plus, I started learning Japanese in 2014, and a lot of the cool tools available now didn't exist back then.)
So, here's how I learned Japanese vocabulary:
- I picked a beginner Japanese textbook and gave it about 15 minutes per day, gradually learning grammar points over time
- I downloaded a dictionary on my phone (ctrl+f this post for miscellaneous resources to see my recommendations)
- I learned the first ~2,000 words via a free vocabulary deck on Anki's public Japanese flashcards page
- I'd download book samples or skim the first pages of physical books, selecting my reading material according to two criteria: (a) there had to be a max of ~5 unknown words per page, which was simply the point where lookups became annoying to me, and (b) I had to find the book at least somewhat interesting
- I read over 100 books
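I did that "max of ~5 unknown words per page" check by eye, but the rule itself is simple enough to sketch in code. Everything here is hypothetical: the function names are mine, the page is a toy example, and I'm assuming the text has already been split into words (Japanese isn't written with spaces, so a real version would need a tokenizer first).

```python
def unknown_words(page_tokens, known_words):
    """Distinct tokens on the page that aren't in `known_words`,
    in order of first appearance."""
    seen = set()
    unknown = []
    for token in page_tokens:
        if token not in known_words and token not in seen:
            seen.add(token)
            unknown.append(token)
    return unknown


def is_comfortable(page_tokens, known_words, max_unknown=5):
    """True if the page has at most `max_unknown` distinct unknown words."""
    return len(unknown_words(page_tokens, known_words)) <= max_unknown


# Toy page, already split into words (made up for illustration).
page = ["私", "は", "本", "を", "読む", "図書館", "で", "本", "を", "借りる"]
known = {"私", "は", "本", "を", "読む", "で"}

print(unknown_words(page, known))    # ['図書館', '借りる']
print(is_comfortable(page, known))   # True
```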
I didn't make flashcards or anything like that for a long time. Instead:
- I looked up unknown words as I encountered them, figuring that I would eventually learn them from exposure alone if they were really that important
- If I looked up a word and found it useful, I wrote it and its definition (in Japanese) in the margin of the book, figuring that the additional attention would help me remember it
And that worked for me! I passed the JLPT N1, at least, and today I can just pick up and read most things I'm interested in. We learn languages by getting input—consuming native content that's interesting to us and understanding the messages it contains—and you can do that without using specialized tools or spending money.
So long as you're using Japanese to do something you enjoy, and understanding some of what you see or hear, you'll make progress.
Challenges with my low-tech approach to learning
I ran into two main hurdles with my approach to learning.
- Randomness: I used the Core 6K, a flashcard deck based on the words that appear most commonly in the Asahi Shinbun newspaper, but I was interested in reading YA horror short stories by Otsuichi and light novels from the Kino no Tabi series. A lot of the things I learned in that vocabulary deck didn't help me to read the things I was interested in.
- Manual word lookups: Multiple times per page, I'd get stuck on a new word or kanji, stop to put my book down, open my dictionary app, try to draw the kanji I saw, forget how it was drawn, check my book again, look the character up, then scroll through the list of options till I found the word I was looking for. It took a solid 30 seconds per lookup, and that adds up... especially when you're in a good part of the story.
Both of those problems are eliminated by Migaku, so if I could go back in time, I wouldn't start with physical books like I did. But you could, and it's possible.
FAQs: common questions about the frequency approach
Q: What is the frequency approach in language learning?
The frequency approach involves focusing on the most commonly used words first, enabling you to quickly reach a point where you can understand and engage with the language.
Q: Can I use the frequency approach without Migaku?
Yes! While Migaku simplifies the process with spaced repetition and frequency scores, you can still manually apply the method by using frequency lists and creating your own flashcards.
Q: Should I go out of my way to learn kanji?
We have an entire article about learning kanji. You may find a link to it at the bottom of this post. You'll likely need to learn a couple hundred kanji to get a feel for how they work, but as much as possible, we recommend learning kanji through vocabulary words, instead of learning kanji in isolation. An immediate benefit of this is that it ensures you only learn the kanji you need, rather than spending time learning characters you may never actually encounter.
In Closing: Why Frequency Matters for Japanese Fluency
However you end up going about vocabulary, remember that vocabulary is a means to an end. We really have two goals here:
- Understand most of the words you hear, so that you can more effortlessly understand the content you are consuming
- Make out the words you don't understand, so that even if you do miss something, you can easily look it up and resolve your confusion
You'll eventually hit both of these milestones, after you learn enough of the vocabulary that occurs in the content you regularly consume. Migaku's frequency approach ensures that you hit these milestones faster by guiding you through the most common Japanese words and then providing you the tools you need to make flashcards out of (and thus learn) the useful words you come across.
P.S. If you're interested in this topic, I highly recommend looking through this meta-analysis from 2016.