A Free Chinese Dictionary, and Why I Built One

Introducing the Moya Chinese dictionary — 120,000+ words, sentence parsing, and a small blog about the long tail of Mandarin.

A few weeks ago I launched the Moya Chinese iOS app, the spaced-repetition Chinese flashcards app that, eleven years after my first attempt, finally seems to be working for me. I'm on a 50-day streak, and I can barely believe it! Since the initial launch, I've released several more versions with lots of improvements, including listening and character writing modes and an integrated dictionary.

And today I'm launching its companion: a free web dictionary at moyachinese.com.

If you're excited about the idea and just want to try it, great, please go ahead! Or if you want to know why a perfectly good world full of perfectly good Chinese dictionaries needed yet another one, read on.

Yet Another Chinese Dictionary?

I love a good Chinese dictionary. MDBG, Pleco, LINE Dict — I've used them all, and they're all wonderful in different ways. So I want to be upfront about it: the Moya Chinese dictionary is standing on the shoulders of giants, and I owe a lot to MDBG in particular.

But here is what Moya Chinese does differently:

Clean design. As much as I've loved using MDBG over the years, its design leaves a lot to be desired, and I don't think it has been updated in the 15 years I've been using it. With the Moya Chinese dictionary, I wanted to address this first and foremost.
Fine-tuned search algorithm. It's hard to build a dictionary search that works the way users expect, and with other dictionaries I often couldn't find what I was looking for. So I put a lot of thought into how results are ranked in Moya. By combining how common a word is, its HSK level, and how well it matches the search term, I think I've landed on something that works well most of the time.
120,000+ words, with pinyin, English, and a breakdown of each character's components.
Search in Chinese, pinyin, or English. No need to switch input methods to look something up.
Paste a whole sentence and it will break the sentence down into words and translate each. Useful when you're squinting at a Chinese subtitle and the dictionary lookup is happening in your head one syllable at a time.
Example sentences for every word in the HSK and TV/Movie frequency lists (sourced from Tatoeba).
Interface in English, French, Spanish, Japanese, or Korean. Same as the iOS app.
Free, no ads, no account. I just wanted to provide something useful to the community. Bookmark it, and use it when you want to look up a word.
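
For the curious, here is a minimal sketch of how the three ranking signals from the search bullet above (commonness, HSK level, match quality) could be blended into one score. This is not Moya's actual algorithm: the Entry fields, the match tiers, and the weights are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical dictionary record; field names are illustrative only.
    word: str                 # headword, e.g. "机器人"
    pinyin: str               # toneless pinyin, e.g. "jiqiren"
    english: str              # short gloss
    freq_rank: int            # 1 = most common in some corpus
    hsk_level: Optional[int]  # 1-9, or None if the word is not in HSK


def match_quality(query: str, entry: Entry) -> float:
    """1.0 for an exact match, 0.6 for a prefix, 0.3 for a substring, else 0.0."""
    q = query.strip().lower()
    if not q:
        return 0.0
    best = 0.0
    for field in (entry.word, entry.pinyin, entry.english.lower()):
        if q == field:
            best = max(best, 1.0)
        elif field.startswith(q):
            best = max(best, 0.6)
        elif q in field:
            best = max(best, 0.3)
    return best


def score(query: str, entry: Entry) -> float:
    """Blend the three signals; the 0.6 / 0.3 / 0.1 weights are arbitrary."""
    m = match_quality(query, entry)
    if m == 0.0:
        return 0.0
    commonness = 1.0 / (1.0 + entry.freq_rank / 1000.0)  # decays with rank
    hsk_bonus = 0.0 if entry.hsk_level is None else (10 - entry.hsk_level) / 9.0
    return 0.6 * m + 0.3 * commonness + 0.1 * hsk_bonus


def search(query: str, entries: list[Entry]) -> list[Entry]:
    """Return matching entries, best score first."""
    scored = [(score(query, e), e) for e in entries]
    hits = [(s, e) for s, e in scored if s > 0.0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in hits]
```

Because the same query string is checked against the headword, the pinyin, and the English gloss, one search box can serve all three input styles. In this sketch, match quality carries the most weight, so an exact hit on a rare word still outranks a partial hit on a common one.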

That's roughly it. The design goal was the same as for the iOS app: useful, clean, no nonsense. And the plan is to (soon) integrate the two, so that when you look up a word in the dictionary, you can easily add it to your flashcard study queue.
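
A note on the sentence-parsing feature: Chinese has no spaces between words, so the first step is deciding where the words are. One classic way to do that is greedy longest-match segmentation against a word list. The sketch below shows that technique with a toy lexicon. It is an assumption about how such a feature could work, not a description of what Moya does under the hood.

```python
# Toy lexicon mapping words to glosses; a real dictionary would be far larger.
TOY_LEXICON = {
    "我": "I",
    "喜欢": "to like",
    "机器": "machine",
    "机器人": "robot",
    "人": "person",
}


def segment(sentence: str, lexicon: dict[str, str], max_len: int = 4) -> list[str]:
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a known word is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def gloss(sentence: str, lexicon: dict[str, str] = TOY_LEXICON) -> list[tuple[str, str]]:
    """Pair each segmented word with its gloss, or "?" if it is not in the lexicon."""
    return [(word, lexicon.get(word, "?")) for word in segment(sentence, lexicon)]


print(gloss("我喜欢机器人"))
```

Longest-match prefers 机器人 (robot) over 机器 + 人 (machine + person), which is usually what a reader wants. It can still get ambiguous sentences wrong, which is why production segmenters tend to add frequency or statistical information on top.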

The Long Tail, Revisited

In my post about the iOS app, I related how I built a website called ChineseLevel on the premise that knowing the top 1,000 most common words would let you understand 95% of a newspaper. This is a tempting and very wrong idea, and I now believe it has done some damage to a lot of beginners (sorry).

It's tempting because the underlying math really does work. The first ~590 Chinese characters cover about 80% of running text, ~940 cover 90%, and ~2,400 cover 99% (Chinese character frequency, Wikipedia). You can see the curve flatten out quite dramatically:

[Figure: Cumulative coverage of Chinese text by the N most-common characters, showing the long tail]

The shape of that curve is roughly Zipfian, the same distribution that shows up in word frequencies in pretty much every language ever measured (SUBTLEX-CH, Cai & Brysbaert, PLOS ONE 2010). And it really is true that if you know the top 940 characters, you'll recognise 9 out of every 10 characters on a page.
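
If you want to check this kind of coverage arithmetic on your own corpus, the calculation is short. The sketch below computes cumulative coverage from a list of frequency counts and finds how many of the most common items are needed to reach a target share. The zipf_counts helper generates an idealised 1/rank distribution purely for experimenting. It is an assumption for illustration and will not reproduce the exact figures quoted above, which come from real corpus counts.

```python
def cumulative_coverage(counts: list[float]) -> list[float]:
    """coverage[n - 1] is the share of running text covered by the n most common items."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    covered = 0
    coverage = []
    for count in ordered:
        covered += count
        coverage.append(covered / total)
    return coverage


def items_needed(counts: list[float], target: float) -> int:
    """Smallest number of top items whose combined share reaches the target (0 to 1)."""
    for n, share in enumerate(cumulative_coverage(counts), start=1):
        if share >= target:
            return n
    return len(counts)


def zipf_counts(vocab_size: int) -> list[float]:
    """Idealised Zipf frequencies: the item at rank r occurs in proportion to 1 / r."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


# Toy corpus: five items with counts adding up to 100.
toy = [50, 30, 10, 5, 5]
print(cumulative_coverage(toy))  # [0.5, 0.8, 0.9, 0.95, 1.0]
print(items_needed(toy, 0.85))   # 3

# With an idealised Zipf curve, each extra bit of coverage costs more items.
ideal = zipf_counts(5000)
print(items_needed(ideal, 0.8), items_needed(ideal, 0.9), items_needed(ideal, 0.99))
```

The flattening is the whole story: going from 80% to 90% costs far more items than going from 70% to 80%, and the last few percent cost the most of all.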

The problem is that the one in ten you don't recognise is, almost without fail, the character that carries the actual information. The rare word "indictment" matters more to a news article than the next thousand instances of "the". A linguistic version of the Pareto principle seems to apply here: a small fraction of the words does most of the work, but a different small fraction of the words carries most of the meaning.

This is why a 120,000-word long-tail dictionary is, in my opinion, exactly the kind of dictionary a learner needs by their side. The first 2,000 words you can get from your textbook. It's the next 100,000 that decide whether you can read a novel or follow a TV series.

The HSK That Was, and the HSK That Will Be

If you're learning Mandarin in any structured way, sooner or later you're going to bump into the HSK (汉语水平考试), the standard Chinese proficiency test. The current syllabus, HSK 2.0, has six levels and tops out at around 5,000 words. The new HSK 3.0 standard, published in November 2025 and effective from July 2026, keeps the lower levels but extends the ladder all the way to 11,000 words across nine levels:

[Figure: Cumulative HSK vocabulary by level, comparing HSK 2.0 to HSK 3.0]

That's more than doubling the official upper bound, and the new levels 7–9 are designed to take learners well into the long tail. I think this is great.

But even the new HSK doesn't quite plug all the gaps. A recent post on the Moya Chinese blog, The Words HSK Forgot, is about exactly this: I crunched subtitle frequency data against the HSK lists and found that roughly half of the most common words in Chinese media don't appear in HSK at all. Words like 女孩 (girl), 男孩 (boy), 加油 (the universal cheer), 之前 (before), 之后 (after) — all conspicuously absent. The HSK and the subtitle corpus simply have different goals: one optimises for cultural breadth and exam-fairness, the other reflects what people actually say to each other. HSK 3.0 closes the gap a lot — top-1,000-word coverage jumps from 48% to 76% — but a gap remains.
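
The comparison itself is simple to reproduce if you have a frequency-ranked word list and an HSK word list: take the top N words by frequency and count how many appear in the syllabus. The sketch below uses a tiny made-up ranking and a made-up HSK set, so its output says nothing about the real lists. The function names and the data are illustrative assumptions, not the code behind the blog post.

```python
def hsk_coverage(frequency_ranked: list[str], hsk_words: set[str], top_n: int = 1000) -> float:
    """Share of the top_n most frequent words that also appear in the HSK list."""
    top = frequency_ranked[:top_n]
    if not top:
        return 0.0
    return sum(1 for word in top if word in hsk_words) / len(top)


def missing_words(frequency_ranked: list[str], hsk_words: set[str], top_n: int = 1000) -> list[str]:
    """Top-frequency words absent from the HSK list, in frequency order."""
    return [word for word in frequency_ranked[:top_n] if word not in hsk_words]


# Toy data: the ranking and the HSK set below are invented for the example.
toy_ranking = ["我", "你", "女孩", "加油", "之前", "学习"]
toy_hsk = {"我", "你", "学习"}

print(hsk_coverage(toy_ranking, toy_hsk, top_n=6))   # 0.5
print(missing_words(toy_ranking, toy_hsk, top_n=6))  # ['女孩', '加油', '之前']
```

Running the same two functions against each HSK version's word list is one way to get a before-and-after coverage figure like the one quoted above.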

Robots, Buddies, and Other Joyful Compounds

Long-tail vocabulary is also where Chinese is at its most fun. The other post I want to point you to is Machine Person: All the Ways to Say "Robot" in Chinese. The standard mainland word for "robot" is 机器人 jīqìrén, literally machine person. In Hong Kong, it's 机械人 jīxièrén, mechanical person. A humanoid robot is a 人形机器人, a human-shaped machine person, which is delightful.

English borrowed "robot" from the Czech robota (forced labour), via Karel Čapek's 1920 play R.U.R. (Wikipedia: Robot). Chinese, characteristically, just describes the thing. The same instinct gives us 飞机 (flying machine = airplane), 手机 (hand machine = mobile phone), and ancient automatons called 木人 (wooden person). Once you start seeing the pattern, every new word feels a bit like solving a tiny cryptic crossword clue.

There are a few more posts up already, including one on why spaced repetition works for Chinese and one introducing Word Snake, a small puzzle game where you chain HSK words across an 8×8 grid of characters. I'm planning to write more along these lines — short, opinionated, hopefully useful, with a leaning towards the kind of vocabulary the textbooks skipped.

Try It

The dictionary is at moyachinese.com. The iOS app is on the App Store. Both are free. If you give either of them a try, I'd genuinely love to hear what you think: what works, what doesn't, what words you wish were there. Send me a note, and I'll keep tinkering.

再见 zàijiàn!