You could fry an omelette...
...not on our brains (human dignity prevents this), but on the CPUs of our developers' computers, as they work on the foundation for our new vocabulary module.
Our brains are running red hot as well, and I don't think I've slept without dreaming of words and numbers for weeks.
The shape this project takes is amazing, and my nose has that great "rocket blast-off" tingle at all times.
Over the next few entries, as I find a bit of time, I'll share with you what we're up to. Feedback is very welcome!
Normally, when I explain to my friends what we're up to, their eyes glaze over, because honestly, what our bright LTW engineers are cooking right now is very powerful stuff, and a fairly complex undertaking.
To start with something less abstract than "language," let me give you a brief "stellar" explanation of why I'm so excited about the new LearnThatWord module.
Once upon a time, people would look up at the sky and see a random sprinkling of stars. And air was just invisible nothingness.
Over many thousands of years, and through careful observation and analysis, humankind slowly determined that there was an order to the stars:
Certain stars could be seen moving in groups, others seemed to have a certain quality to them that distinguished them.
Later on, we started to understand that we were looking at different systems and spheres, five in total, stacked into each other like a Russian doll, with the troposphere being the one closest to us.
I love this picture, and although I don't know the context it was created for, I see a seeker who managed to break through the core sphere, and who is about to move on to the next. It's the most amazing illustration of "learning" in my mind.
Over time, humankind learned that "the universe" is simply a word for something nobody is actually able to visualize or comprehend. We soothe ourselves by using a singular term for the infinite vastness, which makes our limitation less obvious. However, once we go above the spheres, we're actually looking at an infinite collection of large units called solar systems. Most of us also learned that unless you like to flirt with madness, it is quite enough to concern yourself with our local, hometown universe, since the size and complexity of this one alone will make you nauseated if you try to completely comprehend it.
So, this is how the old astronomers would sketch their astronomical knowledge. Keep this in mind as I make a leap from the stars to the English language, because you will better understand what the new quiz will bring if you visualize it with this structure.
Ok... how this relates to our new module:
Words are not created equal. It's fairly old knowledge that we use some words a lot, and others much less frequently, hence it is more important to know the very common words than the more exotic and obscure ones.
Already in the early part of the last century, people sat down and -- at the time manually -- looked through large amounts of texts, counting words one by one.
These old frequency lists are still quite relevant today, because they only included a few hundred of the top words. There is not much evolution in high frequency words. They're words like "the" (the number 1), "be" (including its relatives: am, is, are, was, been, etc.), "I," "you," etc.
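To make the idea of a frequency list concrete, here is a minimal sketch in Python of what those early researchers did by hand: split a text into words and count them. It is only an illustration, not the code behind our module; the function name `frequency_list`, the sample sentence, and the very crude way of splitting words are all my own assumptions.

```python
import re
from collections import Counter

def frequency_list(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Crude tokenizer (an assumption): lowercase the text and treat
    # every run of letters or apostrophes as one word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Hypothetical sample text, just for illustration.
sample = "The cat sat on the mat. The dog sat on the cat."
top = frequency_list(sample, top_n=3)
print(top)
```

Even in this tiny sample, "the" ends up at the top of the list, just as it does in the real frequency lists.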
This is an excerpt from Wikipedia:
So, the biggest advantage lies in owning the core words, and it seems as though progress is made rather slowly after the first 1,000 words.
However, to be fluent in a language, you need to know more than 95% of the words you encounter. If you are presented with a text of 100 words, not knowing 5 of them is still a high number, and you will need a lot of energy and concentration to make it through a text or conversation at this level. It's kind of like riding a bicycle with a flat tire. You can do it, but it's bumpy and a pain and you won't find it very fun.
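If you'd like to see that 95% arithmetic spelled out, here is a small Python sketch. It measures which share of the words in a text a reader already knows. The names `coverage` and `known`, the made-up 100-word text, and the simple word splitting are assumptions for illustration only.

```python
import re

def coverage(text, known_words):
    """Share of the running words in text that appear in known_words."""
    # Same crude tokenizer assumption: lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    known_count = sum(1 for w in words if w in known_words)
    return known_count / len(words)

# Hypothetical 100-word text: 95 known words and 5 unknown ones.
text = " ".join(["the"] * 95 + ["zyzzyva"] * 5)
known = {"the"}
share = coverage(text, known)
print(f"Coverage: {share:.0%}")
```

A reader at exactly this level still stumbles over one word in twenty, which is the flat-tire ride described above.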
Here's another word estimate:
So to reach mastery, you actually need about 15,000-20,000 words, and by words most researchers mean the "word family," so dance, dances, dancing, danced would count as one word. (If words were counted more strictly, without combining them into a "root word" or "word family," the number of words you'd need to know would be much, much larger.)
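Here is a tiny Python sketch of the difference between counting word forms and counting word families. The lookup table `family_of` is a hand-made, hypothetical example; a real list of word families is far larger and is compiled by researchers, not guessed by a few lines of code.

```python
def count_families(words, family_of):
    """Count distinct word families; words not in the table count as their own family."""
    return len({family_of.get(word, word) for word in words})

# Hypothetical lookup table mapping each word form to its family.
family_of = {"dances": "dance", "dancing": "dance", "danced": "dance"}

forms = ["dance", "dances", "dancing", "danced", "word"]
print(len(set(forms)), "word forms")
print(count_families(forms, family_of), "word families")
```

Five word forms shrink to two families here, which is why a count of 15,000-20,000 families stands for a much larger number of individual forms.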
There are countless ways to learn the 1,000 core words, because that's where everyone intuitively focuses their energy.
Going beyond that, the field quickly thins. Providing tutoring along the full frequency string now available is possible only for LTW, the only program designed around a comprehensive vocabulary data set of now 175,000 words (and continuously growing).
And, while the 80% in the table above sounds great, you'll find that these 1,000 core words are words that you will naturally pick up rather quickly; they are really very basic. However, to master living language, you need to be able to fill in the more advanced words in synergy with these core words to actually get something out of them. Meaning is most commonly communicated through the more advanced vocabulary, the most specific words.
Take these sentences:
The world is very xxxxxxx.
Do you like your xxxxxx?
What do you think about xxxxxx?
I can't believe it's xxxxxx!
What all of these sentences have in common is that they use core words for about 80% of the text volume. Despite this big text volume that's covered by the high frequency words, not knowing the remaining 20% makes communication useless!
Try it for yourself:
Take an average, casual text and blank out all the slightly more specific or advanced words. You'll see the 80/20 proportion (or something very similar) appear on your screen, but most likely the text will have turned incomprehensible. If the text is more specialized, you will be at a complete loss with only your primary frequency list.
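If you'd rather let a computer do the blanking, the experiment above can be sketched in a few lines of Python. The function name `blank_out`, the mask "xxxxxx," and the tiny `core` word set are assumptions for illustration; for a real test you would plug in an actual list of the 1,000 most frequent words.

```python
import re

def blank_out(text, core_words, mask="xxxxxx"):
    """Replace every word that is not in core_words with the mask."""
    def replace(match):
        word = match.group(0)
        # Keep core words as written; mask everything else.
        return word if word.lower() in core_words else mask
    return re.sub(r"[A-Za-z']+", replace, text)

# Tiny hypothetical core list, standing in for a real top-1,000 list.
core = {"the", "world", "is", "very", "do", "you", "like", "your"}

print(blank_out("The world is very turbulent.", core))
print(blank_out("Do you like your neighbors?", core))
```

Four out of five words survive in each sentence, yet the one word that carries the meaning is gone.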
The good news is that researchers have been letting the big data monsters loose on the English language all over the world, and from all different angles.
They have been chewing away on incomprehensible amounts of text and have produced a lot of very valuable data sets, so that now we not only know the top 20,000 word families, but far beyond.
Vocabulary spheres
Using this data and a few important aspects I'll explain in future entries, it is possible to divide the language cosmos into spheres, and to give a scientifically and statistically sound approach to learning English. Once you reach general proficiency, you may choose to expand further into more specialized vocabulary areas.
With our new vocabulary module, you will be able to tell us what your unique focus is: Maybe you really want to focus on spoken language only, plan to prepare for medical school or business communications, are interested in exploring humanities or social sciences, or want to be on equal verbal turf with lawyers?
Tell our program what you're looking to accomplish and we will prep you accordingly. We not only have an incredible general frequency list, but also twelve (12) more frequency strains, each for a different learning focus and each extensive.
So with this frequency data, it is possible to break up the language learning progress into a cosmos of different spheres, and determine with relative accuracy how much space you already cover, in terms of vocabulary, and which words you might want to learn next, so they're not too easy or too advanced.
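To give you a rough feel for what "determining your location" could mean, here is a hedged Python sketch. It is not our actual algorithm. It assumes the learner has been quizzed on a sample of words from each sphere, ordered from the core outward, and it reports the first sphere where the share of known words drops below a threshold. The names `locate_learner` and `results`, the sphere labels, and the 90% threshold are all hypothetical choices for illustration.

```python
def locate_learner(results_by_sphere, threshold=0.9):
    """Return the first sphere the learner has not yet mastered, or None.

    results_by_sphere: list of (name, known_count, tested_count),
    ordered from the core sphere outward.
    """
    for name, known_count, tested_count in results_by_sphere:
        # A sphere with no tested words counts as not yet mastered.
        if tested_count == 0 or known_count / tested_count < threshold:
            return name
    return None  # every tested sphere is above the threshold

# Hypothetical quiz results: 20 sample words tested per sphere.
results = [
    ("core", 20, 20),
    ("sphere 2", 19, 20),
    ("sphere 3", 12, 20),
]
print("Next sphere to work on:", locate_learner(results))
```

In this made-up example, the learner owns the core and the second sphere, so the words worth learning next sit in the third one: not too easy, not too advanced.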
We are currently planning a Kickstarter.com campaign to build a smooth, fun vocabulary assessment that is interactive and allows users to determine their location in the English Word-iverse in a few minutes. If we raise more than what we need, the surplus will allow us to serve a larger audience for free.
If you share our passion for learning and would like to wear sponsor laurels, please get in touch.
Frequency data is one of the core pillars of this project, but only one of them. I will post some more of the logic of the new algorithm as we go along, so stay tuned...


