FAQs

Why is there a 1.2 Version of the NGSL? Couldn't you get it right the first time?

The original GSL was released to the public in 1936 as an interim list and was revised and refined for more than 17 years before being published as the GSL in 1953. In the same way, our NGSL lists should be seen as interim lists, released to the public in evolving versions and through various venues, including conferences, research papers, the web and social media, in the hope that they will be used, discussed, debated and improved over time. All of our lists (including the NAWL, TSL and BSL) should be considered interim lists, and we welcome your input, comments and suggestions on how to improve them. Our list naming conventions are as follows:

NGSL 1.0 first major analysis of corpus and interim frequencies (2013-2016)

NGSL 1.01 minor errors corrected (2016-2022)

NGSL 1.2 substantial revision to list based on feedback and published papers (2023~)

As can be seen above, the NGSL 1.2 is the most current version of the NGSL. See the chart to the right for the specific additions and changes from the previous NGSL 1.01.

Why do you sometimes list lower frequency words such as “unauthorize” as the headword rather than the more common form such as “unauthorized”?

We try to choose the canonical headword found in the dictionary. For our teaching lists, however, we have been less rigorous in applying this rule, and we sometimes depart from it when there are strong objections from the public to listing the lower frequency canonical headword first.

Why didn't you use Word Families like the original GSL?

It’s important to remember how the original GSL counted words. The GSL did not amalgamate frequency counts for derived forms, but it did combine the frequencies for word forms regardless of part of speech. For example, the frequency counts for the noun and verb forms of CARE are summed, while the frequency counts for the derived forms CAREFUL and CARELESS are listed separately.

Following the publication of Bauer and Nation’s Word Families (1993), the number of words included under a headword expanded greatly. They wrote that “a word family consists of a base word and all its derived and inflected forms that can be understood by a learner without having to learn each form separately” (p. 253). For example, under the word family rubric, CARE contains, along with the inflections of the verb and noun, the following: CARE, CAREFUL, CAREFULLY, CAREFULNESS, CARELESS, CARELESSLY, CARELESSNESS, CARER, CARERS, UNCARED, and UNCARING. However, the assumption that each form “can be understood by a learner without having to learn each form separately” has been called into question. Research by Schmitt and Zimmerman (2002) “did not support a strong facilitative effect for knowledge of words within a word family” (p. 158). Another problem with using the Word Family concept to determine which words belong under a headword was raised by Gardner (2007), who wrote that “case by case assessments of affixed word forms would be necessary to determine if a prolific derivational affix was acting transparently or not” (p. 247). This of course adds a level of subjectivity to the compilation of a word list and opens an avenue to list differentiation. The result is difficulty in interpreting coverage statistics reported for variant word lists that go under the same name, as is the case with current GSL coverage claims, which come from substantially different word lists.
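The difference between the two counting schemes can be sketched in a few lines of Python. This is only an illustration: the frequencies are made up, the `WORD_FAMILY` mapping is a hypothetical three-word excerpt rather than Bauer and Nation's actual family for CARE, and the function names are our own.

```python
from collections import Counter

# Made-up frequencies keyed by (word form, part of speech); illustration only.
raw_counts = {
    ("care", "noun"): 25,
    ("care", "verb"): 15,
    ("careful", "adjective"): 12,
    ("careless", "adjective"): 5,
}

# Hypothetical word-family mapping: every form above falls under CARE.
WORD_FAMILY = {"care": "CARE", "careful": "CARE", "careless": "CARE"}


def gsl_style(counts):
    """Merge parts of speech, but keep each derived form as its own entry."""
    totals = Counter()
    for (form, _pos), n in counts.items():
        totals[form.upper()] += n
    return totals


def word_family_style(counts, family):
    """Merge parts of speech and derived forms under one headword."""
    totals = Counter()
    for (form, _pos), n in counts.items():
        totals[family.get(form, form.upper())] += n
    return totals


gsl_counts = gsl_style(raw_counts)
family_counts = word_family_style(raw_counts, WORD_FAMILY)
print(dict(gsl_counts))     # CARE, CAREFUL and CARELESS counted separately
print(dict(family_counts))  # everything counted under CARE
```

With the same underlying text, the first scheme reports three entries and the second reports one, so frequency and coverage figures from lists built in these two ways are not directly comparable.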

Why are unusual lemmas like WINDOWING and WHILES included as part of the headwords WINDOW and WHILE?

Word lists are created in different ways and for different purposes, and what is or isn't included in a list really depends on the final purpose. The version of the NGSL which you will see on either the free Quizlet flashcard program or the free NGSL with definitions in easy English file contains only the headword, since the purpose there is teaching. The main NGSL list, however, includes not only the headword but also a wide range of its associated lemmas, including several that may seem strange or unusual. This is because one of the other purposes of the NGSL was to be useful to researchers who are analyzing real world texts to identify the frequency of words in order to predict the probability of the reader encountering the lemma. When deciding on the word set for a given headword, one can use evidence or arbitrarily imposed rules. For example, when making the 1995 revised version of the GSL, Bauman and Culligan chose an evidence-based approach. If the derived form did not appear in the Brown Corpus, they did not include it. This resulted in the exclusion of many legitimate derived forms.

For the NGSL, we wanted to address two primary tasks. First, we wanted to predict the probability of the reader encountering the lemma. To do so, our lists were used to analyze real world texts to identify the frequency of words. Second, we wanted to identify unique lemmas that were not on our word list. In probability theory there is a thing called an event space. Basically, it's the set of all possible ways an event can happen, covering both frequent and rare events. Once the parameters of the event space are defined, only those words are permissible. It may sound logical to decide that only high frequency events be included in the list, but what does a researcher do when a rare event occurs? Do they ignore the event and maintain the event space, or do they update the event space? More concretely, what should researchers do when they encounter words that clearly belong to a level 6 affix family but are not on the word list? Should they ignore them and pretend each is a unique occurrence, or add them, thus changing the list? We have chosen the latter, including in the main list lemmas with very low or even no occurrence, so that researchers who are doing corpus research with the NGSL using analytical tools such as VocabProfiler and AntWordProfiler can explore questions and issues beyond what the typical EFL learner or teacher might be interested in. English is an incredibly flexible language with words shifting parts of speech with ease, as Susanna Centlivre showed in 1709 with the phrase, "But me no buts." In short, we chose a rule-based approach that aims for completeness.
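To make the researcher's situation concrete, here is a minimal sketch of how a text might be profiled against a headword list. It is not how VocabProfiler or AntWordProfiler work internally, and `WORD_LIST` is a hypothetical four-headword excerpt, not the format or content of the real NGSL file. The function names and the simple letters-only tokenizer are also assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical excerpt: headword -> associated forms (not the real NGSL file).
WORD_LIST = {
    "window": ["window", "windows", "windowed", "windowing"],
    "while": ["while", "whiles"],
    "open": ["open", "opens", "opened", "opening"],
    "the": ["the"],
}


def build_lookup(word_list):
    """Invert the list so that every form points back to its headword."""
    return {form: head for head, forms in word_list.items() for form in forms}


def profile(text, word_list):
    """Count on-list headwords, collect off-list tokens, and compute coverage."""
    lookup = build_lookup(word_list)
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer, assumption
    on_list, off_list = Counter(), Counter()
    for token in tokens:
        if token in lookup:
            on_list[lookup[token]] += 1
        else:
            off_list[token] += 1
    coverage = sum(on_list.values()) / len(tokens) if tokens else 0.0
    return on_list, off_list, coverage


sample = "The windowing system opened the window while rebooting."
on_list, off_list, coverage = profile(sample, WORD_LIST)
print(dict(on_list))
print(sorted(off_list))
print(coverage)
```

Because WINDOWING is listed under WINDOW, it is counted toward that headword instead of being reported as an off-list word. If rare forms were left out of the list, they would surface as off-list items and lower the reported coverage.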

What word lists are next? Can you develop word lists and learning tools for our institution or company?

We are always working on new word lists. With the recent release of the NGSL 1.2 and the Fitness (FWL) and Children’s (NDL) word lists, we are now turning our attention to developing several new lists, including ones that focus on TOEFL English, TV English and English for Literature majors. Like all of our other public lists, these new lists will be made available for free via the least restrictive Creative Commons License, along with our many free learning tools. We also provide consulting services via Charlie Browne Company, which, among other things, can quickly create specialised wordlists, learning tools and apps as well as learner dictionary databases for schools and online learning companies.

Why didn't you include the numbers, days of the week and months of the year?

Although these word sets were excluded from the NGSL proper in the same way they were excluded from the original GSL, they are included as an appendix in the main NGSL Excel file. Though pulling these words out had a negative effect on our coverage figures, it seemed to be the right decision from a pedagogic point of view. In the case of days of the week and months of the year, it was consistent with our decision (and that of most corpus-derived vocabulary lists) not to include proper nouns. Furthermore, keeping them in would have caused another kind of problem, since not all items of each lexical set occurred at a high enough frequency to appear on the NGSL even within the 273 million word sample of the CEC used for this project.

Why didn't you include the letters of the alphabet in the NGSL?

For reasons similar to those for leaving numbers off the list. The letters of the alphabet by themselves are used semiotically (semiotics is the study of signs and symbols as used in communication), often as placeholders like numbers or bullets in a list. They are often used in sequences or stand in as variables in formulas. While they are of interest in the field of semiotics, they cannot be classed as words, and are more often used in the same way as smiley faces or other emoticons.

Our definition of a word concurs with one of the definitions found in Wikipedia:

A distinct unit of language (sounds in speech or written letters) with a particular meaning, composed of one or more morphemes, and also of one or more phonemes that determine its sound pattern.

This definition would preclude, for example, the letter B since it is not a morpheme.

How do you cite or refer to NGSL wordlists in academic papers?

Though the proper style sheet will vary from academic field to academic field, most people working in the areas of applied linguistics and TESOL use APA format. Borrowing from their rules for citing websites, we recommend the following, using the NGSL 1.2 as an example:

Browne, C., Culligan, B., & Phillips, J. (2013). The New General Service List. Retrieved from http://www.newgeneralservicelist.org.

There are many academic papers on NGSL wordlists. These of course should be cited according to APA format. For example:

Browne, C. (2014). The New General Service List Version 1.01: Getting better all the time. Korea TESOL Journal, 11(1), 35-50.