The dictionary contains only words spelled according to the unified spelling system, sometimes called KLTG Breton or, in Breton, Brezhoneg peurunvan ("completely unified" Breton).
This norm is characterized by the general use of "zh". For instance, we spell Breizh, the unified form of "Breiz" (KLT) and "Breih" (Gw). Two other examples: we spell evit where others would spell "ewid" or "evid", and enderv where others would spell "enderw" or "endero".
Some frequently encountered dialectal forms such as àr, teus, meump, … may nevertheless be present in the dictionary.
The letters of the unified Breton alphabet are the following :
a b ch c'h d e f g h i j k l m n o p r s t u v w y z.
Please note that there is no c (except within the ch and c'h polygraphs), no q and no x. For obvious reasons, c, q and x are not treated as unknown letters and do not act as separators within words. However, words containing any of these letters will not be found in the dictionary.
The accented letters of the unified Breton alphabet are the following :
â à é ê ñ ô ù ü û
Examples of words including accented letters :
lâr ; kêrioù ; àr ; brasañ ; é ; kornôg ; skuizh-ôg ; û ; emroüs ; goût
As with any good spell-checker, a word is flagged if its accentuation is incorrect.
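The alphabet rules above can be illustrated with a small sketch. It is not the engine's actual code: the function names (`split_letters`, `foreign_letters`, `uses_only_unified_letters`) are hypothetical, and the only data used is the letter list given in this section. The point is that ch and c'h must be matched before single characters, so that they count as one letter each.

```python
import re

# Letters taken from the lists above; ch and c'h are single letters.
POLYGRAPHS = {"ch", "c'h"}
SIMPLE = set("abdefghijklmnoprstuvwyz")
ACCENTED = set("âàéêñôùüû")
FOREIGN = set("cqx")  # not in the alphabet, but not treated as separators

ALLOWED = POLYGRAPHS | SIMPLE | ACCENTED

# Try the polygraphs first, then fall back to any single character.
_UNIT = re.compile(r"c'h|ch|.", re.DOTALL)


def split_letters(word):
    """Split a word into alphabet units, keeping ch and c'h whole."""
    return _UNIT.findall(word.lower())


def foreign_letters(word):
    """Return the c, q or x units found outside the ch / c'h polygraphs."""
    return [unit for unit in split_letters(word) if unit in FOREIGN]


def uses_only_unified_letters(word):
    """True if every unit of the word belongs to the unified alphabet."""
    return all(unit in ALLOWED for unit in split_letters(word))
```

A word such as "quiz" still forms a single token, but it fails `uses_only_unified_letters`, which matches the statement that such words cannot be present in the dictionary.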
There are five cases involving the hyphen :
· The common suffixes :
In this first case, the first word, or more precisely the first group of words without the suffix, is tested. If it does not belong to the dictionary then it is flagged.
· The ez- and ent- prefixes :
· Composed words :
In the second and third cases, the whole word is tested, including its hyphen. The whole word is flagged if it is not recognized as correct.
· Idiomatic constructions :
This fourth case is correctly handled.
· " Linked " proper nouns :
This fifth case can be problematic.
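The first three cases can be sketched as follows. This is only an illustration under stated assumptions: the suffix list (-mañ, -se, -hont) is an assumed example of "common suffixes" and does not come from this document, the toy dictionary is hypothetical, and `check_hyphenated` is an invented name.

```python
# Assumed examples of common suffixes; the real list is not given here.
SUFFIXES = ("-mañ", "-se", "-hont")

# Hypothetical toy dictionary; hyphenated entries are stored whole.
TOY_DICTIONARY = {"ti", "hag-eñ", "skuizh-ôg"}


def check_hyphenated(token, dictionary=TOY_DICTIONARY, suffixes=SUFFIXES):
    """Return the text to flag, or None if the token is accepted."""
    lowered = token.lower()
    for suffix in suffixes:
        if lowered.endswith(suffix):
            # Case 1: only the part before the suffix is tested.
            stem = token[: -len(suffix)]
            return None if stem.lower() in dictionary else stem
    # Cases 2 and 3: the whole word is tested, hyphen included.
    return None if lowered in dictionary else token
```

With this toy data, "ti-mañ" is accepted because "ti" is known, while an unknown hyphenated word is flagged as a whole.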
The apostrophe is a special character in Breton because it belongs to the alphabet. This particularity requires special handling by the spell-checking engine, which is not needed for other languages such as English, French or Spanish.
There are three cases :
· The c'h :
If a word including the c'h trigraph is not present in the dictionary, the whole word is flagged, as for any other common word.
· The elision :
In the case of elision, the apostrophe is always linked to the first word. " n'on " is composed of the first elided word " n' " and of the whole word " on ". The words " n' " and " on " must be present independently in the dictionary. Every missing word is flagged.
· The contraction :
In this last case, the behaviour of the spell-checker is less predictable and depends on how likely such forms are to be found in written literature (such forms are, incidentally, discouraged in written Breton). The dictionary contains 'peus and ane'i, but other more obscure or rare forms may not be included.
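The three apostrophe cases can be sketched as follows. The order of the checks, the function names and the toy dictionary are assumptions made for illustration; only the behaviour described above is taken from the text (c'h words tested whole, elisions split after the apostrophe, contractions looked up as they are).

```python
# Hypothetical toy dictionary for illustration.
TOY_DICTIONARY = {"n'", "on", "c'hwi", "'peus", "ane'i"}


def elision_split(token):
    """Split after the first inner apostrophe that is not part of c'h.

    Returns (elided_word, following_word), or None if there is no such
    apostrophe. The apostrophe stays attached to the first word.
    """
    low = token.lower()
    for i in range(1, len(low) - 1):
        inside_trigraph = low[i - 1] == "c" and low[i + 1] == "h"
        if low[i] == "'" and not inside_trigraph:
            return low[: i + 1], low[i + 1:]
    return None


def check_apostrophe_token(token, dictionary=TOY_DICTIONARY):
    """Return the list of strings to flag (empty if the token is accepted)."""
    low = token.lower()
    if low in dictionary:  # c'h words and listed contractions
        return []
    parts = elision_split(low)
    if parts is None:  # nothing to split: flag the whole word
        return [token]
    return [part for part in parts if part not in dictionary]
```

Here "n'on" is accepted because both "n'" and "on" are known, whereas a contraction missing from the dictionary is flagged as a whole.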
An important note: some software replaces the standard apostrophe ', whose ASCII code is 0x27, with other, less common apostrophe-like characters, for example 0x60, 0x91, 0x92 and 0xB4. This is always the case when using Microsoft Word in auto-correction mode, and it is also the case with Microsoft PowerPoint. An Drouizig Difazier recognizes these characters and treats them as a standard apostrophe.
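A minimal sketch of such a normalization step is shown below. It assumes that the byte values 0x91 and 0x92 refer to the Windows-1252 code page, where they decode to the curly quotes U+2018 and U+2019; the function name `normalize_apostrophes` is hypothetical and this is not the engine's own code.

```python
# Apostrophe-like characters mapped to the standard apostrophe (U+0027).
APOSTROPHE_VARIANTS = {
    "\u0060": "'",  # grave accent, byte 0x60
    "\u2018": "'",  # left single quotation mark, 0x91 in Windows-1252 (assumed)
    "\u2019": "'",  # right single quotation mark, 0x92 in Windows-1252 (assumed)
    "\u00b4": "'",  # acute accent, byte 0xB4
}

_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(text):
    """Replace every known apostrophe variant with the standard apostrophe."""
    return text.translate(_TABLE)
```

Running this before any dictionary lookup means that a word typed in a word processor with a curly apostrophe is checked exactly like the same word typed with a plain one.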
Two cases can be discussed here: on the one hand, accented capitals, and on the other hand, polygraphs, that is to say, ch and c'h :
An Drouizig Difazier accepts the words Û, HAG-EÑ, Bro-C'hall and Chom.
The mutation of a proper noun may or may not be written, and when it is written several conventions exist. For example, we find historically in Breton literature (e.g. "Buhez ar Sent", etc.) the spelling "An Itron Varia". We also find the mutation spelled in the following way : "An Itron vMaria".
It has been chosen to apply the mutation to the first letter of any proper noun, as for any common noun, so the dictionary includes all the proper nouns with all their mutated forms.
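To illustrate how mutated forms of a proper noun might be generated, here is a simplified sketch. The mutation table is the one commonly taught for Breton initial consonants and is an assumption here, not the dictionary's actual generation rules; it ignores exceptions and the grammatical context that triggers each mutation. The name `mutated_forms` is hypothetical.

```python
# Simplified initial-consonant mutation table (assumption, no exceptions).
# "gw" is listed before "g" so that the longer initial is matched first.
MUTATIONS = {
    "soft": [("gw", "w"), ("k", "g"), ("t", "d"), ("p", "b"),
             ("g", "c'h"), ("d", "z"), ("b", "v"), ("m", "v")],
    "spirant": [("k", "c'h"), ("t", "z"), ("p", "f")],
    "hard": [("g", "k"), ("d", "t"), ("b", "p")],
    "mixed": [("gw", "w"), ("g", "c'h"), ("d", "t"), ("b", "v"), ("m", "v")],
}


def mutated_forms(word):
    """Return the word together with its mutated forms (first letter only)."""
    forms = {word}
    low = word.lower()
    for rules in MUTATIONS.values():
        for initial, replacement in rules:
            if low.startswith(initial):
                new = replacement + word[len(initial):]
                if word[0].isupper():
                    # Keep the capital, as in Maria -> Varia.
                    new = new[0].upper() + new[1:]
                forms.add(new)
                break  # at most one rule per mutation type
    return forms
```

With this table, "Maria" yields "Varia", the form found in "An Itron Varia", while a word beginning with a vowel has no additional forms.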
The dictionary contains approximately 350000 forms: a base of about 20000 words plus all their very numerous mutated forms.
· verbs (17.5%) : 3500
· common nouns (65%) : 8000 (m.) + 4000 (f.) + 300 (pl.) + 80 (d.) = 12380
· adjectives (13%) : 2600
· prepositions, interjections, adverbs, pronouns, proper nouns (Breton surnames, towns from Brittany,…), conjunctions, exclamations, ordinals, cardinals, articles and contracted forms (4.5% in total) : 900
This gives a total of about 20000 words, which yields about 350000 different forms.