The History of NLP with the Hebrew Language

1/3/2018 ● 4 minutes to read

The history of natural language processing (NLP) begins in the 1950s, although earlier work can be found. In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" [1], which proposed what is now called the Turing test as a criterion of intelligence. The criterion rests on the ability of a computer program to impersonate a human in a real-time written conversation convincingly enough that the human interlocutor cannot reliably tell, from the content of the conversation alone, whether they are talking to a program or to another human. The Georgetown experiment of 1954 involved the fully automatic translation of more than sixty Russian sentences into English. Its authors claimed that within three or five years machine translation would be a solved problem [2].

During the 1960s, SHRDLU, a natural language system operating in a restricted "blocks world" with a relatively small vocabulary, worked extremely well, prompting optimism among researchers. Real progress, however, was much slower, and after the 1966 ALPAC report found that ten years of research had failed to meet its goals, ambitions were considerably reduced.

ELIZA was a simulation of a Rogerian psychotherapist, written by Joseph Weizenbaum between 1964 and 1966. Using almost no information about human thought or emotion, ELIZA sometimes managed to offer an astonishing semblance of human interaction. When the "patient" went beyond its very small knowledge base, ELIZA could fall back on a generic answer, for example responding to "I have a headache" with "How does this manifest itself?".
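
The behaviour described above, a handful of patterns plus a generic fallback, can be illustrated with a minimal sketch. This is not Weizenbaum's original script: the two rules and the names `RULES`, `FALLBACK`, and `respond` are assumptions made up for this illustration, and only the fallback sentence is taken from the example in the text.

```python
import re

# Hypothetical rules, invented for this sketch: a pattern and a reply template.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]

# Generic answer used when no rule matches (the sentence quoted in the text).
FALLBACK = "How does this manifest itself?"


def respond(utterance):
    """Return a reply built from the first matching rule, else the fallback."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the user's own words inside the canned template.
            return template.format(match.group(1))
    return FALLBACK
```

With these rules, "I am tired." matches the first pattern, while "I have a headache" matches nothing and receives the generic answer. The program holds no model of the user at all, which is the point the paragraph makes.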

During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into data a computer could process. Examples include MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981).

Meanwhile, many ELIZA-style chatterbots were written, such as PARRY, Racter, and Jabberwacky. By the 1980s, as computing power grew and became cheaper, statistical models for machine translation attracted increasing interest.

Statistical natural language processing relies on stochastic, probabilistic, or simply statistical methods to address some of the difficulties of language analysis, especially the fact that very long sentences are highly ambiguous when parsed with realistic grammars, which allow thousands or millions of possible analyses. Disambiguation methods often involve corpora and formal tools such as Markov models. Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling, information theory, and linear algebra [3]. Its technology comes mainly from machine learning and data mining, two fields of artificial intelligence that involve learning from data.
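
To make the role of Markov models in disambiguation concrete, here is a minimal sketch of Viterbi decoding over a hidden Markov model, used to choose between a noun and a verb reading of each word. The function name `viterbi`, the two-tag set, and every probability below are toy assumptions invented for the illustration; in practice such figures would be estimated from a tagged corpus.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `words` and its probability."""
    # best[i][s] = (probability of the best path ending in state s at word i,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(word, 0.0), p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy parameters (assumed, not estimated from any real corpus).
STATES = ["NOUN", "VERB"]
START_P = {"NOUN": 0.7, "VERB": 0.3}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
EMIT_P = {
    "NOUN": {"time": 0.6, "flies": 0.4},
    "VERB": {"time": 0.2, "flies": 0.8},
}

tags, probability = viterbi(["time", "flies"], STATES, START_P, TRANS_P, EMIT_P)
```

Under these toy numbers the decoder prefers the noun reading of "time" followed by the verb reading of "flies". Both words are ambiguous in isolation; the transition probabilities are what resolve them.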

Hebrew has gone by many names: the language of Canaan (a name used in the Torah), the Jewish language (most of its speakers are Jewish, along with a small number of Palestinians), and the sacred language (because the Torah, which is held sacred, was given in it). The best-known name, however, is Hebrew, after the Hebrews, who took up the language after the Canaanites.

Most of the books of the Tanakh were written in Hebrew, and the Hebrew of the Tanakh, the so-called Hebrew Old Testament, has been found to be very similar to ancient languages discovered in excavations, including the Ammonite language. Ancient Hebrew is the same as the dialect of the Kingdom of Judah, which survived the demise of the Northern Kingdom of Israel.

Hebrew (עברית) is a Semitic language belonging to the Afro-Asiatic language family. Modern Hebrew is used today as the language of everyday speech, literature, and official business, and is spoken by more than 7 million people within the borders of Israel and the Palestinian Territories. Hebrew poses many challenges for developers of natural language processing systems because of its particular spelling and rich morphology. A highly advanced software infrastructure, based on linguistic knowledge, is required for natural language applications such as machine translation, speech-to-text conversion, automatic document summarization, and spelling and style checking.
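
Two of the spelling and morphology difficulties mentioned above can be sketched in a few lines. Hebrew is usually written without vowel points, so text often has to be normalized by removing them, and short particles such as "the", "in", and "and" attach directly to the following word, so a single written form may have several readings. The function names, the prefix list, and the tiny lexicon below are assumptions made up for this illustration, not part of any existing Hebrew NLP toolkit.

```python
import unicodedata

# One-letter particles that attach to the following word (simplified list).
PREFIXES = ("ו", "ה", "ב", "כ", "ל", "מ", "ש")


def strip_niqqud(text):
    """Remove vowel points and other combining marks, keeping the letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def candidate_segmentations(word, lexicon):
    """List every (prefixes, stem) split whose stem appears in `lexicon`."""
    results = []

    def walk(prefixes, rest):
        if rest in lexicon:
            results.append((tuple(prefixes), rest))
        # Try peeling one more prefix letter, leaving at least one letter.
        if len(rest) > 1 and rest[0] in PREFIXES:
            walk(prefixes + [rest[0]], rest[1:])

    walk([], word)
    return results


# Hypothetical toy lexicon: "coffee", "encirclement / credit", "house".
TOY_LEXICON = {"קפה", "הקפה", "בית"}

readings = candidate_segmentations("הקפה", TOY_LEXICON)
```

With this toy lexicon, the unpointed form הקפה yields two candidates: the whole word as a single entry, and the definite article ה followed by קפה ("the coffee"). A real system would need a full lexicon and a statistical model to choose between such readings in context.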

Advanced natural language processing (NLP) technology gives virtual assistants the ability to interact naturally in Hebrew and provide personalized service to clients. Resources in this area include MILA, projects hosted on GitHub, and other tools available online, such as bold360. Customers expect a simpler, faster, and smarter banking experience, and now prefer to type free text rather than search through an app's options. To meet these expectations, Discount Bank decided to look for an online technology that would add value for the customer and provide valuable information to the bank. The real challenge was to provide an AI solution that could understand customers' unrestricted natural language and translate it into queries or actions on their accounts, so that they can get the information they need simply and easily.

The language of Israelis today is not the Hebrew of the past. It should be remembered, however, that the Israeli who remains convinced that he speaks the language of the Bible, even if that is not quite the case, still understands it better than a contemporary Greek understands Homer. Moreover, Hebrew has finally become, in some respects, a language like any other, and today it is evolving much as other languages do: through constant, so-called natural change, a gap between certain official rules and certain popular practices, and even a stratification of the language by social category and the creation of Hebrew slang. In this context, multiple open-source projects offering high-performance NLP solutions are urgently needed, so that customers have several options on the market.

References

  1. Turing, A., Computing Machinery and Intelligence, 1950
  2. Hutchins, J., Example based machine translation – a review and commentary, 2005
  3. Manning, C. D. and Schütze, H., Foundations of Statistical Natural Language Processing, 1999