INSTANT VISUAL TRANSLATION USING RECURRENT NEURAL NETWORK MODEL
issue 2

Dr. Mrs. Radha Pimpale

Asst. Prof, Priyadarshini Bhagwati College of Engineering, Nagpur

Information Technology Department, Nagpur

radhapimpale@gmail.com

Introduction:

A person travelling in a different country or state who does not know the local language sometimes needs help to understand signboards written in that language. Travellers often need assistance because language barriers keep them from reaching their destination. Many machine learning and deep learning techniques are used to convert text into text in a specific language. Computers can translate human language, but the difficulty is that human language does not obey a standardized set of rules: it is full of special cases, geographical variations, and outright rule breaking. To address this, statistical translation systems built on probability-based models can be used for language translation. Instant visual translation is a solution that recognizes letters and digits in an image, converts them into text, and recreates the image with the translated text. For converting a text image into a text image in a specific language, we consider two machine learning algorithms. In this paper we explore the benefits and disadvantages of the Feed Forward Neural Net Language Model (NNLM) and the Recurrent Neural Network Language Model (RNNLM).

 Techniques Used

Two techniques are used for converting text: encoding and model architecture.

Encodings

We can come up with an encoding that represents every possible sentence as a series of unique numbers. Encoding relies on word embeddings, also called word vectorization, a technique that converts a string of text into a set of real numbers (a vector). Word embeddings are used to compute similar words, classify text, cluster or group documents, extract features, and perform natural language processing. Once words are converted into vectors, the similarity between words is measured with cosine similarity or Euclidean distance. Word embeddings are available from pre-trained methods such as Word2Vec (from Google), fastText (from Facebook), and GloVe (from Stanford).
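The two similarity measures can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up values chosen for illustration only; real Word2Vec, fastText, or GloVe vectors have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy embeddings, not taken from any trained model.
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.1]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen), cosine_similarity(king, apple))
print(euclidean_distance(king, queen), euclidean_distance(king, apple))
```

With these toy values, the related pair scores a higher cosine similarity and a smaller Euclidean distance than the unrelated pair, which is the behaviour the two measures are used for.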

Model Architecture

The objectives of the following model architectures for word representations are to maximize accuracy and minimize computational complexity.

The models are:

  • FeedForward Neural Net Language Model (NNLM)
  • Recurrent Neural Network (RNNLM)

FeedForward Neural Network Language Model

Schwenk [10] proposed using a feed-forward neural network to score phrase pairs. The network took fixed-size inputs of seven words, with zero padding for shorter phrases, and produced a fixed-size output of seven words.

Bengio et al. (2003) applied a feedforward neural network (FNN) to a training set consisting of a sequence of words, showing how the neural model could simultaneously learn the probability that a certain word will appear next after a given sequence of words and a real-valued vector representation for each word in a predefined vocabulary [6].

The FNN model therefore learned a suitable set of features while learning to predict the next word in a sentence. Decision tree and maximum entropy language models, on the other hand, usually required the features to be created manually before any model could be trained (Heeman 1999, Peters and Klakow 1999).

The feedforward neural network language model considered here predicts the fourth word based on the previous three.

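The softmax activation function is used at the output. Following the formulation of Bengio et al. [6], the predictor can be written as:

$$y = b + Wx + U \tanh(d + Hx), \qquad P(w_t \mid w_{t-1}, \ldots, w_{t-n+1}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}$$

where x is the concatenation of the input word feature vectors, H and d are the hidden-layer weights and biases, U holds the hidden-to-output weights, b is the output bias, and W represents the direct connections between the input layer and the output layer and is optionally zero (no direct connections).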

The softmax function normalizes the logit scores into probabilities. If there is only a limited number of candidate words, computation time can be reduced further by normalizing over those candidates only. For example, the candidates can be limited to the set of possible words suggested by the trigram model.
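The forward pass and the shortlist idea can be sketched as follows. This is a minimal illustration of the Bengio-style predictor with the direct connections W set to zero; the layer sizes, the random weights, and the shortlist indices are all assumptions made for the example, not values from this work.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def softmax(logits):
    """Normalize logits into probabilities (shifted by the max for stability)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nnlm_logits(context_ids, C, H, d, U, b):
    """y = b + U tanh(d + Hx), with no direct input-to-output connections."""
    # x is the concatenation of the feature vectors of the context words.
    x = [value for idx in context_ids for value in C[idx]]
    hidden = [math.tanh(d_k + a_k) for d_k, a_k in zip(d, matvec(H, x))]
    return [b_i + o_i for b_i, o_i in zip(b, matvec(U, hidden))]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
V, m, n_ctx, h = 10, 4, 3, 8          # hypothetical vocabulary and layer sizes
C = rand_matrix(V, m)                 # word feature vectors
H = rand_matrix(h, n_ctx * m)         # input-to-hidden weights
U = rand_matrix(V, h)                 # hidden-to-output weights
d, b = [0.0] * h, [0.0] * V           # hidden and output biases

y = nnlm_logits([1, 5, 7], C, H, d, U, b)   # three previous words
p_full = softmax(y)                          # normalize over all V words

shortlist = [2, 3, 9]                        # hypothetical trigram candidates
p_short = softmax([y[i] for i in shortlist]) # normalize over candidates only
```

Normalizing over the shortlist touches three logits instead of the whole vocabulary, and it preserves the relative ranking of the candidates.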

Disadvantage of Using FeedForward Neural Net Language Model

  1. As the vocabulary grows large, the NNLM becomes very slow to train and test.
  2. It is very time consuming because computing the probability of the next word requires normalizing over all words in the vocabulary, and calculating the exact gradient repeats this computation at every iterative update of the model parameters.

Recurrent Neural Network Model

Human language is a complicated pattern, and an RNN performs sequence-to-sequence translation. RNNs are useful for learning patterns in data. They are designed to take text sequences as inputs, to return text sequences as outputs, or both. They are called recurrent because the hidden layers of the network have a loop in which the output and cell state from each time step become inputs at the next time step. The recurrence serves as a memory process.

A recurrent neural network can take a sequence as input and produce a sequence as output (e.g. machine translation: an RNN reads a sentence in English and then outputs its translation).
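The recurrence can be made concrete with a small sketch. It assumes a plain (Elman-style) recurrent cell with a tanh activation and hand-picked toy weights; gated cells such as LSTM or GRU, which also carry a cell state, follow the same loop with a more elaborate update.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for k in range(len(b_h)):
        total = b_h[k]
        total += sum(w * x for w, x in zip(W_xh[k], x_t))
        total += sum(w * h for w, h in zip(W_hh[k], h_prev))
        h_t.append(math.tanh(total))
    return h_t

def run_rnn(inputs, W_xh, W_hh, b_h):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b_h)          # initial state: no memory yet
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Hypothetical toy weights for a 2-dimensional input and hidden state.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

# The first and third inputs are identical, yet their hidden states differ
# because the third step also sees what came before it.
states = run_rnn([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], W_xh, W_hh, b_h)
```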

Algorithmic Steps for Processing

  1. Preprocessing: load and examine data, cleaning, tokenization, padding
  2. Modeling: build, train, and test the model
  3. Prediction: generate specific translations of English to Marathi, and compare the output translations to the ground truth translations
  4. Iteration: iterate on the model, experimenting with different architectures

The frameworks used for the experimental work are Keras for the frontend and TensorFlow for the backend.

Preprocessing Steps Involved

1. Load & Examine Data

The inputs are sentences in English and the outputs are their translations in Marathi.

2. Cleaning

Convert the data to lowercase and insert spaces between all words and punctuation marks.
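A minimal sketch of this cleaning step is shown below. The function name and the set of punctuation marks are assumptions for illustration; a Marathi corpus would likely need additional marks added to the pattern.

```python
import re

def clean_sentence(sentence):
    """Lowercase the text and separate punctuation from neighbouring words."""
    sentence = sentence.lower().strip()
    # Put a space on both sides of each punctuation mark so it becomes its own token.
    sentence = re.sub(r"([.,!?;:])", r" \1 ", sentence)
    # Collapse any run of whitespace into a single space.
    sentence = re.sub(r"\s+", " ", sentence)
    return sentence.strip()

print(clean_sentence("Where is the Railway Station?"))
# where is the railway station ?
```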

3. Tokenization

Tokenizing the data means converting the text to numerical values, which allows the neural network to perform operations on the input data. A unique ID is assigned to each word and punctuation mark. Running the tokenizer creates a word index, which is then used to convert each sentence to a vector.
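The word index can be sketched in plain Python as follows. This is a simplified stand-in for a library tokenizer, not the exact procedure used in the experiments: IDs are assigned in order of first appearance, and 0 is left unused so it can serve as a padding value.

```python
def build_word_index(sentences):
    """Assign a unique integer ID to every distinct token, starting at 1."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1  # 0 is kept free for padding
    return word_index

def texts_to_sequences(sentences, word_index):
    """Replace each known token by its ID; unknown tokens are skipped."""
    return [[word_index[w] for w in s.split() if w in word_index] for s in sentences]

sentences = ["where is the station ?", "the station is near ."]
word_index = build_word_index(sentences)
sequences = texts_to_sequences(sentences, word_index)
print(sequences)
# [[1, 2, 3, 4, 5], [3, 4, 2, 6, 7]]
```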

4. Padding

Every sequence needs to have the same length, so padding is used to equalize the sentence lengths.
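Padding can be sketched as follows. The helper name, the choice to pad at the end of each sequence, and the padding value 0 are assumptions for this example; padding at the front is an equally valid convention.

```python
def pad_to_equal_length(sequences, max_len=None, pad_id=0):
    """Pad (or truncate) every sequence at the end so all have max_len items."""
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        trimmed = seq[:max_len]
        padded.append(trimmed + [pad_id] * (max_len - len(trimmed)))
    return padded

print(pad_to_equal_length([[1, 2, 3, 4, 5], [3, 4], [6]]))
# [[1, 2, 3, 4, 5], [3, 4, 0, 0, 0], [6, 0, 0, 0, 0]]
```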

5. Modelling

Referring to the architecture diagram of an RNN at a high level, there are a few parts of the model to be aware of:

  1. Inputs. Input sequences are fed in one word per time step. Each word is encoded as a unique integer that maps to the input dataset vocabulary.
  2. Embedding Layers. Embeddings are used to convert each word to a vector. The size of the vector depends on the complexity of the vocabulary.
  3. Recurrent Layers (Encoder). This is where the context from word vectors in previous time steps is applied to the current word vector.
  4. Dense Layers (Decoder). These are typical fully connected layers used to decode the encoded input into the correct translation sequence.
  5. Outputs. The outputs are sequences of integers that map to the Marathi dataset vocabulary.
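The five parts listed above can be traced in a small, untrained sketch. It is written in plain Python rather than Keras so that each step is visible; the vocabulary sizes, layer sizes, and random weights are hypothetical, so the predicted IDs are meaningless until the weights are trained.

```python
import math
import random

random.seed(1)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def translate_ids(input_ids, params):
    """Map a sequence of source word IDs to one target word ID per time step."""
    E, W_xh, W_hh, W_out = params
    h = [0.0] * len(W_hh)
    output_ids = []
    for word_id in input_ids:                  # 1. inputs: one integer per time step
        x = E[word_id]                         # 2. embedding: ID -> vector
        pre = [a + c for a, c in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]        # 3. recurrent layer: mix in earlier context
        scores = matvec(W_out, h)              # 4. dense layer: one score per target word
        # 5. output: highest-scoring target ID (softmax would not change the argmax)
        output_ids.append(max(range(len(scores)), key=scores.__getitem__))
    return output_ids

src_vocab, tgt_vocab, emb_dim, hidden = 12, 15, 4, 6   # hypothetical sizes
params = (
    rand_matrix(src_vocab, emb_dim),   # embedding table
    rand_matrix(hidden, emb_dim),      # input-to-hidden weights
    rand_matrix(hidden, hidden),       # hidden-to-hidden weights
    rand_matrix(tgt_vocab, hidden),    # hidden-to-output weights
)
predicted = translate_ids([3, 7, 1, 0, 0], params)     # padded input sequence
```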

Embeddings

Embeddings allow us to capture more precise syntactic and semantic word relationships. This is achieved by projecting each word into n-dimensional space. Words with similar meanings occupy similar regions of this space; the closer two words are, the more similar they are. And often the vectors between words represent useful relationships.

Training embeddings from scratch requires a huge amount of data and computation, so pre-trained embedding packages such as GloVe or word2vec are used instead.

Encoder & Decoder

Our sequence-to-sequence model links two recurrent networks: an encoder and decoder. The encoder summarizes the input into a context variable, also called the state. This context is then decoded and the output sequence is generated. Since both the encoder and decoder are recurrent, they have loops which process each part of the sequence at different time steps.

Advantage of Recurrent Neural Networks

  1. Recurrent neural networks differ from traditional ANNs because they do not need a fixed input/output size and they use previous data in the sequence to make predictions.
  2. By using cell states, RNNs are inherently good at making use of data from many time steps earlier.

Limitation of Feedforward Networks That Recurrent Neural Networks Address

In a feedforward network, only a fixed number of previous words can be taken into account to predict the next word. This limitation is inherent to the structure of FNNs, since they lack any form of ‘memory’: only the words that are presented via the fixed number of input neurons can be used to predict the next word, and all words that were presented during earlier iterations are ‘forgotten’, although these words can be essential to determine the context and thus a suitable next word.

Results

The most commonly used measure to evaluate language models is perplexity (PPL). The PPL measure is the inverse of the geometric average probability assigned by the model to each word in a test data set, given some sequence of previous words. The measure is popular for evaluating language models because it allows an easy comparison between different models.
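Written out, the definition above is PPL = exp(-(1/N) Σ log P(w_i | previous words)) for a test set of N words. A minimal sketch of the computation follows; the probability lists are made-up values standing in for what a model would assign to each test word.

```python
import math

def perplexity(word_probs):
    """Inverse of the geometric mean of the per-word probabilities."""
    n = len(word_probs)
    avg_log_prob = sum(math.log(p) for p in word_probs) / n
    return math.exp(-avg_log_prob)

# Hypothetical probabilities assigned to the words of a short test sentence.
model_a = [0.20, 0.10, 0.25, 0.05]
model_b = [0.40, 0.30, 0.50, 0.20]

print(perplexity(model_a), perplexity(model_b))
```

Lower perplexity is better: a model that assigns higher probability to the words that actually occur, as the second list does here, obtains the lower value. A model that spreads probability uniformly over V words has a perplexity of V.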

Conclusion

Considering the two models, the feedforward neural network and the recurrent neural network, the comparison indicates that the recurrent neural network performs better for language translation.

Future Scope:

Implementation on different Marathi datasets and paragraph-to-paragraph conversion are to be considered.

References

  1. https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
  2. Shashi Pal Singh, Ajai Kumar, Hemant Darbari, Lenali Singh, Anshika Rastogi, Shikha Jain, “Machine translation using deep learning: An overview.”
  3. M. Sundermeyer, I. Oparin, J.-L. Gauvain, B. Freiberg, R. Schlüter, H. Ney, “Comparison of Feedforward and Recurrent Neural Network Language Models,” 978-1-4799-0356-6/13/$31.00 ©2013 IEEE.
  4. Li Chen, Song Wang, Wei Fan, Jun Sun, Satoshi Naoi Fujitsu Research & Development Center, Beijing, China, “Cascading Training for Relaxation CNN on Handwritten Character Recognition,” 2167-6445/16 $31.00 © 2016 IEEE ,DOI 10.1109/ICFHR.2016.38
  5. Xie Chen, Xunying Liu, Yongqiang Wang, Mark J. F. Gales, and Philip C. Woodland, “Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, p. 2146, November 2016, 2329-9290 © 2016 IEEE.
  6. Y. Bengio, R. Ducharme, P. Vincent, “A neural probabilistic language model,” J. Mach. Learn. Res., 3 (2003), pp. 1137-1155.
  7. P.A. Heeman, “POS tags and decision trees for language modelling” Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (1999), pp. 129-137.
  8. https://towardsdatascience.com/word-level-english-to-marathi-neural-machine-translation-using-seq2seq-encoder-decoder-lstm-model-1a913f2dc4a7
  9. https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/
  10. Sainik Kumar Mahata ,Dipankar Das “MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation” Article in Journal of Intelligent Systems · May 2018, DOI: 10.1515/jisys-2018-0016
