IAVA - Interactive Animated Virtual Assistant

Aniket Rode, Aditya Dudhakawar, Aniketa Maharana, Prof. V. R. Surjuse

Abstract –

In today’s world, Artificial Intelligence (AI) has become an integral part of human life. AI has many applications, such as chatbots, network security, complex problem solving, and assistants. Artificial Intelligence is designed to have cognitive intelligence: it learns from its experience to make future decisions. A virtual assistant is one example of cognitive intelligence. We have all come across a virtual assistant at some point in our lives, and it is fascinating that, on our command, we get all the information we want. Virtual assistants are designed for entertainment, for helping users complete a wide variety of tasks, for controlling smart devices, for navigating the urban jungle, and much more. A virtual assistant is an AI-operated program that can answer your queries or virtually do something for you. Currently, virtual assistants are used for both personal and professional purposes. Most virtual assistants are device-dependent and are bound to a single user or device: they recognize only one user. Our project proposes an assistant that is not device-bound. It can recognize the user using facial recognition, it can be operated from any platform, and it can recognize and interact with the user. Moreover, virtual assistants can be used in many application areas, such as education assistance, medical assistance, vehicles and robotics, home automation, and security access control.

Index Terms– Artificial Intelligence, Cognitive Intelligence, Virtual Assistant, Facial Recognition, Chatbot


AI can be defined as any system that perceives its surroundings and takes actions that increase its chance of accomplishing its goals, or as “a system’s ability to precisely interpret external data, to learn from such data, and to use those learnings to accomplish distinct goals and tasks through supple adaptation.” Artificial Intelligence is a rapidly developing branch of computer science, with the power and ability to drive a wide range of applications. AI applies different algorithms to solve various problems. Its major applications include optical character recognition, handwriting recognition, speech recognition, video manipulation, robotics, medical applications, and virtual assistants.

Considering all these applications, the virtual assistant is one of the most influential applications of AI and has lately been attracting the interest and curiosity of researchers. Virtual assistants support a wide range of applications and are therefore categorized into many types, such as virtual personal assistants, smart assistants, digital assistants, mobile assistants, and voice assistants. Some of the well-known virtual assistants are Alexa by Amazon, Cortana by Microsoft, Google Assistant by Google, Siri by Apple, and Messenger ‘M’ by Facebook. These companies use different ways to design and improve their assistants, and many techniques are used to design assistants depending on the application and its complexity.

For example, Google uses Deep Neural Networks (DNNs) for its components, and Microsoft uses Microsoft Azure Machine Learning Studio to develop Cortana’s components. However, their potential is limited by serious security issues: they do not support strong authentication mechanisms, and they are bound to specific hardware. Face recognition or another identification mechanism should be required before accepting any voice commands, and assistants should not be bound to any specific hardware.

Chatbots are computer programs designed to mimic human conversational skills and simulate interaction with other humans over the internet. A chatbot is a kind of assistant with a limited set of capabilities; with the help of artificial intelligence, this small set of capabilities can be turned into a large set of functionality. Such an artificially advanced chatbot is called a virtual assistant. With AI, chatbots can be designed to perform a variety of tasks, from the simple to the most complex. Nowadays, chatbots are used in business to automate tasks that would otherwise need human interaction, and they can handle many requests simultaneously and at great speed. This helps increase productivity, response rate, and customer service quality.
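To make the chatbot idea concrete, the following is a minimal rule-based sketch: each pattern maps to a canned reply, with a fallback when nothing matches. The rules and reply strings are illustrative placeholders, not IAVA’s actual rule set.

```python
import re

# Each rule pairs a pattern with a canned reply (illustrative placeholders).
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\byour name\b", re.I), "I am IAVA, your virtual assistant."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]

def reply(message: str) -> str:
    """Return the first matching canned reply, or a fallback."""
    for pattern, answer in RULES:
        if pattern.search(message):
            return answer
    return "Sorry, I did not understand that."

print(reply("Hello there"))         # greeting rule fires
print(reply("What is your name?"))  # name rule fires
```

An AI-driven assistant replaces the hand-written rule table with a learned model, but the request/reply loop stays the same.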

A facial recognition system is a technology capable of identifying a person from a digital image or video source. There are many ways facial recognition can work, but the simplest is to compare the facial features in the digital input from the camera with the images stored in a database and return the closest matching person among the database entries. Facial recognition systems are deployed in many projects, such as attendance systems, crime investigation department portfolios, and face unlock on our phones, but they have not been used in a virtual assistant. Using this technology, we want to create a virtual assistant that is activated not only by voice commands but also by the user’s face, so it can distinguish between different users of the assistant and thereby overcome many security issues.
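The “compare against database entries” step can be sketched as a nearest-neighbor search over face feature vectors. In practice the vectors would be embeddings produced by a face-recognition model; the toy vectors, user names, and the 0.6 distance threshold below are all illustrative assumptions.

```python
import math

# Enrolled users mapped to toy feature vectors (a real system would store
# embeddings from a face-recognition model).
DATABASE = {
    "aniket": [0.1, 0.8, 0.3],
    "aditya": [0.9, 0.2, 0.5],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, threshold=0.6):
    """Return the closest enrolled user, or None if no one is close enough."""
    best_user, best_dist = None, float("inf")
    for user, vector in DATABASE.items():
        dist = euclidean(probe, vector)
        if dist < best_dist:
            best_user, best_dist = user, dist
    return best_user if best_dist <= threshold else None

print(identify([0.12, 0.79, 0.31]))  # close to "aniket"
print(identify([5.0, 5.0, 5.0]))     # too far from everyone -> None
```

The threshold is what separates “recognized user” from the “register as new user” path described later.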

The user interface plays a very important role in making a product interesting and successful in the market. There are many examples, such as Instagram and Facebook, in which a good user interface led to a successful product: the simpler the user interface, the more eager the customer is to use it. In this virtual assistant, we tried our best to give the user an interface that can be used efficiently and easily. We incorporated interactive animation into the assistant so that the user feels close to cutting-edge technology and the next generation of virtual assistants. The interface incorporates text-to-speech and speech-to-text functionality to make it easier to use, giving the feeling that artificial intelligence is coming closer to the human world. We created this user interface with the help of web development.

In this paper, we propose an approach that overcomes the security issue with the help of face and speech recognition, and that uses a browser-based assistant to overcome the dedicated-hardware problem.


Virtual assistants are one of the active areas in which many companies are working to improve efficiency and applications. Many techniques are used to design virtual assistants depending on their application and complexity, and many different architectures exist for them. Based on this, we designed a data-flow diagram for the assistant.

Fig-1: Data-Flow Diagram

The data-flow diagram of the Interactive Animated Virtual Assistant is shown in Fig-1. It shows the flow of data from the user to the AI and the generation of the reply.


This assistant is fully modular and consists of a set of services. Each service performs certain tasks and combines its data with the others to give a fully functional virtual assistant. The following is a brief idea of how the virtual assistant functions. It starts with facial recognition. If the user is detected, control passes to the next step; otherwise the prompt “User not detected. Register as a new user?” is shown, the new-user registration dialog is opened, a predefined questionnaire is loaded, and the user is asked to answer its questions. Once all the questions are answered, a sample facial photo is collected, the user is registered, and the application starts from the beginning.

Once the user is detected, the application connects to the database holding that user’s data, and the assistant is ready for a query. The user can start a conversation, ask a question, or do whatever the user wishes. The speech recognition program converts the user’s speech into text and saves that information in the user database as future data for speech recognition. The generated text is then passed to the chatbot application, which can also be called the dialogue manager. A suitable reply is then generated using the knowledge database. Once the reply is generated, the text is converted to speech and the output is produced through the speakers.
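The pipeline just described can be sketched as a single request loop. Every stage below is a stub with hypothetical names; in the real assistant each would call the corresponding face-recognition, speech-to-text, dialogue-manager, and text-to-speech service.

```python
# End-to-end sketch of the query flow: detect user -> transcribe speech
# -> generate reply -> synthesize speech. All names are illustrative stubs.

def detect_user(frame):
    # Stub: pretend the camera frame always resolves to a registered user.
    return "aniket"

def speech_to_text(audio):
    # Stub: a real implementation would transcribe the microphone input.
    return "what time is it"

def dialogue_manager(user, query):
    # Stub: a real implementation would consult the knowledge database.
    return "Sorry %s, I cannot answer '%s' yet." % (user, query)

def text_to_speech(reply):
    # Stub: a real implementation would synthesize audio; here we pass through.
    return reply

def handle_request(frame, audio):
    user = detect_user(frame)
    if user is None:
        return "User not detected. Register as a new user?"
    query = speech_to_text(audio)
    return text_to_speech(dialogue_manager(user, query))

print(handle_request(frame=None, audio=None))
```

Keeping each stage behind its own function boundary is what makes the assistant modular: any service can be swapped without touching the loop.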

IAVA is mainly divided into three services, which handle most of the data, plus a database. The following are the components we propose for IAVA:

A. Face Detection Service

The Face Detection Service allows the virtual assistant to detect the presence of a user in front of the device and verify the user by matching the face in the image against the database. The service continuously scans the video input from the camera or webcam; as soon as a face is detected, the virtual assistant becomes available for further queries. The Face Detection Service uses deep learning to detect the face and authorize the user.

B. Speech Detection Service

The Speech Detection Service allows the virtual assistant to record the user’s voice through the microphone and store it in the user database for speech recognition. It also provides speech synthesis, which converts the text on the screen to audio.

C. Dialogue Manager Service

The Dialogue Manager is the soul of a virtual assistant, as it generates the reply to a query using its knowledge database. Its job is to give the most effective reply to the query asked by the user. The user input is mostly textual or vocal, and it is processed by the services used within the Dialogue Manager. The Dialogue Manager is the key service: it has the most complex task, producing an accurate reply to each query.

D. Database

In this virtual assistant, we divided the database into two parts, as follows:

a. User Database

The user database holds all the information about a user, including the user’s facial images and voice samples. It serves user authentication and new-user insertion.
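A minimal sketch of such a user table, using Python’s built-in sqlite3. The schema (column names, blob storage for the face encoding and voice sample) is an assumption for illustration; the paper does not specify IAVA’s actual database layout.

```python
import sqlite3

# In-memory database with a hypothetical user table for authentication data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        face_encoding BLOB,   -- serialized face feature vector
        voice_sample BLOB     -- recorded audio for speech recognition
    )
""")

def register_user(name, face_encoding=b"", voice_sample=b""):
    conn.execute(
        "INSERT INTO users (name, face_encoding, voice_sample) VALUES (?, ?, ?)",
        (name, face_encoding, voice_sample),
    )

def find_user(name):
    row = conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

register_user("aniket")
print(find_user("aniket"))  # prints "aniket"
```

`register_user` corresponds to the new-user registration flow, and `find_user` to the lookup performed after face detection.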

b. Knowledge Database

The knowledge database can be local as well as online. It includes facts about the user, along with a database of queries and replies that guides how replies are generated.

Fig-2: Flowchart


In paper [1], the authors explain how virtual personal assistants (VPAs) work and how they are being upgraded with various new technologies. A VPA is a multimodal dialogue system: it uses speech, graphics, video, gestures, and other modes for communication on both the input and output channels. VPA systems also increase the interaction between users and computers by using technologies such as gesture recognition, image/video recognition, speech recognition, and a knowledge base. Moreover, such a system can sustain a lengthy conversation with users by drawing on a vast dialogue knowledge base. Our project emphasizes making the VPA device-independent, so it can be accessed whenever and wherever it is wanted.

In paper [2], the authors explain an AR-based assistant that combines a human interface with a location-aware digital system, giving the user a much richer experience. With this project, they come closer to creating a virtual personal manager that understands its surroundings and location using augmented reality.

In paper [4], the authors describe smart assistants and smart home automation. They find speech-enabled virtual assistants insufficiently secure, and they apply different techniques to try to overcome those issues.


We evaluated the system in a controlled environment on different tasks, module by module. IAVA is an ongoing project; many changes are being made and tested continually. Currently, IAVA consists of the following modules:

  1. Face Detection Module: This module has been tested thoroughly under various backgrounds and lighting conditions in the test environment, and it detects and recognizes faces correctly about 80% of the time. Optimizations are being made as the project progresses.
  2. New User Registration: The second module adds a new user’s picture, captured via the webcam. It has been tested and worked with a 100% success rate in our tests.
  3. Speech to Text: The third module also worked with a 100% success rate in our tests.
  4. Reply Generation: This is done using AI. It is in progress: the chatbot is still in its learning state and produces an accurate reply about 70% of the time.
  5. Text to Speech: The final module converts the chatbot’s written reply into speech to be delivered to the user; it worked with a 100% success rate in our tests.
  6. Web Site or CGI Module: This module is the user interface, or front end, of the project.


This paper introduced IAVA, our omni-accessible virtual personal assistant, which can be accessed from any device and used by any registered user. We propose utilizing various AI techniques to achieve this, such as face detection, speech recognition, a chatbot application, and text-to-speech translation, all while providing interactive animation.

Based on our data, we find that this type of project can be very popular with users, since it can be accessed from any device, and it can serve as a basis for future projects. It can be used for medical purposes, business purposes, and many other applications.


The technology can be further upgraded with emerging techniques such as emotion detection and live face interaction. The interactive animation can be upgraded to facial animation for a more human-like feel.


We would like to show our gratitude to our professors for sharing their pearls of wisdom with us during this research, and we thank them for their insights and comments, which greatly improved the report. We are also immensely grateful to our department for providing all the necessary equipment and facilities.


  1. V. Këpuska and G. Bohouta, “Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa, and Google Home),” 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC).
  2. MARA – A Mobile Augmented Reality-Based Virtual Assistant.
  3. G. Bohouta and V. Z. Këpuska, “Comparing Speech Recognition Systems (Microsoft API, Google API and CMU Sphinx),” Int. Journal of Engineering Research and Application, 2017.
  4. A vision and speech-enabled, customizable, virtual assistant for smart environments.
  5. S. Arora, K. Batra, and S. Singh, “Dialogue System: A Brief Review,” Punjab Technical University.
  6. X. Lei, G. Tu, A. X. Liu, C. Li, and T. Xie, “The insecurity of home digital voice assistants – Amazon Alexa as a case study,” CoRR, vol. abs/1712.03327, 2017.
  7. K. Wagner, “Facebook’s Virtual Assistant ‘M’ Is Super Smart. It’s Also Probably a Human,” https://www.recode.com.
  8. B. Marr, “The Amazing Ways Google Uses Deep Learning AI,” https://www.forbes.com.
  9. Y. Nung and A. Celikyilmaz, “Deep Learning for Dialogue Systems,” Deep Dialogue.
  10. B. Martinez and M. F. Valstar, “Advances, Challenges, and Opportunities in Automatic Facial Expression Recognition,” pp. 63–100, Cham: Springer International Publishing, 2016.
