Dr. Soni Chaturvedi (1); Ms. Samiksha Khandekar (2)
(1) Associate Professor & Head, ECE, P.I.E.T. Nagpur.
(2) M.Tech Scholar, ECE, P.I.E.T. Nagpur.
Gender and gesture recognition have been topics of research for more than a decade. Researchers have treated gender recognition and gesture recognition as two separate problems and have achieved good results in both domains, but very few have worked on improving the quality of gender and gesture detection together in videos. In this paper, we propose a framework for improving the accuracy of gender and gesture recognition, which can be used in visual surveillance systems as a tool to measure irregularities in the behaviour of users of a particular gender. We use Daubechies wavelets and a spiking neural network (SNN) for gesture recognition, and Viola-Jones cascade object detection combined with a support vector machine (SVM) and facial geometry features to improve the quality of gender detection from faces. Compared with k-Nearest Neighbour (kNN) and Hidden Markov Models (HMMs), our approach provides a 10% improvement in gender recognition accuracy and a 15% improvement in gesture recognition accuracy on standard and real-time videos. We plan to further enhance this work using machine learning in order to reduce the processing delay of the system.
Keywords: Gender, Gesture, SVM, SNN, Wavelet, facial geometry
1. Introduction
Gender recognition from images means categorizing the user's input image as male, female or other on the basis of distinctive facial features. These features can be based on color, edges, the geometry of the face or any other distinguishable set of features. Color-based gender recognition is the least accurate and facial-geometry-based recognition the most accurate: color features capture only the color variation in faces, which can vary across all genders, whereas geometry-based features differ between genders in a more consistent way. For example, in most cases the eye-to-eye distance in males is larger than that in females for a given dataset. Observations like these help in evaluating the gender from the given facial features.
Gesture recognition involves evaluating the actions performed by the user under test and concluding from those actions what the user is doing. This is a vast field that has been researched for more than a decade. Researchers conclude that the window-based technique, which uses differential feature comparison, is the most effective for evaluating gestures from a given set of frames for a given user.
In recent years, the recognition of gender from face images has attracted interest in both fundamental and applied research. From the fundamental point of view, it is intriguing that gender recognition is an easy operation for human beings, done very rapidly, whereas for a computer vision algorithm the task can be very challenging. The difficulties arise from the possible variations of a face captured by a camera, which depend on the image acquisition process (pose of the face, image illumination and contrast, background), the intrinsic differences between people's faces (expression, age, race), as well as occlusions (sunglasses, scarves, hats). From the applied research point of view, there is a commercial interest in systems that can automatically recognize gender from face images. Examples include surveillance systems that can help restrict areas to one gender only, faster processing in biometric systems that rely on face recognition, custom user interfaces that depend on the gender of the person interacting with them, smart billboards designed to attract the attention of a male or female audience, and systems that collect data in support of market analysis. In face images, one may observe differences in the intensity distribution, especially in the hair and eye regions. Based on the observation of facial images from standard datasets, many researchers use the pixel intensity values of the faces to train a binary classifier for gender recognition [3, 4, 5]. Further differences can be observed in terms of texture. This may be due to the softer facial features and more pronounced eyebrows of women, whereas men have rougher skin, especially in the presence of a beard. The most popular texture descriptors for face images are histograms of local binary patterns (LBP) [6, 7, 8].
One may also observe a variation in the shape of the face. The face of a woman is generally more rounded, while the face of a man is more elliptical. Prior work has exploited this aspect and proposed the use of the histogram of oriented gradients (HOG) descriptor for gender recognition. In other works, shape-based features have been combined with other types of features in order to obtain a more robust classifier [11, 12, 13, 14]. Finally, there are also many subtle differences in the geometry of faces. The average face of a man has closer eyes, a thinner nose and a narrower mouth. These observations triggered the investigation of what are called facial fiducial distances, which are essentially the distances between certain facial landmarks (e.g. nose, eye contours, eyebrows). The fiducial points may be detected using an active shape model or deep learning techniques [17, 18].
Gesture-driven human-computer interaction (HCI) has been an active topic in recent years. The key component of gesture-driven HCI is a gesture recognition system, which identifies the gestures that provide input to the decision system of the HCI. Many methods have been developed for both hand gesture and full-body gesture recognition. These recognition tasks are challenging because both the human hand and the human body are highly articulated. In order to develop an efficient gesture recognition system, an important issue is extracting features that describe the highly articulated body. As one solution, marker positions extracted using infrared-reflective marker systems are used in many interactive environments. Marker-based systems can reliably capture the 3D coordinates of markers placed on bony landmarks of the body. However, wearing markers is cumbersome for the subject using the system. In addition, commercial marker-based motion capture systems are very expensive. Therefore, a video-based gesture recognition system is preferred for non-intrusive and low-cost sensing. In existing video-based gesture recognition systems, tracking certain "landmarks" in the video is a widely applied strategy for extracting motion features. In some systems, human hands and the head are tracked using a skin detector and a face detector. In others, "visual interest points" or "visual cues" are used to describe the motion in each image frame. Landmark-based feature extraction is prone to tracking failure in many cases, especially when the view angle of the camera changes or, equivalently, the body orientation (facing direction) of the subject changes. In such cases, many landmarks may be occluded. Landmark-free systems have also been described, but they are view dependent.
There are some systems that perform view-invariant gesture recognition, but these systems mostly rely on obtaining 3D information about the subject using more complicated camera systems.
In our technique, we propose a facial-geometry-based gender recognition system and combine it with a window- and wavelet-based gesture recognition system to create a sufficiently accurate system that works for both standard and real-time video datasets. The next section describes the various techniques used in gender and gesture recognition, followed by our proposed approach and its result evaluation. We conclude the paper with some observations and a way forward for researchers who wish to take this work further.
2. Literature review
In this section, we review the various techniques for gender and gesture recognition. Golomb et al. used a neural network, denoted as SexNet, to classify gender from 90 face images sampled at 30 × 30 pixels. Gutta et al. used radial basis functions and decision trees for gender classification, where manually segmented, normalized images of 64 × 72 pixels from the FERET dataset were used in their experiments. Moghaddam et al. proposed a gender recognition method for 21 × 21 pixel images from the FERET dataset using an SVM classifier. Nakano et al. explored edge information to classify gender using neural networks. Yang et al. proposed a texture normalization method for improving gender classification. Baluja and Rowley developed an AdaBoost classifier for identifying gender on images from the FERET dataset with a resolution of 20 × 20 pixels. Li et al. proposed a gender recognition method based on motion extracted from human silhouettes segmented from a number of action components. Alexandre proposed a fusion scheme based on shape and texture features from multiple resolutions of normalized images (resized to 20×20, 36×36 and 128×128 pixels). Bekios-Calfa et al. discussed the use of linear techniques, such as Linear Discriminant Analysis, for gender recognition. Perez et al. explored intensity, texture and shape features combined with mutual information. Results were evaluated on the FERET dataset. Dago-Casas et al. conducted experiments with cross-database benchmarks for gender classification under unconstrained conditions. Shan developed a gender recognition approach based on a combination of texture features using local binary patterns (LBP) with an AdaBoost classifier. Experiments were conducted on the Labeled Faces in the Wild (LFW) benchmark.
Toews and Arbel described a framework for detecting, localizing and classifying visual traits of objects from a viewpoint-invariant model derived from local scale-invariant features. A Bayesian classifier was used to identify the visual traits. The FERET dataset was used to evaluate the gender classification method. Mansanet et al. proposed a local deep neural network, denoted as Local-DNN, that integrates local features and deep architectures. Experiments were conducted on the Labeled Faces in the Wild (LFW) and Gallagher's datasets. Fellous investigated the use of 24 normalized horizontal and vertical distances calculated from fiducial points extracted from a set of 109 images. The model was trained on the FERET dataset and other images acquired in his laboratory in order to predict the gender of various facial expressions. Gupta presented an approach to detecting the gender of people from frontal facial images based on data mining and Delaunay triangulation techniques. Various classifiers, such as functional trees, random forests, naïve Bayes, AdaBoost and J48, were used to recognize gender as male or female on the FEI Face dataset. Patel et al. developed a facial gender recognition method based on a compass local binary pattern descriptor, which was evaluated on the CUFS and CUFSF datasets. Levi and Hassner developed an automatic gender classification method using a convolutional neural network (CNN) to learn representations from the data and improve the performance of the classification task. Experimental results were evaluated on the Adience benchmark.
The recognition of gestures involves several concepts, such as pattern recognition, motion detection and analysis, and machine learning. Different tools and techniques are used in gesture recognition systems, such as computer vision, image processing, pattern recognition and statistical modeling. The use of neural networks for gesture recognition has been examined by many researchers. Most studies use an ANN as the classifier in the gesture recognition process, while some others use it to extract the shape of the hand. Tin H. presents a system for hand tracking and gesture recognition using NNs to recognize the Myanmar Alphabet Language (MAL). An Adobe Photoshop filter is applied to find the edges of the input image, and a histogram of local orientation is used to extract the image feature vector that serves as input to the supervised neural network system. Manar M. used two recurrent neural network architectures to recognize Arabic Sign Language (ArSL). Elman (partially) recurrent neural networks and fully recurrent neural networks were used separately. A colored glove was used for the input image data, and the HSI color model was applied for the segmentation process. The segmentation divides the image into six color layers, one for the wrist and five for the fingertips. Thirty features are extracted and grouped to represent a single image: fifteen elements represent the angles between the fingertips and between the fingertips and the wrist, and fifteen elements represent the distances between fingertips and between the fingertips and the wrist. This feature vector is the input to both neural network systems. 900 colored images were used as the training set and 300 colored images for system testing.
Results showed that the fully recurrent neural network system (with a recognition rate of 95.11%) performed better than the Elman neural network (with a recognition rate of 89.67%). Kouichi M. presented Japanese sign language recognition using two different neural network systems. First, the back-propagation algorithm was used for learning postures of the Japanese alphabet. A data glove was used for the input postures, and a normalization operation was applied as a preprocessing step on the input. The features extracted from the input were 13 data items, 10 for bending and 3 for angles in the coordinates. The output of the network was 42 characters. The network consists of 3 layers: the input layer with 13 nodes, the hidden layer with 100 nodes, and the output layer with 42 nodes corresponding to the 42 recognized characters. The recognition rate for the 42 taught patterns was 71.4%, and 47.8% for unregistered people; the rate improved when additional patterns were added to the system, reaching 98.0% for registered and 77.0% for unregistered people. An Elman recurrent neural network was the second system applied for recognizing gestures. The system could recognize 10 words. The data items are taken from the data glove, and the same preprocessing is applied to the input. The features extracted are 16 data items: 10 for bending and two groups of 3 for angles in the coordinates. The network consists of 3 layers: the input layer with 16 nodes, the hidden layer with 150 nodes, and the output layer with 10 nodes corresponding to the 10 recognized words. Some improvements in the positional data and the filtering data space were added to the system.
The two neural networks are integrated so that, after data is received from the data glove, the start of the sampling time is determined and a decision on whether the data item is a gesture or a posture is sent to the next network, which checks the sampled data; the system keeps a history, which decides the end of the sign. The final recognition rate with the encoding methods was 96%. Stergiopoulou E. recognized static hand gestures using a Self-Growing and Self-Organized Neural Gas (SGONG) network. A camera was used to acquire the input image, the YCbCr color space was applied to detect the hand region, and a thresholding technique was used to detect skin color. The SGONG network uses a competitive Hebbian learning algorithm: learning starts with only two neurons and grows continuously until a grid of neurons is constructed that covers the hand object and captures the shape of the hand. From the resulting hand shape, three geometric features were extracted: two angles based on the hand slope and the distance from the palm center. These features were used to determine the number of raised fingers. For recognizing fingertips, a statistical distribution model was used, classifying the fingers into five classes and calculating the features for each class. The system recognized 31 predefined gestures with a recognition rate of 90.45% in a processing time of 1.5 seconds. Shweta K. introduced a gesture recognition system using neural networks. A web-cam was used to capture the input image at a slow sampling rate of 15-25 frames per second. Preprocessing converts the input image into a sequence of (x, y) coordinates using MATLAB, which is then passed to a neural classifier that assigns the gesture to one of several predefined classes known to the system.
Sidney Fels and Geoffrey Hinton used neural networks to map hand gestures to a speech synthesizer in the Glove-Talk system, which translates gestures to speech through an adaptive interface, an important class of neural network applications.
Many studies are based on the histogram, where the orientation histogram is used as a feature vector. The first real-time implementation of the orientation histogram in a gesture recognition system was done by William F. and Michal R.; they presented a method for recognizing gestures based on pattern recognition using the orientation histogram. Black-and-white input video was used for the digitized input image; some transformations were applied to the image to calculate the histogram of local orientation of each image, then a filter was applied to blur the histogram, which was plotted in polar coordinates. The system consists of two phases: a training phase and a running phase. In the training phase, the training set for the different input gestures is stored together with their histograms. In the running phase, an input image is presented to the computer and the feature vector for the new image is formed. This feature vector is then compared with the feature vectors (orientation histograms) of all images from the training phase using the Euclidean distance metric, and the stored histogram with the smallest error is selected. The total processing time was 100 milliseconds per frame. Hanning Z. et al. presented a hand gesture recognition system based on a local orientation histogram feature distribution model. A skin-color-based segmentation algorithm was used to find a mask for the hand region: the input RGB image is converted into the HSI color space, the HSI image H is mapped to a likelihood ratio image L, and the hand region is segmented by a threshold value. 128 elements were used in the local orientation histogram feature. The local orientation histogram feature vector was augmented by adding the image coordinates of the sub-window. For a compact feature representation, k-means clustering was applied to the augmented local orientation histogram vectors.
In the recognition stage, the Euclidean distance is used to calculate the exact matching score between the input image and the stored postures. Locality Sensitive Hashing (LSH) is then used to find the approximate nearest neighbors and reduce the computational cost of image retrieval. Wysoski et al. presented a rotation-invariant static-gesture recognition approach using boundary histograms. A skin color detection filter was used, followed by erosion and dilation as preprocessing operations, and a clustering process to find the groups in the image. For each group, the boundary was extracted using an ordinary contour-tracking algorithm. The image was divided into grids and the boundary was normalized in size, which makes the system invariant to the distance between the camera and the hand. A homogeneous background was used, and the boundary is represented as a chain of chord sizes. The image was divided into a number of regions N, and the regions were divided in a radial form according to a specific angle. The histogram of boundary chord sizes was calculated, so the whole feature vector consists of a sequential chain of histograms. Multilayer perceptron (MLP) neural networks and Dynamic Programming (DP) matching were used as classifiers. 26 static postures from American Sign Language were used; for every posture, 40 pictures were taken, 20 for training and 20 for testing. The number of histograms was varied from 8 to 36 in steps of two, with different histogram resolutions.
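The orientation-histogram matching reviewed in this section (build a histogram of local gradient orientations, then pick the stored gesture with the smallest Euclidean distance) can be illustrated with a minimal sketch. It is not the implementation of any of the cited systems: the function names, the choice of 8 bins, the central-difference gradient and the magnitude weighting are all assumptions made for illustration, and the blurring and polar-plot steps are omitted.

```python
import math

def orientation_histogram(image, bins=8):
    """Normalized histogram of local gradient orientations.

    image: 2D list of gray levels (rows of equal length).
    bins:  number of orientation bins over [0, pi); 8 is an assumed value.
    """
    hist = [0.0] * bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal central difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat regions carry no orientation information
            angle = math.atan2(gy, gx) % math.pi    # fold direction into [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude                # weight votes by edge strength
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def nearest_gesture(query_hist, training_set):
    """Label of the stored histogram closest to the query (Euclidean distance).

    training_set: dict mapping a gesture label to its stored histogram.
    """
    return min(training_set,
               key=lambda label: math.dist(query_hist, training_set[label]))
```

Because the histogram is normalized, a brighter or higher-contrast copy of the same pattern maps to the same feature vector, which is one reason orientation histograms are attractive for this kind of matching.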
Clustering algorithms is a general term covering all methods that partition a given set of sample data into subsets, or clusters, based on some measure between the grouped elements. According to this measure, patterns that share the same characteristics are grouped together to form a cluster. Clustering algorithms are widely used because of their ability to group complicated data collections into regular clusters. The main difference between fuzzy clustering and other clustering algorithms is that the sample data is partitioned into groups in a fuzzy way, so that a single data pattern may belong to several data groups. Xingyan L. presented a fuzzy c-means (FCM) clustering algorithm to recognize hand gestures in a mobile remote setting. A camera was used to acquire the raw input images; the input RGB images are converted into the HSV color model, the hand is extracted after preprocessing operations that remove noise and unwanted objects, and thresholding is used to segment the hand shape. Thirteen elements were used as the feature vector: the first is the aspect ratio of the hand's bounding box, and the remaining 12 represent the grid cells of a 3 by 4 block partition of the image, where each cell holds the mean gray level, i.e. the average brightness of the pixels in that block. The FCM algorithm is then used to classify the gestures. The system was used in various environments, such as complex backgrounds and invariant lighting conditions. Six hand gestures were used, with 20 samples per gesture in the vocabulary to create the training set, and the recognition accuracy was 85.83%.
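The 13-element feature vector described above (one aspect ratio plus twelve block means) is simple enough to sketch directly. The following is an illustrative sketch only: the function name, the definition of aspect ratio as width divided by height, and the reading of "3 by 4" as 3 rows by 4 columns are assumptions, and the fuzzy c-means classification step itself is not shown.

```python
def grid_feature_vector(gray, rows=3, cols=4):
    """Build [aspect_ratio, mean gray level of each grid block].

    gray: 2D list of gray levels, assumed to be already cropped to the
          hand's bounding box (at least `rows` high and `cols` wide).
    """
    height, width = len(gray), len(gray[0])
    features = [width / height]  # assumed definition: width over height
    for i in range(rows):
        r0, r1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            c0, c1 = j * width // cols, (j + 1) * width // cols
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(block) / len(block))  # average brightness of the block
    return features
```

With the default 3 × 4 partition the result has 1 + 12 = 13 elements, matching the vector length reported in the text.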
Many studies in the field of gesture recognition use HMMs. An HMM is a model with a finite number of states of a Markov chain and a number of random functions, such that each state has a random function. The HMM system topology is represented by one state for the initial state, a set of output symbols and a set of state transitions. HMMs have a rich mathematical structure and have proved efficient for modeling spatio-temporal information. Sign language recognition is one of the main applications of HMMs, along with speech recognition. Keskin C. et al. presented an HCI interface based on real-time hand tracking and 3D gesture recognition using hidden Markov models (HMMs). Two color cameras are used for 3D reconstruction. To overcome the problems of using skin color for hand detection when the hand overlaps with other body parts, markers are used to reduce the complexity of the hand detection process. The markers are used to segment the hand from complex backgrounds under invariant lighting conditions. The markers are distinguished using a marker detection utility, and a connected components algorithm was applied to find marker regions using double thresholding. For fingertip detection, simple descriptors were used: the bounding box and the four outermost points of the hand that define the box are determined. In some cases the bounding box needs to be elongated to determine the mode of the hand, and the points are used to predict the fingertip location in the different modes of the hand. A Kalman filter was used to filter the trajectory of the hand motion. For the 3D reconstruction of finger coordinates, a calibration utility was implemented for a specific calibration object. A least-squares approach was used to generate the fingertip coordinates, and a Kalman filter was applied to smooth the trajectory of the 3D reconstructed coordinates.
To eliminate the dependency on the reference frame, the 3D coordinates are converted into sequences of quantized velocity vectors. The HMM interprets these sequences, which characterize the direction of the motion. The system was designed for game and painting applications. Hand tracking is used to imitate the movements of the mouse for drawing, and the gesture recognition system is used for selecting commands. Eight gestures are used for system training and 160 for testing, with a recognition performance of 98.75%.
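The conversion of 3D coordinates into quantized velocity vectors can be illustrated with a small sketch. The codebook below, six unit vectors along the coordinate axes, is purely an assumption for illustration; the codebook used in the cited system is not given in this text. The function name is likewise hypothetical.

```python
# Assumed codebook: the six axis-aligned unit directions (+x, -x, +y, -y, +z, -z).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def quantize_velocities(points):
    """Turn a 3D trajectory into a sequence of direction symbols.

    Each displacement between consecutive points is replaced by the index of
    the codebook direction with the largest dot product (smallest angle).
    Stationary steps produce no symbol.
    """
    symbols = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        v = (x1 - x0, y1 - y0, z1 - z0)
        if v == (0, 0, 0):
            continue  # no motion between these two samples
        scores = [sum(a * b for a, b in zip(v, d)) for d in DIRECTIONS]
        symbols.append(scores.index(max(scores)))
    return symbols
```

Since only differences between consecutive points are used, shifting the whole trajectory leaves the symbol sequence unchanged, which is the position independence the paragraph refers to. The resulting symbol sequence is the kind of discrete observation sequence an HMM can be trained on.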
3. Proposed hybrid Gesture and Gender recognition system
The proposed technique is represented by the block diagram shown in Figure 1.
From the block diagram, we can observe that the input video set is first collected for both gender and gestures. The gender video dataset contains videos with frontal-facing images of more than 120 people and is taken from the UCI Machine Learning Repository. The gesture video dataset is taken from various sources, including but not limited to the Berkeley video dataset, the open surveillance video dataset and the Stanford video dataset. These datasets contain videos of more than 50 different gestures performed by more than 100 persons. Such a wide variety of data was selected for a proper evaluation of the system.
The frames from each of these videos are given as input to the system. In the gender detection block, the Viola-Jones algorithm is used to detect the face in the input video. The Viola-Jones algorithm gives the locations of various geometric points on the face, such as the eye, nose and mouth locations. We then evaluate the following features from the facial points:
- Eye to eye distance
- Left & right eye to nose distance
- Left & right eye to mouth distance
- Left and right eye width and height
- Nose to mouth distance
- Nose width and height
- Mouth width and height
Combined, these features form a Feature Vector (FV) of about 12 facial geometry features. The features are saved in a MATLAB database along with the training classes Male and Female for each of the videos. For every new video frame, the database is evaluated and a class is found for that frame. For N frames we get N class values, and the most frequently occurring class is selected as the gender for the given video.
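The feature extraction and per-video decision described above can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB implementation: it assumes each facial part is available as a bounding box (x, y, width, height), measures distances between box centres, and uses hypothetical function names. Enumerating the listed features item by item yields 14 values here, whereas the text speaks of about 12; the exact selection is not specified. The SVM that assigns a class to each frame is not shown.

```python
import math
from collections import Counter

def center(box):
    """Centre point of a bounding box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def geometry_features(parts):
    """Facial geometry feature vector from part boxes.

    parts: dict with keys 'left_eye', 'right_eye', 'nose', 'mouth'
           (an assumed input layout), each an (x, y, width, height) box.
    """
    le, re, nose, mouth = (parts[k] for k in ("left_eye", "right_eye", "nose", "mouth"))
    cl, cr, cn, cm = center(le), center(re), center(nose), center(mouth)
    return [
        math.dist(cl, cr),                     # eye to eye distance
        math.dist(cl, cn), math.dist(cr, cn),  # left/right eye to nose
        math.dist(cl, cm), math.dist(cr, cm),  # left/right eye to mouth
        le[2], le[3], re[2], re[3],            # eye widths and heights
        math.dist(cn, cm),                     # nose to mouth distance
        nose[2], nose[3],                      # nose width and height
        mouth[2], mouth[3],                    # mouth width and height
    ]

def video_gender(frame_labels):
    """Majority vote: the most frequent per-frame class is the video's class."""
    return Counter(frame_labels).most_common(1)[0][0]
```

For example, `video_gender(["Male", "Female", "Male"])` returns `"Male"`, which corresponds to choosing the most frequently occurring class over the N frames.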
The gesture recognition block is the lower part of the block diagram. In this part, we first select the human silhouette for which the gestures are to be evaluated. The selected silhouette is given to a Daubechies level 8 filter bank (DB8), which produces the approximation, horizontal, vertical and diagonal components of the selected object. These components are stored in a feature vector (FV). For each pair of consecutive frames, these FVs are compared within the M×M window using a standard Spiking Neural Network (SNN) with 10 input neurons, 50 hidden neurons and 1 output neuron; the best-matching FV from the SNN is selected as the location of the silhouette, and its location is marked. At the end of the video, the marked locations are tracked to evaluate the speed and direction of the silhouette. The speed and direction are then given to a decision block, which determines the gesture of the silhouette.
The evaluated gesture and gender are given to a combination block, which shows the combined gender and gesture output for the given video. The next section describes the results and analysis of our proposed approach and compares it with some existing approaches in order to show where our algorithm stands in terms of detection accuracy.
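The last step of the gesture block, turning the marked per-frame locations into a speed and a direction, can be sketched as below. This is a minimal sketch under stated assumptions: the function name, the units (pixels per second, degrees), and the definitions of speed as path length over elapsed time and direction as the angle of the net displacement are illustrative choices, not taken from the paper. The wavelet decomposition and the SNN matching that produce the locations are not shown.

```python
import math

def speed_and_direction(locations, fps):
    """Mean speed and overall direction of a tracked silhouette.

    locations: list of (x, y) marked positions, one per frame.
    fps:       frame rate of the video in frames per second.
    Returns (speed in pixels per second, direction in degrees in [0, 360)).
    Image coordinates are assumed, so y grows downwards.
    """
    if len(locations) < 2:
        raise ValueError("at least two marked locations are needed")
    # Total distance travelled along the tracked path.
    path_length = sum(math.dist(a, b) for a, b in zip(locations, locations[1:]))
    duration = (len(locations) - 1) / fps
    # Direction of the net displacement from first to last location.
    dx = locations[-1][0] - locations[0][0]
    dy = locations[-1][1] - locations[0][1]
    direction = math.degrees(math.atan2(dy, dx)) % 360
    return path_length / duration, direction
```

A decision block could then, for instance, threshold the speed and bin the direction to label the gesture; the actual decision rules are not described in the text.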
4. Results and Analysis
We compared our algorithm with kNN and HMM. kNN is a non-parametric, instance-based classifier, while HMM is a probabilistic model for sequential data. The results of both classifiers compared with our proposed approach are shown in the following tables.
From our analysis we can observe that the proposed system outperforms both kNN- and HMM-based systems, due to the fact that the SVM classifier is a strong two-class classifier and can clearly distinguish between frames with different geometrical features. Thus, SVM is a well-suited choice for the gender recognition block.
The results for gesture recognition, which can be observed from the following table, also show that the proposed approach is about 15% more accurate than kNN and HMM. This is because the proposed system uses a rich set of DB8 features and a larger search window of 16×16, which helps keep the tracked object inside the search window so that gesture tracking is performed effectively.
We implemented the system in MATLAB; the following screenshots show the output figures from the simulations.
These figures are representative examples from the given videos; we tried our system on various other video sequences and obtained similar results.
5. Conclusion
From the obtained results, we can conclude that the proposed system is more than 10% more accurate than the kNN- and HMM-based gender detection systems, and more than 15% more accurate in gesture recognition when compared with similar techniques. Evaluation of the algorithms on other datasets also produces similar results, which suggests that the system can be used in real-time settings.
6. Future work
The proposed system demonstrates good accuracy for both stored and real-time videos. However, the processing delay becomes moderate to high as the training set grows; reducing it through machine learning optimizations is left as future work for this research.
- C. B. Ng, Y. H. Tay, and B. M. Goi. A review of facial gender recognition. Pattern Analysis and Applications, 18(4):739–755, 2015.
- P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(10):1090–1104, 2000.
- B. Moghaddam and M. Yang. Learning gender with support faces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(5):707–711, 2002.
- S. Baluja and H. A. Rowley. Boosting sex identification performance. International Journal of computer vision, 71(1):111–119, 2007.
- J. Yang, D. Zhang, A. F. Frangi, and J. Y. Yang. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(1):131–137, 2004.
- T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7):971–987, 2002.
- Z. Yang and H. Ai. Demographic classification with local binary patterns. In Advances in Biometrics, pages 464–473. Springer, 2007.
- C. Shan. Learning local binary patterns for gender classification on real-world face images. Pattern Recognition Letters, 33(4):431–437, 2012.
- V. Singh, V. Shokeen, and M. B. Singh. Comparison of feature extraction algorithms for gender classification from face images. In International Journal of Engineering Research and Technology, volume 2. ESRSA Publications, 2013.
- N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
- L. A. Alexandre. Gender recognition: A multiscale decision fusion approach. Pattern Recognition Letters, 31(11):1422– 1427, 2010.
- J. E. Tapia and C. A. Perez. Gender classification based on fusion of different spatial scale features selected by mutual information from histogram of lbp, intensity, and shape. Information Forensics and Security, IEEE Transactions on, 8(3):488–499, 2013.
- J. Bekios-Calfa, J. M. Buenaposada, and L. Baumela. Robust gender recognition by exploiting facial attributes dependencies. Pattern Recognition Letters, 36:228–234, 2014.
- G. Azzopardi, A. Greco, and M. Vento. Gender recognition from face images using a fusion of svm classifiers. In International Conference Image Analysis and Recognition, pages 533–538. Springer, 2016.
- R. Brunelli and T. Poggio. Face recognition: Features versus templates. IEEE Transactions on Pattern Analysis & Machine Intelligence, (10):1042–1052, 1993.
- S. Milborrow and F. Nicolls. Locating facial features with an extended active shape model. In Computer Vision–ECCV 2008, pages 504–513. Springer, 2008.
- Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3476–3483, 2013.
- E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 386–391, 2013.
- G. Azzopardi and N. Petkov. Trainable COSFIRE filters for keypoint detection and pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):490–503, Feb 2013.
- G. Azzopardi, L. Fernandez Robles, E. Alegre, and N. Petkov. Increased generalization capability of trainable cosfire filters with application to machine vision. In 23rd International Conference on Pattern Recognition (ICPR), 2016, in print.
- L. Elden. Matrix Methods in Data Mining and Pattern Recognition. SIAM, Philadelphia, 2007.
- M. Holte and T. Moeslund. View invariant gesture recognition using 3d motion primitives. In Proc. ICASSP, pages 797 – 800, 2008.
- H. A. L. Kiers. An alternating least squares algorithms for parafac2 and three-way dedicom. Computational Statistics & Data Analysis, 16(1):103 – 118, 1993.
- T. Kirishima, K. Sato, and K. Chihara. Real-time gesture recognition by learning and selective control of visual interest points. Pattern Analysis and Machine Intelligence, 27(3):351–364, 2005.
- C.-S. Lee and A. Elgammal. Modeling view and posture manifolds for tracking. In Proc. ICCV, pages 1–8, 2007.
- S.-W. Lee. Automatic gesture recognition for intelligent human-robot interaction. In Proc. FGR, pages 645–650, 2006.
- S. Mitra and T. Acharya. Gesture recognition: A survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, 37(3):311–324, 2007.
- H. S. Park, D. J. Jung, and H. J. Kim. Vision-based game interface using human gesture. In Advances in Image and Video Technology, pages 662–671. Springer, Berlin/Heidelberg, 2006.
- B. Peng and G. Qian. Binocular dance pose recognition and body orientation estimation via multilinear analysis. In Proc. CVPR Workshops, pages 1–8, 2008.
- G. Qian, F. Guo, T. Ingalls, L. Olson, J. James, and T. Rikakis. A gesture-driven multimodal interactive dance system. In Proc. ICME, pages 1579–1582, 2004.
- S. Rajko. Ame patterns library [computer software]. http://ame4.hc.asu.edu/amelia/patterns/, 2008.
- M. A. O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: Tensorfaces. In Proc. ECCV, pages 447–460, 2002.
- M. A. O. Vasilescu and D. Terzopoulos. Tensortextures: Multilinear image-based rendering. ACM Transactions on Graphics, 23(3):334–340, 2004.
- D. Vlasic, M. Brand, H. Pfister, and J. Popovi. Face transfer with multilinear models. In Proc. ACM SIGGRAPH, pages 426 – 433, 2005.
- D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2-3):249– 257, 2006.
- G. Ye, J. J. Corso, D. Burschka, and G. D. Hager. Vics: A modular hci framework using spatiotemporal dynamics. Machine Vision and Applications, 16(1):13–20, 2004.