Apple-voice-recognition

Machine Learning

How does Siri work?

Siri is based on large-scale Machine Learning systems that employ many aspects of data science.

Upon receiving your request, Siri records the frequencies and sound
waves from your voice and translates them into a code. Siri then
breaks down the code to identify particular patterns, phrases, and
keywords. This data gets input into an algorithm that sifts through
thousands of combinations of sentences to determine what the
inputted phrase means. This algorithm is complex enough that it is
capable of working around idioms, homophones and other literary
expressions to determine the context of a sentence.

Once Siri determines its request, it begins to assess what tasks
needs to be carried out, determining whether or not the information
needed can be accessed from within the phone’s data banks or from
online servers. Siri is then able to craft complete and cohesive
sentences relevant to the type of question or command requested.

Technology behind Voice Identification

Voice identification technology captures and measures the physical
qualities of a person’s voice when speaking as well as the unique
biological parameters that combine to produce that voice.

These parameters Include:

#1 Pitch

Pitch is an important perceptual dimension by which listeners
discriminate and categorize voice quality. It affects the perceived
brightness of the sound, and brightness may be one of several
perceptual features of a sound used by listeners to distinguish one
voice quality from another.

#2 Intensity

The increased vocal intensity results from a greater
resistance by the vocal folds to increased airflow. The vocal folds are
blown wider apart, releasing a larger puff of air that sets up a sound
pressure wave of greater amplitude.

#3 Dynamics

Within-person variability in our vocal signals is
substantial: we volitionally modulate our voices to express our
thoughts and intentions or adjust our vocal outputs to suit a
particular audience, speaking environment, or situation.

Prerequisites

On the Terminal run – pip install speaker-verification-toolkit

On the Terminal run – pip install numba==0.48

In case an ERROR occurs while installing numba==0.48 then :

On the Terminal run – pip install librosa –ignore-installed llvmlite

Extra

> Numba is an upgraded version of Numpy.

> Librosa is a python package for music and audio analysis.

> svt.rms_silence_filter() used for filtering environment noise.

> Mel-Frequency Cepstral Coefficients (MFCC) feature extraction
method is a leading approach for speech feature extraction and
current research aims to identify performance enhancements.

> Known_1, Known_2, Unknown are sample audio voices.

> Covert audio from .mp4 to .wav beacuse librosa supports .wav.