Python SDK

Sensory Cloud is built using python 3.8, but it is compatible with python 3.6 and higher

General Information

Before getting started, you must spin up a Sensory Cloud inference server or have Sensory spin one up for you. You must also have the following pieces of information:

  • Your inference server URL
  • Your Sensory Tenant ID (UUID)
  • Your configured secret key used to register OAuth clients

Checking Server Health

It’s important to check the health of your Sensory Inference server. You can do so by following the example here.

Secure Credential Store

ISecureCredential is an interface that store and serves your OAuth credentials (clientId and clientSecret).
ISecureCredential must be implemented by you and the credentials should be persisted in a secure manner, such as in an encrypted database.
OAuth credentials should be generated one time per unique machine.
A crude example of ISecureCredential can be seen here.

Registering OAuth Credentials

OAuth credentials should be registered once per unique machine. Registration is very simple, and provided as part of the SDK.
The source code for the OAuthService can be found in the oauth_service.py file and
the example file here shows how to create an OAuthService and register a client for the first time.

Creating a Token Manager

The TokenManger class handles requesting OAuth tokens when necessary.
The source code for the TokenManager can be found in the token_manager.py and
an example of its implementation can be seen here.

Creating an Audio Service

The code snippets shown in this section give a brief summary of the implementation of the AudioService class. Full examples can be found
here and the source code for the AudioService class can be found in the
audio_service.py file.

AudioService provides methods to stream audio to Sensory Cloud. It is recommended to only have 1 instance of AudioService
instantiated per Config. In most circumstances you will only ever have one Config, unless your app communicates with
multiple Sensory Cloud servers.

def get_audio_service() -> AudioService:
    is_connection_secure: bool = True
    fully_qualifiied_domain_name: str = os.environ.get("FULLY_QUALIFIED_DOMAIN_NAME")
    tenant_id: str = os.environ.get("TENANT_ID")
    client_id = os.environ.get("CLIENT_ID")
    client_secret = os.environ.get("CLIENT_SECRET")

    config: Config = Config(
        fully_qualifiied_domain_name=fully_qualifiied_domain_name,
        is_connection_secure=is_connection_secure,
        tenant_id=tenant_id,
    )
    config.connect()

    cred_store: SecureCredentialStore = SecureCredentialStore(client_id, client_secret)
    oauth_service: OauthService = OauthService(config=config, secure_credential_store=cred_store)

    token_manager: TokenManager = TokenManager(oauth_service=oauth_service)

    audio_service: AudioService = AudioService(config, token_manager)

    return audio_service

Obtaining Audio Models

Certain audio models are available to your application depending on the models that are configured for your instance of Sensory Cloud.
In order to determine which audio models are accessible to you, you can execute the below code.

audio_service: AudioService = get_audio_service()
audio_models: GetModelsResponse = audio_service.get_models()

Audio models contain the following properties:

  • name – the unique name tied to this model. Used when calling any other audio function.
  • isEnrollable – indicates if the model can be enrolled into. Models that are enrollable can be used in the CreateEnrollment function.
  • modelType – indicates the class of model and it’s general function.
  • fixedPhrase – for speech-based models only. Indicates if a specific phrase must be said.
  • samplerate – indicates the audio samplerate required by this model. Generally, the number will be 16000.
  • isLivenessSupported – indicates if this model supports liveness for enrollment and authentication. Liveness provides an added layer of security by requring a users to speak random digits.

Enrolling with Audio

In order to enroll with audio, you must first ensure you have an enrollable model enabled for your Sensory Cloud instance. This can be obtained via the GetModels() request.
Enrolling with audio uses an audio stream iterator that yields audio bytes. A sample audio stream iterator is shown below and passed into the audio enrollmnent class method.
It is important to save the enrollmentId in order to perform authentication against it in the future.

class AudioStreamIterator:
    """
    This is a sample audio stream iterator that uses the pyaudio package to interface with
    the microphone and can be used with all of the methods in the AudioService class except for
    get_models().  This implementation of an audio stream iterator is just one option, but the user
    has the freedom to choose whatever implementation they would like, so long as it is an iterator that yields
    audio bytes.
    """

    _p_output, _p_input = multiprocessing.Pipe()

    def __init__(
        self,
        channels: int,
        rate: int,
        frames_per_buffer: int,
        format: int = pyaudio.paInt16,
    ):
        self.channels = channels
        self.rate = rate
        self.frames_per_buffer = frames_per_buffer
        self._py_audio = pyaudio.PyAudio()
        self._stream = self._py_audio.open(
            format=format,
            channels=channels,
            rate=rate,
            input=True,
            frames_per_buffer=frames_per_buffer,
            stream_callback=self._record_callback,
        )

    def _record_callback(self, in_data, count, time_info, status):
        self._p_input.send(in_data)
        return (None, pyaudio.paContinue)

    def __iter__(self):
        return self

    def __next__(self):
        return self._p_output.recv()

    def close(self):
        self._stream.stop_stream()
        self._stream.close()
        self._py_audio.terminate()

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

enrollment_stream = audio_service.stream_enrollment(
    audio_config=audio_config,
    description=enrollment_description,
    user_id=user_id,
    device_id=device_id,
    model_name="my-audio-enrollment-model",
    is_liveness_enabled=is_liveness_enabled,
    audio_stream_iterator=audio_stream_iterator,
)

enrollment_id = None
try:
    print(
        "Recording enrollment (repeat saying the model enrollment utterance until the enrollment is complete)..."
    )
    percent_complete = 0
    print(f"percent complete = {percent_complete}")
    for response in enrollment_stream:
        if response.percentComplete != percent_complete:
            percent_complete = response.percentComplete
            print(f"percent complete = {percent_complete}")
        if response.percentComplete >= 100:
            break
    enrollment_id = response.enrollmentId
    print("Enrollment complete!")
    print(f"Enrollment Id = {enrollment_id}")
except Exception as e:
    print(f"Enrollment failed with error: {str(e)}")
finally:
    audio_stream_iterator.close()
    enrollment_stream.cancel()

Authenticating with Audio

Authenticating with audio is similar to enrollment, except now you have an enrollment_id to pass into the function. The AudioStreamIterator
class shown in the enrollment example above will be used again here.

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

authenticate_stream = audio_service.stream_authenticate(
    audio_config=audio_config,
    enrollment_id="my-enrollment-id",
    is_liveness_enabled=is_liveness_enabled,
    audio_stream_iterator=audio_stream_iterator,
)

authentication_success = False
try:
    print("Authenticating...")
    for response in authenticate_stream:
        if response.success:
            authentication_success = True
            break
    print("Authentication successful!\n")
except Exception as e:
    print(f"Authentication failed with error: {str(e)}\n")
finally:
    audio_stream_iterator.close()
    authenticate_stream.cancel()

Audio Events

Audio events are used to recognize specific words, phrases, or sounds.

The below example waits for a single event to be recognized and ends the stream.

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

event = None
try:
    print("Listening for events...")
    for response in event_stream:
        if response.success:
            print(response.resultId)
            event = response.resultId
    print(f"Detected {event}, ending session")
except Exception as e:
    print(f"Event detection failed with error: {str(e)}\n")
finally:
    audio_stream_iterator.close()
    event_stream.cancel()

Create Enrolled Event

You can enroll your own event into the Sensory cloud system. The process is similar to biometric enrollment where you must
play a sound or speak a particular phrase 4 or more times. This is usefull for recognizing sounds that are not offered by Sensory Cloud.

model_name = "sound-dependent-16kHz.ubm"
description = "enrolled-event-example"

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

enrollment_id = None
try:
    print("Enrolling...")
    percent_complete = 0
    print(f"percent complete = {percent_complete}")
    for response in enrolled_event_stream:
        if response.percentComplete != percent_complete:
            percent_complete = response.percentComplete
            print(f"percent complete = {percent_complete}")
        if response.percentComplete >= 100:
            break
        enrollment_id = response.enrollmentId
    print("Enrollment complete!")
    print(f"Enrollment Id = {enrollment_id}")
except Exception as e:
    print(f"Enrolled event failed with error: {str(e)}")
finally:
    audio_stream_iterator.close()
    enrolled_event_stream.cancel()

Validate Enrolled Event

Once you’ve created an enroled event, you can listen for that event (or groups of events) by calling
the ValidateEnrolledEvent function.

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

validate_enrolled_event_stream = audio_service.stream_validate_enrolled_event(
    audio_config=audio_config,
    enrollment_id=event_enrollment_id,
    audio_stream_iterator=audio_stream_iterator,
)

authentication_success = True
try:
    print("Authenticating enrolled event...")
    for response in validate_enrolled_event_stream:
        if response.success:
            authentication_success = True
            break
    print("Authentication successful!\n")
except Exception as e:
    print(f"Enrolled event validation failed with error: {str(e)}")
finally:
    audio_stream_iterator.close()
    validate_enrolled_event_stream.cancel()

Transcription

Transcription is used to convert audio into text.

transcription_model = "vad-lvscr-lights-2.snsr"

audio_service: AudioService = get_audio_service()

audio_config = AudioConfig(
    encoding=AudioConfig.AudioEncoding.Value("LINEAR16"),
    audioChannelCount=1,
    sampleRateHertz=16000,
    languageCode="en-US",
)

upload_interval = 100  # (ms)
frames_per_buffer = int(audio_config.sampleRateHertz * (upload_interval / 1000))

audio_stream_iterator = AudioStreamIterator(
    channels=audio_config.audioChannelCount,
    rate=audio_config.sampleRateHertz,
    frames_per_buffer=frames_per_buffer,
)

transcribe_stream: typing.Iterable[
    TranscribeResponse
] = audio_service.stream_transcription(
    audio_config=audio_config,
    user_id=user_id,
    model_name=transcription_model,
    audio_stream_iterator=audio_stream_iterator,
)

transcription = None
try:
    print("LVCSR lights session begin\n")
    for response in transcribe_stream:
        if not response.isPartialResult:
            print(response.transcript)
            transcription = response.transcript
    print("Complete transcription detected, ending session")
except Exception as e:
    print(f"Transcription failed with error: {str(e)}\n")
finally:
    audio_stream_iterator.close()
    transcribe_stream.cancel()

Creating a Video Service

VideoService provides methods to stream images to Sensory Cloud. It is recommended to only have 1 instance of VideoService
instantiated per Config. In most circumstances you will only ever have one Config, unless your app communicates with
multiple Sensory Cloud servers.

The snippets below give a brief summary of the implementation of the VideoService class. The source code can be seen in the
video_service.py and full examples of its implementation are given
here.

def get_video_service() -> VideoService:
    is_connection_secure = True
    fully_qualifiied_domain_name = os.environ.get("FULLY_QUALIFIED_DOMAIN_NAME")
    tenant_id = os.environ.get("TENANT_ID")
    client_id = os.environ.get("CLIENT_ID")
    client_secret = os.environ.get("CLIENT_SECRET")

    config = Config(
        fully_qualifiied_domain_name=fully_qualifiied_domain_name,
        is_connection_secure=is_connection_secure,
        tenant_id=tenant_id,
    )
    config.connect()

    cred_store = SecureCredentialStore(client_id, client_secret)
    oauth_service = OauthService(config=config, secure_credential_store=cred_store)

    token_manager = TokenManager(oauth_service=oauth_service)

    video_service = VideoService(config=config, token_manager=token_manager)

    return video_service

Obtaining Video Models

Certain video models are available to your application depending on the models that are configured for your instance of Sensory Cloud.
In order to determine which video models are accessible to you, you can execute the below code.

video_service: VideoService = get_video_service()
video_models: GetModelsResponse = video_service.get_models()

Video models contain the following properties:

  • name – the unique name tied to this model. Used when calling any other video function.
  • isEnrollable – indicates if the model can be enrolled into. Models that are enrollable can be used in the CreateEnrollment function.
  • modelType – indicates the class of model and it’s general function.
  • fixedObject – for recognition-based models only. Indicates if this model is built to recognize a specific object.
  • isLivenessSupported – indicates if this model supports liveness for enrollment and authentication. Liveness provides an added layer of security.

Enrolling with Video

In order to enroll with video, you must first ensure you have an enrollable model enabled for your Sensory Cloud instance. This can be obtained via the GetModels() request.
Enrolling with video uses a call and response streaming pattern to allow immediate feedback to the user during enrollment. Enrolling with video uses a video stream iterator that yields image bytes. A sample video stream iterator is shown below and passed into the video enrollmnent class method. It is important to save the enrollmentId in order to perform authentication against it in the future.

class VideoStreamIterator:
    def __init__(self):
        self._camera = cv2.VideoCapture(0)

    def __iter__(self):
        return self

    def __next__(self):
        success, frame = self._camera.read()
        if success:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            buffer = BytesIO()
            Image.fromarray(frame).save(buffer, format="JPEG", quality=95)
            return buffer.getvalue()
        else:
            raise StopIteration

    def close(self):
        self._camera.release()


model_name: str = "face_recognition_mathilde"
enrollment_description: str = "jhersch-video-enrollment-cpu"

device_id: str = os.environ.get("DEVICE_ID")
user_id: str = os.environ.get("USER_ID")

video_service: VideoService = get_video_service()

video_stream_iterator: VideoStreamIterator = VideoStreamIterator()

enrollment_stream = video_service.stream_enrollment(
    description=enrollment_description,
    user_id=user_id,
    device_id=device_id,
    model_name=model_name,
    is_liveness_enabled=is_liveness_enabled,
    video_stream_iterator=video_stream_iterator,
)

print("Recording enrollment...")
percent_complete = 0
enrollment_id = None
try:
    print(f"percent complete = {percent_complete}")
    for response in enrollment_stream:
        if response.percentComplete != percent_complete:
            percent_complete = response.percentComplete
            print(f"percent complete = {percent_complete}")
        if response.percentComplete >= 100:
            break
    enrollment_id = response.enrollmentId
    print("Enrollment complete!")
    print(f"Enrollment Id = {enrollment_id}")
except Exception as e:
    f"Enrollment failed with error: {str(e)}"
finally:
    video_stream_iterator.close()
    enrollment_stream.cancel()

Authenticating with Video

Authenticating with video is similar to enrollment, except now you have an enrollmentId to pass into the function.

enrollment_id: str = os.environ.get("VIDEO_ENROLLMENT_ID")

video_service: VideoService = get_video_service()

video_stream_iterator: VideoStreamIterator = VideoStreamIterator()

authenticate_stream = video_service.stream_authentication(
    enrollment_id=enrollment_id,
    is_liveness_enabled=False,
    video_stream_iterator=video_stream_iterator,
)

success: bool = False
try:
    print("Authenticating...")
    for response in authenticate_stream:
        print(response.success)
        if response.success:
            success = True
            break
    print("Authentication successful!")
except Exception as e:
    print(f"Authentication failed with error {str(e)}")
finally:
    video_stream_iterator.close()
    authenticate_stream.cancel()

Video Liveness

Video Liveness allows one to send images to Sensory Cloud in order to determine if the subject is a live individual rather than a spoof, such as a paper mask or picture.

model_name: str = "face_recognition_mathilde"
user_id: str = os.environ.get("USER_ID")

video_service: VideoService = get_video_service()

video_stream_iterator: VideoStreamIterator = VideoStreamIterator()

recognition_stream = video_service.stream_liveness_recognition(
    user_id=user_id,
    model_name=model_name,
    video_stream_iterator=video_stream_iterator,
)

alive = False
try:
    print("Running liveness recognition...")
    for response in recognition_stream:
        print(response.isAlive)
        if response.isAlive:
            alive = True
            break
    print("You're alive!")
except Exception as e:
    print(f"Liveness recognition failed with error {str(e)}")
finally:
    video_stream_iterator.close()
    recognition_stream.cancel()

Creating a Management Service

The ManagementService is used to manage typical CRUD operations with Sensory Cloud, such as deleting enrollments or creating enrollment groups.
For more information on the specific functions of the ManagementService, please refer to the management_service.py file.
The example file here shows how to create a ManagementService object.

GitHub

View Github