HiLive

A live streaming chatroom involving multiple modalities, such as voice, gesture, and facial expression.

Introduction

We focus on demonstrating the design of HiLive, highlighting its advantages, and explaining the considerations behind its design features, drawing on the knowledge and design principles learned in CS3483.

As mentioned in our previous design reports, the project is motivated by the limited chatroom experience in live streams. The interface is designed to provide a better chatroom experience on a live streaming platform modelled after Twitch, a popular video game live streaming platform, and it could be further generalized and adopted by video platforms such as YouTube. Viewers usually type messages and send emojis in the chatroom to communicate with others and liven up the atmosphere in the streaming room. However, this monotonous form of interaction has long been criticized by users. Therefore, we propose HiLive, where people can not only send emojis generated from their recognized facial expressions, but also speak and have the transcribed message sent directly without typing it out themselves. With the gesture recognition function, users are no longer bound to the keyboard, so they can have an immersive live streaming interaction experience.

Features

Front Page

The front page leverages the Twitch API and its OAuth authentication system. Users can register and log in to the website with a Twitch account, and the front page also provides a list of available live streams for users to click into.
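
As a rough illustration of how the front page's stream list could be fetched, here is a minimal sketch against the Twitch Helix /streams endpoint; the environment variable name and the response fields kept are assumptions for illustration, not the project's actual code.

```typescript
// Sketch: fetch currently live channels from the Twitch Helix API for the
// front-page list. NEXT_PUBLIC_TWITCH_CLIENT_ID is an assumed variable name.
import axios from "axios";

export interface LiveStream {
  id: string;
  user_name: string;
  title: string;
  viewer_count: number;
  thumbnail_url: string;
}

export async function fetchLiveStreams(accessToken: string): Promise<LiveStream[]> {
  // The Helix /streams endpoint returns currently live channels.
  const response = await axios.get("https://api.twitch.tv/helix/streams", {
    headers: {
      "Client-Id": process.env.NEXT_PUBLIC_TWITCH_CLIENT_ID ?? "", // assumed env name
      Authorization: `Bearer ${accessToken}`,                      // token from the OAuth flow
    },
    params: { first: 20 }, // limit the list shown on the front page
  });
  return response.data.data as LiveStream[];
}
```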

Recommended Channels Component

The <VideoGroup data={[]}/> component is completely reusable: it only takes a data array from the official Twitch API and shuffles the results (a simple "recommended channels" algorithm).
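
The sketch below illustrates the idea of such a reusable component: it accepts the data array, shuffles it with a Fisher-Yates shuffle, and renders the result. The prop shape and markup are assumptions for illustration, not the component's actual implementation.

```tsx
// Sketch of a reusable video-group component that shuffles the channels it receives.
import React, { useMemo } from "react";

interface Channel {
  id: string;
  user_name: string;
  thumbnail_url: string;
}

// Fisher-Yates shuffle so every ordering of the recommended channels is equally likely.
function shuffle<T>(items: T[]): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

export function VideoGroup({ data }: { data: Channel[] }) {
  // Memoise so the list is only reshuffled when the API data changes.
  const shuffled = useMemo(() => shuffle(data), [data]);
  return (
    <ul>
      {shuffled.map((channel) => (
        <li key={channel.id}>
          <img src={channel.thumbnail_url} alt={channel.user_name} />
          <span>{channel.user_name}</span>
        </li>
      ))}
    </ul>
  );
}
```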

Facial Expression Recognition

Facial expression recognition to emoji is an interesting feature that lets users interact with streamers directly. We trained a model on more than 2000 images covering the different emoji categories. Prediction runs in real time, as can be seen in the demo video we provided. The figure below captures some of the facial expression recognition results; a sketch of how a prediction could be turned into a chat emoji follows it.

Figure: sample facial expression recognition results for the angry, smile, surprise, and tease categories.
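
To make the data flow concrete, here is a minimal sketch of how a prediction from the model could be mapped to an emoji and forwarded to the chatroom. The label names, emoji mapping, confidence threshold, and sendChatMessage helper are illustrative assumptions, not identifiers from the repository.

```typescript
// Sketch: map a recognized facial expression to an emoji and send it to chat.
type Expression = "angry" | "smile" | "surprise" | "tease";

const EXPRESSION_TO_EMOJI: Record<Expression, string> = {
  angry: "😠",
  smile: "😄",
  surprise: "😲",
  tease: "😜",
};

// Called each time the recognizer produces a new prediction for the webcam frame.
function onExpressionPredicted(
  label: Expression,
  confidence: number,
  sendChatMessage: (text: string) => void, // hypothetical chat helper
) {
  // Only forward confident predictions so the chat isn't flooded with noise.
  if (confidence < 0.8) return;
  sendChatMessage(EXPRESSION_TO_EMOJI[label]);
}
```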

Speech to Text Component

The speech-to-text component uses the Google Cloud Speech-to-Text API to transcribe the user's voice and send the resulting text to the chatroom. The website captures the user's voice with the Web Audio API: it streams the microphone input and activates the speech-to-text service as soon as it receives the real-time speech data in byte format.
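
The sketch below illustrates the capture side of this flow: request microphone access, chunk the audio, and hand each chunk of bytes to a callback that would forward it to the transcription service. It uses MediaRecorder for brevity rather than the full Web Audio API pipeline described above, so treat it as an outline of the data flow, not the project's implementation.

```typescript
// Sketch: capture microphone audio in the browser and emit byte chunks
// that a caller could forward to the speech-to-text backend.
export async function streamMicrophone(
  onAudioChunk: (bytes: ArrayBuffer) => void, // e.g. forward to the transcription endpoint
): Promise<() => void> {
  // Ask the browser for microphone access.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  // Every chunk arrives as a Blob; convert it to raw bytes for the recognizer.
  recorder.ondataavailable = async (event: BlobEvent) => {
    if (event.data.size > 0) {
      onAudioChunk(await event.data.arrayBuffer());
    }
  };

  // Emit a chunk every 250 ms so transcription can happen in near real time.
  recorder.start(250);

  // Return a stop function so the caller can end capture and release the mic.
  return () => {
    recorder.stop();
    stream.getTracks().forEach((track) => track.stop());
  };
}
```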

Gesture Recognition Component

The gesture recognition component uses the repository from Real-time Hand Gesture Recognition with 3D CNNs. Here is the demo video of the outcome; a sketch of how the client could consume the recognized gestures follows the figure below.


Figure: A real-time simulation of the architecture, showing input video from the EgoGesture dataset (left) and real-time (online) classification scores for each gesture (right), where each class is annotated with a different color.
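
Since the 3D-CNN recognizer runs outside the browser, one plausible way for the front end to react to its output is to listen for classification results over a socket. The sketch below assumes a hypothetical WebSocket endpoint, message shape, and gesture-to-action mapping purely for illustration.

```typescript
// Sketch: consume gesture labels emitted by the recognizer process and map
// them to chat actions. Endpoint and message shape are assumptions.
interface GestureMessage {
  gesture: string;     // e.g. "thumbs_up" (hypothetical label)
  confidence: number;  // classifier score in [0, 1]
}

export function listenForGestures(sendChatMessage: (text: string) => void) {
  const socket = new WebSocket("ws://localhost:8765/gestures"); // assumed endpoint

  socket.onmessage = (event: MessageEvent<string>) => {
    const message: GestureMessage = JSON.parse(event.data);

    // Ignore low-confidence classifications to avoid accidental triggers.
    if (message.confidence < 0.9) return;

    // Map a recognized gesture to a chat action; this mapping is illustrative.
    if (message.gesture === "thumbs_up") {
      sendChatMessage("👍");
    }
  };

  return () => socket.close();
}
```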

Built With:

  • Next.js
  • TypeScript
  • Axios
  • styled-components
  • Twitch API
  • Deployed to Vercel

Run this locally

To run this project locally, you’ll need Node.js installed.

Install dependencies, preferably with yarn, though you can also use npm install.

Create a .env file in the root of the folder based on .env.example.

Run the Next.js app with yarn dev or npm run dev.

Go to localhost:3000 and try out HiLive.
