How AI Is Used in the Real World ①: Speech Recognition (Speech to Text)

Lately I often hear the term "multimodal AI." The word "modal" sounds familiar. "Modal" is the adjective form of "mode." Engineers will probably think of a modal window, the kind that is forced on top of a parent window. The name indicates that the window puts the application into a waiting mode: until you close it, you cannot operate the parent window.


Multimodal AI is AI that integrates multiple kinds (modes) of data. Humans naturally gather information through multiple senses and make judgments from it. In table tennis, for example, a player not only watches the ball the opponent hits but also listens to the sound it makes on the racket, predicts where the ball will go, and swings. In a TV experiment I once saw, I was surprised that even top players swung and missed when the sound was shifted.

Multimodal AI is AI that, like humans, makes judgments from multiple kinds of input. Until now, each modality has been handled by a separate technique: CNNs (convolutional neural networks) for vision (images) and RNNs (recurrent neural networks) for hearing (speech recognition). The goal is to make these multimodal, layering multiple sources of information so that AI can make more sophisticated judgments.
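
As a rough illustration of what combining modalities can mean, the sketch below applies a simple "late fusion" to the table tennis example: a visual model and an audio model each score where the ball will go, and the two sets of scores are merged with a weighted average. The function names, the scores, and the weight are all hypothetical values chosen for illustration. Real multimodal systems typically fuse learned features inside a neural network rather than averaging final scores.

```python
from typing import Dict


def fuse_predictions(
    visual: Dict[str, float],
    audio: Dict[str, float],
    visual_weight: float = 0.7,  # hypothetical weight, not a recommended value
) -> Dict[str, float]:
    """Late fusion: weighted average of per-class scores from two modalities."""
    if visual.keys() != audio.keys():
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be between 0 and 1")

    audio_weight = 1.0 - visual_weight
    fused = {
        label: visual_weight * visual[label] + audio_weight * audio[label]
        for label in visual
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("all fused scores are zero")
    # Renormalize so the fused scores sum to 1.
    return {label: score / total for label, score in fused.items()}


def predict(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores: sight alone slightly favors "left",
# but the sound of the hit strongly suggests "right".
visual_scores = {"left": 0.55, "right": 0.45}
audio_scores = {"left": 0.10, "right": 0.90}

fused_scores = fuse_predictions(visual_scores, audio_scores)
print("visual only:", predict(visual_scores))
print("fused:", predict(fused_scores))
```

With these made-up numbers, the visual scores alone pick "left", while the fused scores pick "right". This mirrors the point of the table tennis example: the sound can change the judgment that sight alone would make.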
