What's the use of letting a machine learn to recognize a face?
In just one or two seconds, a camera can determine whether you are the person on record. Even if you move around in front of the camera, the smart lens will still recognize you. As for the face blindness that humans often show, especially with faces of other ethnicities, there is no need to worry: as long as the machine has been trained in advance on pictures of many kinds of faces in its database, it will rarely mistake one person for another.
What makes the machine capable of this is data and algorithms: tens of thousands of labeled face photos, plus the cognitive rules that humans either give the machine directly or that the machine derives on its own from those photos.
The explosion of information and data
On the data side, the amount of data humans generate has grown exponentially. According to IBM estimates, the total amount of information humans created from the beginning of history up to 2003 was 5 EB (Editor's Note: EB stands for exabyte, a unit of computer storage. In terms of the smaller, more familiar units: 1 TB = 1024 GB, 1 PB = 1024 TB, and 1 EB = 1024 PB). By 2011, humans were generating 5 EB of information every two days; by 2018, they could generate that much in about an hour.
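The editor's note uses binary units, where each step up is a factor of 1024. Taking the article's figures at face value, and treating "about an hour" as exactly one hour (a simplifying assumption), a quick calculation shows how large 1 EB is and how much the average generation rate grew between 2011 and 2018:

```python
# Binary storage units, as in the editor's note (each step is a factor of 1024).
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB
PB = 1024 * TB
EB = 1024 * PB


def eb_per_hour(amount_eb: float, hours: float) -> float:
    """Average generation rate, in EB per hour."""
    return amount_eb / hours


rate_2011 = eb_per_hour(5, 2 * 24)  # 5 EB every two days
rate_2018 = eb_per_hour(5, 1)       # 5 EB in "about an hour", taken as exactly 1 h

print(f"1 EB = {EB // GB:,} GB")                # 1 EB = 1,073,741,824 GB
print(f"2011: {rate_2011:.3f} EB/hour")         # 2011: 0.104 EB/hour
print(f"2018: {rate_2018:.3f} EB/hour")         # 2018: 5.000 EB/hour
print(f"growth: {rate_2018 / rate_2011:.0f}x")  # growth: 48x
```

On these figures, the average rate grew roughly 48-fold in seven years.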
Faced with this explosion of information and data, technology companies first solved the problem of storing historical data, then improved their ability to handle today's high-concurrency data operations. As a third step, they began to think about how to put the accumulated data to use. If you view data collection, transmission, storage, computation, and application as an industrial chain, computation and application are its most valuable links.
Tencent's artificial intelligence team has turned to the data inside doctors' heads. It quantified veteran doctors' experience in diagnosing diseases from CT images into indicators and trained AI products that help less experienced doctors read medical images, improving diagnostic accuracy.
Mobile map applications such as Gaode Map can already use real-time location data from users moving along a road to calculate the driving speed of most users on a given road segment and determine whether that road is congested.
"For example, a road's normal speed should be 40 kilometers per hour. If users' speed on a certain section suddenly drops to 0, that means traffic is blocked," Liu Zhenfei, Alibaba partner and president of Gaode Map, told China Business News. He used an iceberg as a metaphor, saying there are actually two Gaode Maps: one is the Gaode Map on phones and in cars, the one everyone sees; the other is an invisible Gaode Map. More than 300,000 apps in the Chinese market, including Toutiao, Meituan, Weibo, and ride-hailing apps, use positioning functions provided by Gaode's open platform. Together, these two parts form a huge body of big data on the relationship between people and places. Combined with alarm data from traffic police departments and the road-condition and incident data that users actively share, this lets Gaode even distinguish congestion caused by a build-up of vehicles from congestion caused by traffic accidents.
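The speed-drop example can be sketched as a simple rule: compare the average speed reported by users on a segment with the segment's normal free-flow speed, and attach a cause label when an accident report exists. This is a minimal illustration, not Gaode's method; the `Segment` type, the 0.1 and 0.5 thresholds, and the single `incident_reported` flag are hypothetical simplifications.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    segment_id: str
    free_flow_kmh: float  # normal speed on this segment, e.g. 40 km/h


def classify(segment, speeds_kmh, incident_reported=False,
             blocked_below=0.1, slow_below=0.5):
    """Label a segment from users' current speeds (hypothetical thresholds)."""
    if not speeds_kmh:
        return "unknown"  # no users on the segment, nothing to judge
    # Observed mean speed as a fraction of the normal speed (1.0 = free-flowing).
    ratio = mean(speeds_kmh) / segment.free_flow_kmh
    if ratio >= slow_below:
        return "clear"
    state = "blocked" if ratio < blocked_below else "slow"
    # A matching accident report (e.g. from traffic police or users) suggests
    # the slowdown comes from an incident rather than sheer vehicle volume.
    cause = "incident" if incident_reported else "volume"
    return f"{state} ({cause})"


road = Segment("seg-1", free_flow_kmh=40.0)
print(classify(road, [38, 41, 39]))                       # clear
print(classify(road, [0, 0, 1]))                          # blocked (volume)
print(classify(road, [0, 0, 1], incident_reported=True))  # blocked (incident)
```

Working with a ratio rather than an absolute speed lets the same thresholds apply to a 40 km/h street and a 100 km/h expressway.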
Combining data from many sources
The latest effort by Liu Zhenfei's team is a collaboration with the Traffic Management Science Research Institute of the Ministry of Public Security to build a citywide smart traffic light system. It combines real-time road-condition data collected from various sources and uses a set of algorithms to help traffic management departments schedule all traffic lights dynamically and intelligently from the cloud, giving each light a more reasonable signal duration. Drivers navigating with Gaode Map also receive real-time data on the traffic lights along their route, which helps them adjust their speed more sensibly and avoid red lights as much as possible.
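The article does not describe the algorithm behind this system. As a stand-in, the sketch below uses Webster's classic formulas for signal timing, which turn measured traffic flows into a cycle length and green times; rerunning them as real-time flow data changes is one simple way to picture "more reasonable durations." The formulas are the standard textbook ones, C0 = (1.5L + 5) / (1 - Y) and g_i = (C0 - L) * y_i / Y, where y_i is a phase's flow divided by its saturation flow, Y is the sum of the y_i, and L is the total lost time per cycle. The junction and its flows are made up.

```python
def webster_timing(flows_vph, saturation_vph, lost_time_s):
    """Cycle length and green times from Webster's classic formulas.

    flows_vph: measured flow on each phase's critical approach (vehicles/hour)
    saturation_vph: saturation flow of the same approaches (vehicles/hour)
    lost_time_s: total lost time per cycle (seconds)
    """
    ratios = [q / s for q, s in zip(flows_vph, saturation_vph)]  # y_i
    total = sum(ratios)                                          # Y
    if total <= 0:
        raise ValueError("need at least one phase with positive flow")
    if total >= 1.0:
        raise ValueError("demand exceeds capacity; no finite cycle length")
    cycle = (1.5 * lost_time_s + 5) / (1 - total)
    effective_green = cycle - lost_time_s
    # Split the usable green time in proportion to each phase's flow ratio.
    greens = [effective_green * y / total for y in ratios]
    return cycle, greens


# Hypothetical two-phase junction: the main road carries twice the side road's flow.
cycle, greens = webster_timing([600, 300], [1800, 1800], lost_time_s=10)
print(round(cycle), [round(g) for g in greens])  # 40 [20, 10]
```

Because the busier approach has twice the flow ratio, it receives twice the green time; as measured flows shift during the day, the split shifts with them.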
This smart traffic light system has been successfully tested in Wuxi. However, this two-dimensional live map is not the map company's ultimate goal. Autonomous driving will require high-precision maps that can tell whether a road has four lanes, how high the railings are at a particular spot, where the traffic lights are, and whether there are two trees by the roadside.
While map companies try to take maps from two dimensions to three, car manufacturers are also looking to fill their cars with sensors and cameras.
Shangtang Technology is one of the suppliers providing vision solutions to automakers. It is testing systems that analyze the video transmitted by cameras to identify surrounding vehicle models and objects, and that calculate the distance to other vehicles from an object's length and the distance between the lenses, giving autonomous driving a basis for decision-making. To make this work, it has labeled tens of thousands of vehicles and traffic signs, just as it did with faces.
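Computing distance from an object's length and the spacing between lenses matches two textbook pinhole-camera relations. With one camera, an object of known real size W that spans w pixels lies at roughly Z = f * W / w; with two lenses a baseline B apart, a point whose image shifts by d pixels between them lies at Z = f * B / d, where f is the focal length in pixels. The sketch below is a generic illustration under idealized assumptions (calibrated, rectified cameras), not Shangtang's implementation, and all numbers are hypothetical.

```python
def distance_from_size(focal_px, real_size_m, size_px):
    """Monocular pinhole estimate: Z = f * W / w for an object of known size W."""
    if size_px <= 0:
        raise ValueError("object size in pixels must be positive")
    return focal_px * real_size_m / size_px


def distance_from_stereo(focal_px, baseline_m, disparity_px):
    """Stereo estimate: Z = f * B / d, where d is the shift between the two images."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: a 1.8 m wide car spanning 90 px, and a 15 px disparity
# between two lenses mounted 0.3 m apart, both with a 1000 px focal length.
print(distance_from_size(1000, 1.8, 90))    # 20.0
print(distance_from_stereo(1000, 0.3, 15))  # 20.0
```

The monocular version depends on knowing the object's real size, which is one reason labeled vehicle data matters; the stereo version needs no size prior but loses precision as disparity shrinks with distance.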
"More than 90% of the information people receive comes through their eyes, and images and video are the main way people interact with the world," Yang Fan, co-founder and vice president of Shangtang Technology, told China Business News. "What makes data valuable is the information it carries. As the carrier of information evolves from binary 0s and 1s to numbers, text, voice, images, and video, the way data carries information will become more and more human-like. In other words, in the communication of future social life, less and less will be demanded of people and more and more of machines." In 2014, Shangtang's DeepID algorithm enabled machine face recognition to surpass human accuracy for the first time.
As technology companies such as Tencent, Gaode, and Shangtang feed data to machines, letting them either follow rules that humans give them or summarize rules on their own and build cognitive models (the technologies behind both are collectively called artificial intelligence, though the latter is more accurately called machine intelligence), the application scenarios for data have expanded further.
In the early days, the user footprints captured by Internet companies were used only for so-called precision marketing. For example, if you bought beer, a site would probably recommend diapers to you; or if you read a news story about young people in small towns, a news app would likely push you a story about life on the factory floor next, based on relevance in human cognitive models. Today, new data application scenarios are no longer limited to these ready-made digital forms of 0s and 1s. From physical retail to transportation, driverless cars, intelligent manufacturing, smart healthcare, online games, and live streaming, traditional markets that have not yet been digitized are also getting excited about data.
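The beer-and-diapers recommendation is the textbook example of association-rule mining, which scores a rule such as beer -> diapers by its support (how often both appear together), confidence (how often diapers appear given beer), and lift (confidence relative to how common diapers are overall; above 1 means the items co-occur more often than chance). The baskets below are invented toy data, and `rule_stats` and `recommend` are hypothetical helpers, not any company's recommender.

```python
def rule_stats(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    if n == 0:
        return 0.0, 0.0, 0.0
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = has_both / n
    confidence = has_both / has_a if has_a else 0.0
    lift = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, lift


def recommend(baskets, item, min_lift=1.0):
    """Items bought with `item` more often than chance, highest lift first."""
    others = set().union(*baskets) - {item}
    scored = []
    for other in others:
        _, _, lift = rule_stats(baskets, item, other)
        if lift > min_lift:
            scored.append((lift, other))
    return [other for _, other in sorted(scored, reverse=True)]


# Invented toy shopping baskets, for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"chips", "bread"},
]
print(rule_stats(baskets, "beer", "diapers"))  # support 1/3, confidence 2/3, lift 4/3
print(recommend(baskets, "beer"))              # ['diapers']
```

Filtering on lift rather than raw co-occurrence keeps items that are simply popular everywhere, such as milk here, from crowding out genuinely related ones.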
In these emerging scenarios, data consists mainly of voice and images, and data production (that is, acquisition) and application happen at the same time. Take driverless cars as an example: once physical-world data collected by cameras or sensors enters the vehicle's computing system, decisions about turning or swerving must be made immediately. In these scenarios, historical data serves only to train the machine's modeling capabilities.
Many emerging scenarios already exist, but for now, technology companies are commercializing these market segments slowly. In June 2018, Cadillac released an intelligent driving system developed with Gaode Map, which provides the high-precision map the system needs for navigation. The system reaches L3: drivers can take their hands off the wheel while driving, but only within a 30-kilometer stretch of highway. Driving this car in L3 mode in a big city like Beijing or Shanghai will take several more years, because road conditions there are more complex, the cost and difficulty of making high-precision maps rise rapidly, and satellites may even need to be launched. At present, high-precision mapping is still at an early, experimental stage.
Google's Soli sensor, a radar-based motion-sensing device approved on January 2, 2019, faces the same problem. Soli uses radar beams to capture movement in three-dimensional space, allowing users to press a virtual button by pinching the tips of the thumb and index finger together, or to turn a virtual dial by rubbing the thumb against the forefinger.
There is a huge gap between ideas and reality
Although computing power has greatly improved and machines and devices are becoming ever more intelligent, we must face up to the huge gap between concept and reality. You can build a prototype and a partial demonstration, but making something that you can use, that I can use, and that parents at home can use takes enormous cost. That is a matter of engineering, and we are exploring it step by step.
For autonomous driving, all computation must be completed within milliseconds to be meaningful for driving decisions. This depends not only on the commercialization of 5G but also on terminal devices completing a hardware revolution like the shift from feature phones to smartphones. Not only must cars be smart; roads must be smart too. In reality, most traffic lights and cameras used by China's transportation departments are not connected to the Internet. Some traffic lights are even switched by hand: every 30 seconds or one minute, the person managing the signals flips a switch.
Because of privacy and medical-safety concerns around medical data, medical image recognition products such as Tencent Miying have not yet obtained a medical device license that would allow commercial sale. And because commercial insurance in China is underdeveloped, domestic image recognition companies have yet to find customers willing to pay for their technology.
Based on how difficult they are to commercialize, Yang Fan divides the data application scenarios that combine with artificial intelligence into head scenarios and long-tail scenarios. In his view, autonomous driving, medical image recognition, and smart city projects are head scenarios; they are also a main reason why major companies' investments in artificial intelligence look so alike. Most scenarios, however, actually lie in the long tail.
Deep learning training system
"In the early years, we were optimistic about using visual recognition for Industry 4.0, but later found that the field was not as promising as we had first thought. The key reason is that its scenarios are extremely fragmented. Broadly speaking, the problem every production line faces can be called 'video analysis', but each algorithm has a different problem to solve," Yang Fan explained. Developing a dedicated set of algorithms for each niche scenario only makes sense if the commercial return that algorithm optimization brings to that scenario can cover the cost of the technology. Therefore, only by continuously upgrading deep learning training systems, advancing the standardization of algorithm production, breaking through technical barriers across scenarios, and lowering product development costs can the technology reach the more scattered, small, and personalized long-tail scenarios.
The public security department of one city proposed installing cameras along a river that would automatically raise an alarm if someone jumped in. This is technically feasible. The challenge is that it requires footage of at least tens of thousands of people jumping into the river. In Yang Fan's experience, tens of thousands of samples is the baseline for training a machine on a river-jumping algorithm, and the resulting algorithm may work only at one particular spot and not at another. This is an extreme requirement, but it shows that data production runs into the problem of insufficient native data.
Simulation training is one way to address the shortage of native data. Shangtang combines labeled data of real muck trucks (trucks hauling construction waste) with simulation training that embeds muck truck images into videos, so that a real-time alarm is triggered immediately when city cameras capture a muck truck entering the city illegally.
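The article does not detail Shangtang's pipeline, but the basic idea of embedding an object into footage can be sketched as copy-paste compositing: paste a masked image of the target onto background frames and record where it went, so every synthetic frame arrives already labeled. In this minimal, hypothetical sketch, images are small grayscale pixel grids, and `composite`, `synthesize`, and the `muck_truck` label are illustrative names; a realistic pipeline would also have to match scale, lighting, and blending, which this ignores.

```python
import random


def composite(frame, patch, mask, top, left):
    """Paste `patch` into a copy of `frame` where `mask` is 1; return image and bbox."""
    h, w = len(patch), len(patch[0])
    H, W = len(frame), len(frame[0])
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("patch does not fit at this position")
    out = [row[:] for row in frame]  # leave the original background untouched
    for i in range(h):
        for j in range(w):
            if mask[i][j]:  # mask marks which patch pixels belong to the object
                out[top + i][left + j] = patch[i][j]
    bbox = (left, top, left + w, top + h)  # (x_min, y_min, x_max, y_max)
    return out, bbox


def synthesize(frames, patch, mask, label, rng):
    """Turn unlabeled background frames into labeled training samples."""
    samples = []
    for frame in frames:
        H, W = len(frame), len(frame[0])
        # Random placement that keeps the whole patch inside the frame.
        top = rng.randrange(H - len(patch) + 1)
        left = rng.randrange(W - len(patch[0]) + 1)
        image, bbox = composite(frame, patch, mask, top, left)
        samples.append({"image": image, "bbox": bbox, "label": label})
    return samples


background = [[0] * 6 for _ in range(4)]  # toy 4x6 empty street frame
truck = [[9, 9], [9, 9]]                  # toy 2x2 "truck" patch
truck_mask = [[1, 1], [0, 1]]
data = synthesize([background] * 3, truck, truck_mask, "muck_truck", random.Random(0))
print([s["bbox"] for s in data])
```

The bounding box comes for free because the code chose where to paste the object, which is what makes synthetic data cheap to label.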
Li Feifei, former chief scientist of Google Cloud AI and a professor at Stanford University, has described an elderly-monitoring product her team developed that raises an alarm when an elderly person falls. Building this function required tens of thousands of images of elderly people's behavior. In the end, the team solved the problem by simulating elderly people falling. A number of organizations specializing in producing simulated data have since emerged in Silicon Valley.
Simulated data can solve the problem of insufficient volume, but it still faces the challenge of diversity. "If the original data for many scenarios cannot guarantee even the most basic diversity coverage, there is no way to simulate it. When you simulate someone jumping into a river, which kind is real and which is fake? If what you simulate is not realistic enough, you don't know what the machine will learn," Yang Fan said, explaining the complexity.
A balance between business efficiency and data security
Data is not only an asset but also a resource. From a business-value perspective, data realizes its maximum value only when it is organized around the user. Gaode's parent company Alibaba currently has three major data pools, Taobao, Alipay, and Gaode, representing people and goods, people and assets, and people and locations respectively. If these three types of data were connected, Alibaba would do more than use Gaode to help Hema Xiansheng choose the best place to open a store; it would achieve a true connection between online and offline. But Alibaba has so far not connected the underlying data of the three, because as a technology company it has to weigh both business efficiency and data security, and data security largely means protecting users' personal privacy.
While excited about building a high-precision map that reproduces the real physical world, Liu Zhenfei is also asking himself: can data solve every problem?
In 2016, his team worked with the Ministry of Public Security on a charity project to find missing children. The project used no cameras or face recognition at all. Instead, relying on Gaode's positioning capabilities and DingTalk, the team developed a collaborative office plug-in for the more than 6,000 police officers responsible for finding missing children. When a child goes missing, local police publish the alert via DingTalk and use Gaode's geo-fencing and location-based push (LBP) technology and interfaces to push the message directly to Gaode users within a specified area. When a user volunteers a clue, dedicated police officers verify it. This search method is no different from the traditional process, but it is far more efficient. The project, called Reunion, published alerts for 3,419 missing children within two years, with a recovery rate of 98.4%.
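Gaode's geo-fence and LBP interfaces are not described in the article, so the following is only a generic sketch of the underlying idea: treat the area around a child's last known location as a circular fence and select users whose latest position falls inside it, using the haversine great-circle distance. The user IDs, coordinates, and `users_in_fence` helper are hypothetical; a production system would query a spatial index rather than scan every user.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp rounding error


def users_in_fence(users, center, radius_km):
    """IDs of users whose last known position lies inside a circular geo-fence."""
    lat0, lon0 = center
    return [uid for uid, (lat, lon) in users.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical last-known positions keyed by user ID.
users = {"a": (39.90, 116.40), "b": (39.95, 116.40), "c": (40.90, 116.40)}
print(users_in_fence(users, center=(39.90, 116.40), radius_km=10))  # ['a', 'b']
```

User "b" is about 5.6 km north of the center and receives the alert; user "c", roughly 111 km away, does not.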
"Later, we wanted to do a similar project to find lost elderly people. The obstacle was not technology or money, but the lack of a clear government department responsible for the matter. At present, there is no team dedicated to finding elderly people the way there is for children." The experience made Liu Zhenfei realize that not every problem can be solved with technology. "In the beginning, we also thought about setting up cameras at train stations and bus stations, but later we discovered that technology alone is not enough. So even while we have the technology, we cannot idealize it or exaggerate what it can do."