Dahua’s Xinghan AI Models: Revolutionizing Video Surveillance with Intelligent IoT

Dahua, a leading innovator in intelligent IoT solutions centered on video technology, has consistently pushed the boundaries of AI research and development. Its latest breakthrough, an upgraded version of the Xinghan Large-Scale AI Models, promises to make video surveillance smarter than ever by combining multimodal capabilities with in-depth industry knowledge. Let’s look at how Xinghan is poised to redefine security.

Addressing Conventional AI Challenges

Xinghan directly tackles the limitations of traditional CNN-based AI: difficulty detecting small targets at long distances, false alarms triggered by common environmental factors such as birds and leaves, and the lengthy customization cycles required to create new algorithms.

“During the industry’s digital and intelligent transformation, AI technology still faces challenges,” explains Frank Fang, Overseas Product Director at Dahua. “While algorithm accuracy has reached high levels in some areas, demands for adaptive intelligence across complex, dynamic scenarios and higher accuracy continue to rise. Simultaneously, business needs are evolving from perception and simple cognition to complex cognition. Additionally, complex rule configuration and cumbersome interactions in practical applications hinder usability. With advancements in large model technology, Dahua launched the Xinghan Large-Scale AI Models to address these issues.”

Fang emphasizes that Xinghan is designed to resolve real-world user challenges through five key differentiators:

* **From accuracy to precision:** Enhanced detection in challenging conditions like identifying tiny targets, processing blurry images, and managing strong backlighting, ensuring stable and reliable recognition.
* **From customization to generalization:** Significantly reduced development time for custom algorithms, streamlining complex processes.
* **From recognition to comprehension:** Support for understanding complex multi-target interactions, moving beyond simple behavior recognition.
* **From static to dynamic:** Overcoming the limitations of fixed rule configurations by enabling autonomous scene analysis and dynamic adaptation.
* **Enhanced language and multimodal capabilities:** Simplified operations through natural language interaction, processing text, images, and video for a comprehensive understanding of the environment.

Different Models

Since its debut in 2023, Xinghan has continued to evolve by merging multimodal intelligence with deep domain expertise. This evolution has resulted in three core series within the Xinghan framework: Xinghan Vision Models (focused on vision-centric intelligence), Xinghan Multimodal Models (offering multimodal-fusion capabilities), and Xinghan Language Models (enabling language-driven interaction). Here, we will focus on the Vision and Multimodal Models.

Xinghan Vision Models

The Xinghan Vision Models are integrated into selected camera models in Dahua’s IPC and PTZ series. Because large models typically run on servers, deploying Xinghan on edge devices requires shrinking the model and applying advanced training techniques, a process Dahua likens to a comprehensive education.

“First, we enable the algorithm to undergo unsupervised training using hundreds of millions of unlabeled data samples, resulting in a massive pre-trained model that is extensive and diverse, broad but not precise, somewhat like our primary and secondary school curricula that cover all foundational subjects without delving deeply into any specific field,” explains Xiangming Zhou, R&D Expert at Dahua.

He further elaborates: “To address our specific business needs, we then employ supervised training with labeled task-specific data to develop our expert task model. This labeled training phase can be likened to university education – students focus on their majors, continuously refining professional knowledge while gradually forgetting many secondary school subjects irrelevant to their specialization. To meet camera deployment requirements, we further perform knowledge distillation, fine-tuning, and quantization on the expert task model, significantly reducing its parameter count. This ultimately yields an edge-side large model precisely tailored for specific business objectives and products.”
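
Dahua does not publish the details of its distillation or quantization steps. As a generic illustration only, the sketch below shows textbook versions of two of them in plain Python: a temperature-scaled distillation loss that pushes a small "student" model's outputs toward those of a large "teacher" (the expert task model), and symmetric per-tensor int8 quantization, which stores each weight in 8 bits instead of 32. The temperature of 4.0, the per-tensor scheme, the example logits, and all function names are assumptions, not Xinghan's actual settings.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by T^2 as in classic distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]

# Hypothetical logits for one frame from the large teacher and the small student.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.2, 0.8]
print(distillation_loss(student, teacher))  # positive; training would minimize it

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
print(q, scale, dequantize(q, scale))
```

Strictly speaking, distillation is what reduces the parameter count (a smaller student replaces the teacher), while quantization reduces the storage and compute cost of each remaining parameter.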

The Xinghan Vision Models make video analysis more accurate and intelligent across diverse applications. In Perimeter Protection, detection distance increases by 50 percent while accuracy stays at 98 percent and false alarms drop by 92 percent. Building on the Xinghan Large-Scale AI Models, Perimeter Protection also introduces an AI Rule Assist function, which automatically analyzes the scene and generates regional intrusion rule lines, making setup easier and more efficient. In addition, Perimeter Protection supports detection of more than 10 types of animals, bringing more value to users.
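
The article does not explain how AI Rule Assist derives its rule lines from the scene. What a regional intrusion rule evaluates once a region exists is simpler to show: whether a tracked target has moved into a drawn polygon. The sketch below uses the standard ray-casting point-in-polygon test; the function names, the use of a single reference point per target, and the policy that only entering the region raises an event are hypothetical choices for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def intrusion_events(track: List[Point], region: List[Point]) -> List[int]:
    """Return frame indices where the tracked target goes from outside to inside."""
    events = []
    was_inside = False
    for frame, pos in enumerate(track):
        now_inside = point_in_polygon(pos, region)
        if now_inside and not was_inside:
            events.append(frame)
        was_inside = now_inside
    return events

# Hypothetical rule region (e.g., a square in image coordinates) and one target's track.
region = [(0, 0), (10, 0), (10, 10), (0, 10)]
track = [(-5, 5), (-1, 5), (3, 5), (4, 5), (12, 5), (5, 5)]
print(intrusion_events(track, region))  # → [2, 5]
```

Reporting only outside-to-inside transitions keeps a target that lingers inside the region from raising a new alarm on every frame.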

Other applications include WizTracking, which maintains effective tracking of individuals even when their posture changes or they are partially obscured, and Crowd Map, which supports detection of small targets at long distances and of up to 5,000 people in large-scale scenes. Furthermore, AI WDR (wide dynamic range) uses the Xinghan Large-Scale AI Models to identify the scene automatically and decide whether to enable or disable WDR as the picture changes. This eliminates manual adjustment, keeping the image clear while reducing the user’s operational burden.

Xinghan Multimodal Models

Unlike unimodal models, which are restricted to processing a single data type (e.g., text or images), the Xinghan Multimodal Models are AI systems capable of simultaneously processing and deeply integrating multiple heterogeneous data types (such as text, images, and video). This empowers a wide range of applications, including WizSeek and text-defined alarms.

Built on Dahua’s Xinghan Multimodal Models, WizSeek transforms video retrieval. It addresses common pain points of video search, such as the lack of multi-condition retrieval and over-reliance on preset target events. Suppose a user wants to find a man making a phone call near a car. With conventional metadata search, the user can only select attributes one by one, and behaviors such as “calling” cannot be retrieved at all. With WizSeek, the user simply types “A man making a phone call near a car” and locates the footage within seconds. WizSeek revolutionizes the video search experience, delivering speed, precision, and efficiency across vast amounts of video, with an intuitive and streamlined user journey.
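
Dahua does not document WizSeek's internals. A common way to build this kind of free-text video search is to embed both the text query and the indexed video frames into a shared vector space with a multimodal encoder, then rank clips by cosine similarity. The sketch below assumes that approach; the three-dimensional vectors are hand-written stand-ins for encoder outputs, and the clip IDs and the `embed` function mentioned in a comment are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query_vec: List[float],
           clip_index: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed clips by similarity to the query embedding, best match first."""
    scored = [(clip_id, cosine(query_vec, vec)) for clip_id, vec in clip_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real system would compute them with a multimodal encoder.
# Toy dimensions: [person on a phone, car, stroller]
clip_index = {
    "cam1_0912": [0.85, 0.90, 0.05],  # man on the phone beside a car
    "cam2_1030": [0.05, 0.95, 0.00],  # parked car, nobody around
    "cam3_1415": [0.80, 0.00, 0.10],  # man on the phone, no car
}
query = [0.9, 0.8, 0.0]  # stand-in for embed("A man making a phone call near a car")

for clip_id, score in search(query, clip_index):
    print(f"{clip_id}: {score:.3f}")
```

In this toy setup the clip matching both conditions (phone call and car) ranks first, while clips matching only one condition score lower; that is the multi-condition behavior that attribute-by-attribute metadata filters struggle to express.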

Text-defined alarms, on the other hand, enable custom arming through text descriptions. New algorithms can be created from a text prompt, significantly lowering the barrier to entry. For example, creating an algorithm for “human pushing a stroller” with conventional AI requires material collection, data annotation, on-device development, and algorithm training, a process that takes about a month. With text-defined alarms, powered by the multimodal models, the user only needs to type “human pushing a stroller,” and a model is created and deployed in seconds.

After creating a new Text-defined Alarms algorithm on a recorder (IVSS), the user can run local training on the same device to optimize its performance, saving significant time and labor costs. The more the algorithm is used and refined, the more accurate it becomes, which Dahua summarizes as “More Use, More Accuracy.” The Xinghan Multimodal Models are featured in Dahua products including NVR, IVSS, and IVD.
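
How text-defined alarms and on-device local training work internally is not disclosed. Assuming a multimodal encoder that maps text and video frames into a shared vector space, an alarm can be modelled as a prompt embedding plus a similarity threshold. The sketch below does that, then substitutes a deliberately simple stand-in for "local training": re-picking the threshold that best separates operator-confirmed events from rejected false alarms. A real system would likely update model weights instead; the class name, the default threshold of 0.8, and the 2-D vectors are all hypothetical.

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class TextDefinedAlarm:
    """Hypothetical prompt-based alarm: fires when a frame embedding is close
    enough to the embedding of the operator's text prompt."""

    def __init__(self, prompt: str, prompt_vec: List[float], threshold: float = 0.8):
        self.prompt = prompt
        self.prompt_vec = prompt_vec  # stand-in for an encoder's embedding of the prompt
        self.threshold = threshold

    def check(self, frame_vec: List[float]) -> bool:
        return cosine(self.prompt_vec, frame_vec) >= self.threshold

    def calibrate(self, feedback: List[Tuple[List[float], bool]]) -> float:
        """Re-pick the threshold from operator labels (True = real event,
        False = false alarm). Simple stand-in for on-device local training."""
        if not feedback:
            return self.threshold
        scores = [(cosine(self.prompt_vec, vec), label) for vec, label in feedback]
        best_t, best_acc = self.threshold, -1.0
        # Try each observed score as a cut-off; keep the lowest one with the best accuracy.
        for t in sorted({s for s, _ in scores}):
            acc = sum((s >= t) == label for s, label in scores) / len(scores)
            if acc > best_acc:
                best_t, best_acc = t, acc
        self.threshold = best_t
        return best_t

alarm = TextDefinedAlarm("human pushing a stroller", [1.0, 0.0])
print(alarm.check([1.0, 1.0]))  # → False (similarity ~0.707 is below the default 0.8)
alarm.calibrate([([4.0, 3.0], True), ([1.0, 1.0], True),
                 ([0.0, 1.0], False), ([1.0, 2.0], False)])
print(alarm.check([1.0, 1.0]))  # → True after calibration
```

Each round of operator feedback refines the decision boundary, which mirrors the "More Use, More Accuracy" idea in miniature.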

A Leader in AI Technologies

In conclusion, video surveillance has transitioned from simply capturing a scene to truly understanding it. Dahua has clearly embraced this evolution with Xinghan, which understands complex multi-target interactions, reduces false alarms, and shortens deployment cycles, ultimately providing users with enhanced security and business intelligence.

With Xinghan, Dahua demonstrates the potential of next-generation AI, reaffirming its position as a leader in advanced AI technologies.
