New Datasets for 3D Object Recognition

6 November 2018

Robotics, augmented reality, autonomous driving – all these scenarios rely on recognizing 3D properties of objects from 2D images. This puts 3D object recognition as one of the central problems in computer vision.

Remarkable progress has been achieved in this field after the introduction of several databases that provide 3D annotations to 2D objects (e.g., IKEA, Pascal3D+). However, these datasets are limited in scale and include only about a dozen object categories.

This is not even close to the scale of image datasets such as ImageNet or Microsoft COCO, the huge datasets behind much of the recent progress in image classification. Consequently, large-scale datasets with 3D annotations are likely to significantly benefit 3D object recognition.

In this article, we present one large-scale dataset, ObjectNet3D, along with several specialized datasets for 3D object recognition: MVTec ITODD and T-LESS for industrial settings, and the Falling Things dataset for object recognition in the context of robotics.

ObjectNet3D

Number of images: 90,127

Number of objects: 201,888

Number of categories: 100

Number of 3D shapes: 44,147

Year: 2016

An example image from ObjectNet3D with 2D objects aligned with 3D shapes

ObjectNet3D is a large-scale database where objects in the images are aligned with 3D shapes; the alignment provides both an accurate 3D pose annotation and the closest 3D shape annotation for each 2D object. The scale of this dataset allows for significant progress on such computer vision tasks as recognizing the 3D pose and 3D shape of objects from 2D images.

Examples of 3D shape retrieval. Green boxes indicate the selected shape. Bottom row illustrates two cases where a similar shape was not found among the top 5 shapes

To construct this database, researchers from Stanford University drew on images from existing image repositories and proposed an approach to align 3D shapes (taken from existing 3D shape repositories) with the objects in these images.

In their work, the researchers consider only rigid object categories, for which they can collect a large number of 3D shapes from the web. Here is the full list of categories:

Object categories in ObjectNet3D

2D images were collected from the ImageNet dataset and, for categories not sufficiently covered by ImageNet, through Google Image Search. 3D shapes were acquired from the Trimble 3D Warehouse and the ShapeNet repository. Objects in the images were then aligned with the 3D shapes using a camera model, which is described in detail in the corresponding paper. Finally, 3D annotations were provided for the objects in the 2D images.
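
As an illustration of what such an alignment involves, here is a minimal sketch of projecting a 3D shape into an image with a simple pinhole camera parameterized by viewpoint. The exact camera model used for ObjectNet3D is described in the paper; the parameterization, function names, and values below are our own illustrative assumptions.

```python
# Hedged sketch: project 3D shape vertices into the image plane with a camera
# parameterized by azimuth, elevation, in-plane rotation, and distance.
# This only approximates the kind of model described in the ObjectNet3D paper.
import numpy as np

def rotation_from_viewpoint(azimuth, elevation, theta):
    """Build a world-to-camera rotation from viewpoint angles (radians)."""
    Rz = np.array([[np.cos(azimuth), -np.sin(azimuth), 0],
                   [np.sin(azimuth),  np.cos(azimuth), 0],
                   [0, 0, 1]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(elevation), -np.sin(elevation)],
                   [0, np.sin(elevation),  np.cos(elevation)]])
    Rtheta = np.array([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0],
                       [0, 0, 1]])
    return Rtheta @ Rx @ Rz

def project(points, azimuth, elevation, theta, distance, focal=1.0):
    """Project an Nx3 array of shape vertices to 2D image coordinates."""
    R = rotation_from_viewpoint(azimuth, elevation, theta)
    t = np.array([0.0, 0.0, distance])       # camera placed 'distance' away from the object
    cam = points @ R.T + t                   # world -> camera coordinates
    return focal * cam[:, :2] / cam[:, 2:3]  # perspective division

# Example: project the 8 corners of a unit cube from a given viewpoint.
cube = np.array([[x, y, z] for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
print(project(cube, azimuth=np.pi / 6, elevation=np.pi / 12, theta=0.0, distance=3.0))
```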

The resulting dataset can be used for object proposal generation, 2D object detection, joint 2D detection and 3D object pose estimation, and image-based 3D shape retrieval.

MVTec ITODD

Number of scenes: 800

Number of objects: 28

Number of 3D transformations: 3500

Year: 2017

Example scene of the dataset from all sensors. Top row: grayscale cameras. Bottom row: Z and grayscale image of the High-Quality (left) and Low-Quality (right) 3D sensor

MVTec ITODD is a dataset for 3D object detection and pose estimation with a strong focus on industrial settings and applications. It contains 28 objects arranged in over 800 scenes and labeled with their rigid 3D transformations as ground truth. The scenes are observed by two industrial 3D sensors and three grayscale cameras, allowing the evaluation of methods that work on 3D, image, or combined modalities. The dataset’s creators at MVTec Software GmbH chose grayscale cameras because they are much more common in industrial setups.

As mentioned in the dataset description, the objects were selected to cover a range of different values with respect to surface reflectance, symmetry, complexity, flatness, detail, compactness, and size. Here are images of all 28 objects included in MVTec ITODD along with their names:

Images of 28 objects used in the dataset

For each object, scenes with only a single instance and scenes with multiple instances (e.g., to simulate bin picking) are available. Each scene was acquired once with each of the 3D sensors, and twice with each of the grayscale cameras: once with and once without a random projected pattern.

Finally, for all objects, manually created CAD models are available for training the detection methods. The ground truth was labeled using a semi-manual approach based on the 3D data of the high-quality 3D sensor.

This dataset provides a great benchmark for the detection and pose estimation of 3D objects in industrial scenarios.

T-LESS

Number of images: 39K training + 10K test images from each of three sensors

Number of objects: 30

Year: 2017

Examples of T-LESS test images (left) overlaid with colored 3D object models at the ground-truth 6D poses (right). Instances of the same object have the same color

T-LESS is a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. This dataset includes 30 industry-relevant objects with no significant texture and no discriminative color or reflectance properties. Another unique property of this dataset is that some of the objects are parts of others.

The researchers behind T-LESS took different approaches to the training and test images: training images depict individual objects against a black background, while test images originate from twenty scenes with varying degrees of complexity. Here are examples of the training and test images:

Top: training images and 3D models of 30 objects. Bottom: test images of 20 scenes overlaid with colored 3D object models at the ground truth poses

All training and test images were captured with three synchronized sensors: a structured-light RGB-D sensor, a time-of-flight RGB-D sensor, and a high-resolution RGB camera.

Finally, two types of 3D models are provided for each object: 1) a manually created CAD model and 2) a semi-automatically reconstructed one.

This dataset can be very useful for evaluating approaches to 6D object pose estimation, 2D object detection and segmentation, and 3D object reconstruction. Considering the availability of images from three sensors, it is also possible to study the importance of different input modalities for a given problem.

Falling Things

Number of images: 61,500

Number of objects: 21 household objects

Year: 2018

A sample from FAT dataset

The Falling Things (FAT) dataset is a synthetic dataset for 3D object detection and pose estimation, created by an NVIDIA team. It was generated by placing 3D household object models (e.g., a mustard bottle, a soup can, a gelatin box) in virtual environments.

Each snapshot in this dataset consists of per-pixel class segmentation, 2D/3D bounding box coordinates for all objects, mono and stereo RGB images, dense depth images, and of course, 3D poses. Most of these elements are illustrated in the above image.
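
For illustration, here is a hedged sketch of how one might load such a snapshot. The file names and JSON structure below are assumptions made for the sake of the example, not the dataset's documented layout; consult the official FAT documentation for the actual format.

```python
# Hedged sketch of loading one FAT-style snapshot. File names and annotation
# fields are hypothetical; check the dataset's documentation for the real layout.
import json
import numpy as np
from PIL import Image

def load_snapshot(prefix):
    """Load stereo RGB, depth, segmentation, and annotations for one frame."""
    left_rgb  = np.array(Image.open(f"{prefix}.left.jpg"))
    right_rgb = np.array(Image.open(f"{prefix}.right.jpg"))
    depth     = np.array(Image.open(f"{prefix}.left.depth.png"))  # dense depth image
    seg       = np.array(Image.open(f"{prefix}.left.seg.png"))    # per-pixel class ids
    with open(f"{prefix}.left.json") as f:
        annotations = json.load(f)  # hypothetical: 2D/3D boxes and 6D poses per object
    return left_rgb, right_rgb, depth, seg, annotations
```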

The FAT dataset includes a variety of object poses, backgrounds, compositions, and lighting conditions. See some examples below:

Sample images from the FAT dataset

For more details on the process of building the FAT dataset, check our article dedicated entirely to this dataset.

The Falling Things dataset provides a great opportunity to accelerate research in object detection and pose estimation, as well as in segmentation, depth estimation, and studies of different sensor modalities.

Bottom Line

3D object recognition has multiple important applications, but progress in this field is limited by the available datasets. Fortunately, several new 3D object recognition datasets have been introduced in recent years. While they differ in scale, focus, and characteristics, each of these datasets makes a significant contribution to the improvement of current 3D object recognition systems.

“Falling Things”: A Synthetic Dataset by Nvidia for Pose Estimation

26 April 2018

Deep learning has made a lot of progress in image recognition and the visual arts. The deep learning approach is used to solve problems such as image recognition, image captioning, and image segmentation. One of the main obstacles is that you need a lot of data to train a model, and such datasets are not easily available. Many research teams are therefore working to create different kinds of datasets.

Robotic manipulation is one such problem: it requires the detection and pose estimation of multiple object categories, which poses a two-fold challenge for designing robotic perception algorithms. The first difficulty is training a model with a small amount of training data. The second is acquiring ground-truth data, which is time-consuming, error-prone, and potentially expensive. Existing techniques for obtaining real-world data do not scale and are therefore not capable of generating the large datasets needed for training deep neural networks.

“Falling Things” Dataset

These problems can be addressed with synthetically generated data. Synthetic data is any data applicable to a given situation that is not obtained by direct measurement. Researchers have been using synthetic data as an efficient means of both training and validating deep neural networks on tasks where ground truth is very hard to obtain, such as segmentation. 3D object detection and pose estimation fall into this category: acquiring ground truth for them is complicated, which makes them a good fit for synthetic data.

The Falling Things (FAT) dataset consists of more than 61,000 images for training and validating robotic scene-understanding algorithms in a household environment. Previously, only two datasets provided accurate ground-truth poses of multiple objects: T-LESS and YCB-Video. The problem with these datasets is that they do not contain extreme lighting conditions or multiple modalities. FAT incorporates the capabilities that were missing from the other two datasets.

Figure 1: The Falling Things (FAT) dataset was generated by placing 3D household object models in virtual environments. Pixel-wise segmentation of the objects (bottom left), depth (bottom center), and 2D/3D bounding box coordinates (bottom right)

Unreal Engine

The FAT dataset was generated using Unreal Engine 4 (UE4). The data comes from three virtual environments within UE4: a kitchen, a sun temple, and a forest. These environments were chosen for their high-fidelity modeling and quality, as well as for their variety of indoor and outdoor scenes. For each environment, five manually selected locations cover a range of terrain and lighting conditions (e.g., on a kitchen counter or tile floor, next to a rock, above a grassy field, and so forth). This yields 15 different locations with a variety of 3D backgrounds, lighting conditions, and shadows.

The dataset uses 21 household objects from the YCB object set. The objects were placed at random positions and orientations within a vertical cylinder of radius 5 cm and height 10 cm centered at a fixation point. As the objects fell, the virtual camera system was rapidly teleported to random azimuths, elevations, and distances with respect to the fixation point to collect data. Azimuth ranged from -120° to +120° (to avoid collision with the wall, when present), elevation from 5° to 85°, and distance from 0.5 m to 1.5 m. The virtual camera used for data generation consists of a stereo pair of RGBD cameras. This design decision allows the dataset to support at least three different sensor modalities: whereas single RGBD sensors are commonly used in robotics, stereo sensors have the potential to yield higher-quality output with fewer distortions, and a monocular RGB camera has distinct advantages in cost, simplicity, and availability.
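
The viewpoint sampling described above can be sketched as follows. The actual UE4 capture code is not reproduced here, so the helper below is only an illustrative assumption that mirrors the stated ranges.

```python
# Hedged sketch of random camera placement around a fixation point:
# azimuth in [-120°, 120°], elevation in [5°, 85°], distance in [0.5 m, 1.5 m].
import numpy as np

rng = np.random.default_rng(0)

def sample_camera_position(fixation_point):
    azimuth   = np.radians(rng.uniform(-120.0, 120.0))
    elevation = np.radians(rng.uniform(5.0, 85.0))
    distance  = rng.uniform(0.5, 1.5)
    offset = distance * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    return fixation_point + offset  # the camera then looks back toward the fixation point

print(sample_camera_position(np.array([0.0, 0.0, 1.0])))
```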

The dataset consists of 61,500 unique images at a resolution of 960 × 540 and is divided into two parts:

1. Single objects: The first part of the dataset was generated by dropping each object model in isolation ∼5 times at each of the 15 locations.

2. Mixed objects: The second part of the dataset was generated in the same manner, except that a random number of objects, sampled uniformly from 2 to 10, was dropped. To allow multiple instances of the same category in an image, objects were sampled with replacement.

To split the dataset for training and testing, one location per scene is held out as the test set, and the remaining data is used for training. Figure 2 shows the total number of occurrences of each object class in the FAT dataset.

Figure 2: Total appearance count of the 21 YCB objects in the FAT dataset. Light bars indicate object visibility higher than 25%, while solid bars indicate visibility higher than 75%.

Bottom Line

This new dataset will help accelerate research in object detection, pose estimation, segmentation, and depth estimation. The proposed dataset focuses on household items from the YCB object set.

Figure 3: Datasets for object detection and pose estimation. The FAT dataset provides all of the listed capabilities.

This dataset helps researchers to find solutions for open problems like object detection, pose estimation, depth estimation from monocular and/or stereo cameras, and depth-based segmentation, to advance the field of robotics.

Figure 4: Some examples from the dataset

Note: The dataset will be publicly available no later than June 2018.

Muneeb ul Hassan

How Has the MS Voxel Deep Network Managed to Improve 3D Object Recognition Using a Cloud Map Only

12 April 2018

Mobile Laser Scanning (MLS) systems can now scan large areas, like cities or even countries. The resulting 3D point clouds can be used as maps for autonomous systems. To do so, automatic classification of the data is necessary, and it remains challenging given the number of object classes present in an urban scene. Xavier Roynard, Jean-Emmanuel Deschaud, and François Goulette propose both a training method that balances the number of points per class during each epoch and a 3D CNN capable of effectively learning how to classify scenes containing objects at multiple scales.

MS Voxel Deep Network is a new convolutional neural network (CNN) for classifying 3D point clouds of urban or indoor scenes. On the reduced-8 Semantic3D benchmark, this network ranked second overall and beat the state of the art among point-classification methods (those not using a regularization step).

Network Learning Difficulties

Training on scene point clouds leads to some difficulties. For the point classification task, each point is a sample, so the number of samples per class is very unbalanced (from thousands of points for the class “pedestrian” to tens of millions for the class “ground”). Also, with the usual deep-learning training procedure, an epoch would mean passing through all points of the cloud, which would take a lot of time and add little: two very close points have the same neighbourhood and will therefore be classified in the same way.

The authors propose a training method that solves these two problems: randomly select N (for example, 1000) points in each class, train on these points shuffled randomly across classes, and repeat this selection at the beginning of each epoch.
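
A minimal sketch of this class-balanced sampling, assuming a NumPy array of per-point class labels (the function and argument names are ours):

```python
# Hedged sketch: draw N points per class at the start of each epoch and shuffle them.
import numpy as np

def balanced_epoch_indices(point_labels, n_per_class=1000, rng=None):
    """point_labels: array of per-point class ids; returns shuffled point indices."""
    rng = rng or np.random.default_rng()
    chosen = []
    for c in np.unique(point_labels):
        idx = np.flatnonzero(point_labels == c)
        # sample with replacement only if the class has fewer than n_per_class points
        chosen.append(rng.choice(idx, size=n_per_class, replace=len(idx) < n_per_class))
    epoch = np.concatenate(chosen)
    rng.shuffle(epoch)
    return epoch
```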

Once a point p to classify is chosen, the voxel grid fed to the convolutional network is built as an occupancy grid centered on p, whose empty voxels contain 0 and occupied voxels contain 1. Only N × N × N cubic grids with even N are used, with an isotropic space discretization step ∆.
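
A minimal sketch of building such an occupancy grid, assuming a NumPy point cloud and illustrative values for N and ∆:

```python
# Hedged sketch: N x N x N binary occupancy grid centered on a query point p,
# with an isotropic voxel size delta (the default values here are illustrative).
import numpy as np

def occupancy_grid(cloud, p, n=32, delta=0.1):
    """cloud: Mx3 array of points; p: 3-vector; returns an n^3 grid of 0/1 voxels."""
    grid = np.zeros((n, n, n), dtype=np.float32)
    # voxel index of every point relative to the grid corner
    idx = np.floor((cloud - p) / delta + n / 2).astype(int)
    inside = np.all((idx >= 0) & (idx < n), axis=1)
    grid[tuple(idx[inside].T)] = 1.0
    return grid
```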

Network Training

Some classic data augmentation steps are performed before projecting the 3D point clouds into the voxel grid (a sketch follows the list):

• Flip the x and y axes, each with probability 0.5

• Random rotation around z-axis

• Random scale, between 95% and 105%

• Random occlusions (randomly removing points), up to 5%

• Random artefacts (randomly inserting points), up to 5%

• Random noise in the position of points; the noise follows a normal distribution centered at 0 with standard deviation 0.01 m
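
Here is the sketch of these augmentation steps, applied to an M × 3 point cloud before voxelization. The parameter values follow the list above; the function itself is our own illustration.

```python
# Hedged sketch of the augmentation steps listed above.
import numpy as np

def augment(cloud, rng):
    out = cloud.copy()
    # flip the x and y axes, each with probability 0.5
    for axis in (0, 1):
        if rng.random() < 0.5:
            out[:, axis] = -out[:, axis]
    # random rotation around the z-axis
    a = rng.uniform(0, 2 * np.pi)
    Rz = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1]])
    out = out @ Rz.T
    # random scale between 95% and 105%
    out *= rng.uniform(0.95, 1.05)
    # random occlusions: drop up to 5% of the points
    out = out[rng.random(len(out)) >= rng.uniform(0, 0.05)]
    # random artefacts: insert up to 5% spurious points within the bounding box
    n_art = int(rng.uniform(0, 0.05) * len(out))
    out = np.concatenate([out, rng.uniform(out.min(0), out.max(0), size=(n_art, 3))])
    # Gaussian positional noise with standard deviation 0.01 m
    return out + rng.normal(0.0, 0.01, size=out.shape)

# Example: augmented = augment(np.random.rand(1000, 3), np.random.default_rng(0))
```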

The cost function is cross-entropy, and the optimizer is Adam with a learning rate of 0.001 and ε = 10⁻⁸, which are the default settings in most deep-learning libraries.

Architecture — Layers

3D Essential Layers

  • Conv(n, k, s, p) a convolutional layer that transforms feature maps from the previous layer into n new feature maps, with a kernel of size k × k × k and stride s and pads p on each side of the grid.
  • DeConv(n, k, s, p) a transposed convolutional layer that transforms feature maps from the previous layer into n new feature maps, with a kernel of size k × k × k and stride s and pads p on each side of the grid.
  • FC(n) a fully-connected layer that transforms the feature maps from the previous layer into n feature maps.
  • MaxPool(k) a layer that aggregates every group of k × k × k (e.g., 8 for k = 2) neighbouring voxels on each feature map.
  • MaxUnPool(k) a layer that computes an inverse of MaxPool(k).
  • ReLU, LeakyReLU and PReLU common non-linearities used after linear layers such as Conv and FC. ReLU(x) returns the positive part of x; to avoid a null gradient when x is negative, a slight slope can be added, either fixed (LeakyReLU) or learned (PReLU).
  • SoftMax a non-linearity layer that rescales a tensor to the range [0, 1] so that its values sum to 1.
  • BatchNorm a layer that normalizes samples over a batch.
  • DropOut(p) a layer that randomly zeroes some of the elements of the input tensor with probability p.
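
These building blocks map directly onto standard deep-learning layers. Below is a hedged PyTorch sketch of a small voxel-classification network assembled from them, together with the cross-entropy loss and Adam settings mentioned above; the channel counts and kernel sizes are illustrative, not the exact MS3_DeepVoxScene configuration.

```python
# Hedged sketch: a small 3D voxel classifier built from the layers listed above.
import torch
import torch.nn as nn

class SmallVoxelNet(nn.Module):
    def __init__(self, n_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, stride=1, padding=1),  # Conv(32, 3, 1, 1)
            nn.BatchNorm3d(32),                                    # BatchNorm
            nn.PReLU(),                                            # PReLU
            nn.MaxPool3d(2),                                       # MaxPool(2)
            nn.Conv3d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.PReLU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8 * 8, 128),  # FC(128), assuming a 32^3 input grid
            nn.ReLU(),                       # ReLU
            nn.Dropout(p=0.5),               # DropOut(0.5)
            nn.Linear(128, n_classes),       # FC(n_classes); SoftMax is applied at inference,
        )                                    # while CrossEntropyLoss handles it during training

    def forward(self, x):  # x: (batch, 1, 32, 32, 32) occupancy grids
        return self.classifier(self.features(x))

model = SmallVoxelNet()
criterion = nn.CrossEntropyLoss()  # cross-entropy cost function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, eps=1e-8)
print(model(torch.zeros(2, 1, 32, 32, 32)).shape)  # torch.Size([2, 9])
```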

The chosen network architecture is inspired by one that works well in 2D.

Multi-Scale Voxel Network architecture: MS3_DeepVoxScene (all tensors are represented as 2D tensors instead of 3D for simplicity)

Datasets

The authors compared three different datasets. Paris-Lille-3D contains 50 classes, but for their experiments they keep only 9 coarser classes. The number of points after subsampling at 2 cm is indicated in brackets.

Among 3D point cloud scene datasets, these are the ones with the largest covered area and the most variability.

Table: number of points in each dataset

The covered area is obtained by projecting each cloud on a horizontal plane in pixels of size 10cm × 10cm, then summing the area of all occupied pixels.
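
A minimal sketch of this covered-area computation, assuming a NumPy point cloud with coordinates in meters:

```python
# Hedged sketch: project points onto the horizontal plane, bin them into
# 10 cm x 10 cm pixels, and sum the area of the occupied pixels.
import numpy as np

def covered_area(cloud, pixel=0.1):
    """cloud: Mx3 array in meters; returns the covered area in square meters."""
    ij = np.floor(cloud[:, :2] / pixel).astype(int)  # 2D pixel index for every point
    return len(np.unique(ij, axis=0)) * pixel ** 2

# Dense synthetic cloud over a 10 m x 10 m footprint -> area close to 100 m^2.
print(covered_area(np.random.rand(100000, 3) * [10.0, 10.0, 5.0]))
```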

A finer resolution of 5 cm was added to better capture the local surface near the point, and a coarser resolution of 15 cm to better understand the context of the object to which the point belongs. This method achieves better results than all methods that classify the cloud point by point (i.e., without regularization). Even better results could probably be achieved by adding, for example, a CRF after classification.

Example of a classified point cloud on the Paris-Lille-3D dataset:

Classified with MS3_DVS (left) and ground truth (right); blue: ground, cerulean blue: buildings, dark green: poles, green: bollards, light green: trash cans, yellow: barriers, dark yellow: pedestrians, orange: cars, red: natural.

The results are very close to the ground truth. This is achieved both by focusing on the local shape of the object around a point and by taking into account the context of the object.

Quote:

We observe a confusion between the classes wall and board (and more slightly with beam, column, window and door), this is explained mainly because these classes are very similar geometrically and we do not use color. To improve these results, we should not sub-sample the clouds to keep the geometric information thin (such as the table slightly protruding from the wall) and add a 2 cm scale in input to the network, but looking for neighborhoods would then take an unacceptable amount of time.

For a comparison with the state-of-the-art methods on the S3DIS 5th fold, see the table below:

Table 3: Comparison with state-of-the-art methods on the S3DIS 5th fold

To evaluate the architecture choices, the same classification task was also run with one of the first 3D convolutional networks, VoxNet.

Comparison to VoxNet

Comparison with the state-of-the-art methods on reduced-8 Semantic3D benchmark:

Table: per-class IoU on the reduced-8 Semantic3D benchmark

The comparison per class between MS1_DeepVoxScene and MS3_DeepVoxScene on the Paris-Lille-3D dataset (below) shows that the use of multi-scale networks improves the results on some classes; in particular, the buildings, barriers, and pedestrians classes are greatly improved (especially in recall), while the car class loses a lot of precision.

Comparison per class between MS1_DeepVoxScene and MS3_DeepVoxScene on the Paris-Lille-3D dataset

Conclusion

The proposed training method, used for MS3_DVS, balances the number of points per class seen during each epoch, and the accompanying multi-scale CNN is capable of learning to classify point cloud scenes. You can follow its standing on the Semantic3D benchmark, where it currently ranks second overall, a very good result. This is achieved both by focusing on the local shape of the object around a point and by taking into account the context of the object.