video input
Recently Published Documents


TOTAL DOCUMENTS: 68 (FIVE YEARS: 23)

H-INDEX: 9 (FIVE YEARS: 1)

2021
Author(s): Thomas Iorns

This work explores applying the newly popular medium of 360-degree panoramic video to the widely used offline technique of image-based lighting, and develops a system for real-time image-based lighting of virtual objects that uses only the 360-degree video itself as the light source. The system is suitable for live streaming video input and runs on consumer-grade graphics hardware at the resolutions and frame rates needed for comfortable viewing on head-mounted displays, rendering stereo output at 1182x1464 per eye at over 60 frames per second on a mid-range graphics card. Its use in several real-world applications is also studied, and extensions to real-time shadowing and reflection are explored.
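As an illustration of the kind of per-frame lighting computation involved (not code from the thesis), the sketch below projects an equirectangular 360-degree video frame onto second-order spherical harmonics, a common compact basis for real-time diffuse image-based lighting; the NumPy implementation and names are assumptions.

import numpy as np

def sh9_from_equirect(frame):
    # frame: equirectangular RGB radiance, shape (H, W, 3), linear float.
    # Returns the first 9 spherical harmonic coefficients per colour channel.
    h, w, _ = frame.shape
    theta = (np.arange(h) + 0.5) / h * np.pi           # polar angle per row
    phi = (np.arange(w) + 0.5) / w * 2.0 * np.pi       # azimuth per column
    phi, theta = np.meshgrid(phi, theta)               # both (H, W)
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    # Solid angle of each equirectangular pixel (sin(theta) weighting).
    d_omega = (np.pi / h) * (2.0 * np.pi / w) * np.sin(theta)
    # Real SH basis functions up to band 2 (9 terms).
    basis = np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], axis=0)                                          # (9, H, W)
    # Integrate radiance against each basis function, per colour channel.
    return np.einsum('khw,hwc->kc', basis * d_omega, frame)   # (9, 3)

The nine coefficients per channel can be recomputed (or updated at reduced resolution) every frame and evaluated per shaded normal on the GPU, which is what makes this style of image-based lighting cheap enough for live 360-degree video.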


Author(s): Jinglu Zhang, Yinyu Nie, Yao Lyu, Xiaosong Yang, Jian Chang, ...

Abstract
Purpose: Surgical gesture recognition is essential for providing intraoperative context-aware assistance and scheduling clinical resources. However, previous methods are limited in capturing long-range temporal information, and many of them require additional sensors. To address these challenges, we propose a symmetric dilated network, SD-Net, that jointly recognizes surgical gestures and assesses surgical skill levels using only RGB surgical video sequences.
Methods: We use symmetric 1D temporal dilated convolution layers to hierarchically capture gesture cues under different receptive fields, so that features over different time spans can be aggregated. In addition, a self-attention network is bridged in the middle to calculate global frame-to-frame relativity.
Results: We evaluate our method on the robotic suturing task of the JIGSAWS dataset. On gesture recognition, the model outperforms the state of the art by roughly 6 points in frame-wise accuracy and roughly 8 points in F1@50. We also maintain 100% prediction accuracy on the skill assessment task under the LOSO validation scheme.
Conclusion: The results indicate that our architecture obtains representative surgical video features by extensively considering the spatial, temporal and relational context of the raw video input. Furthermore, the better performance under multi-task learning implies that surgical skill assessment is complementary to the gesture recognition task.
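A rough PyTorch sketch of the symmetric dilated design described above; the layer counts, channel widths and class names are illustrative guesses, not the published SD-Net configuration.

import torch
import torch.nn as nn

class DilatedTemporalBlock(nn.Module):
    # 1D temporal convolution with a given dilation and a residual connection.
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                 # x: (batch, channels, frames)
        return x + self.relu(self.conv(x))

class SymmetricDilatedNet(nn.Module):
    # Encoder with increasing dilations, self-attention across frames,
    # decoder with decreasing dilations, then per-frame gesture logits.
    def __init__(self, in_dim, channels, num_gestures, num_layers=4):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.encoder = nn.Sequential(*[DilatedTemporalBlock(channels, 2 ** i)
                                       for i in range(num_layers)])
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.decoder = nn.Sequential(*[DilatedTemporalBlock(channels, 2 ** i)
                                       for i in reversed(range(num_layers))])
        self.head = nn.Conv1d(channels, num_gestures, kernel_size=1)

    def forward(self, x):                 # x: (batch, frames, in_dim) frame features
        x = self.proj(x.transpose(1, 2))  # (batch, channels, frames)
        x = self.encoder(x)               # growing temporal receptive field
        a = x.transpose(1, 2)             # (batch, frames, channels) for attention
        a, _ = self.attn(a, a, a)         # global frame-to-frame relativity
        x = x + a.transpose(1, 2)
        x = self.decoder(x)               # symmetric, shrinking dilations
        return self.head(x)               # (batch, num_gestures, frames)

Per-frame features (for example from a frozen CNN backbone) go in, per-frame gesture logits come out; a pooled copy of the same features could feed a separate skill-level head for the multi-task setup.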


Sensors, 2021, Vol. 21 (20), pp. 6774
Author(s): Doyoung Kim, Inwoong Lee, Dohyung Kim, Sanghoon Lee

Action recognition models have shown strong performance on various video datasets. Nevertheless, existing datasets contain little data on the target actions required by industrial applications, so they are insufficient for building such applications. To meet this requirement, datasets composed of highly available target actions have been created, but because their video data are generated in a specific environment, they struggle to capture the varied characteristics of real environments. In this paper, we introduce the new ETRI-Activity3D-LivingLab dataset, which provides action sequences recorded in real environments and helps address the network generalization issue caused by dataset shift. When an action recognition model is trained on the ETRI-Activity3D and KIST SynADL datasets and evaluated on ETRI-Activity3D-LivingLab, performance can degrade severely because the datasets were captured in different environments. To reduce this dataset shift between training and testing datasets, we propose a close-up of maximum activation, which magnifies the most activated part of a video input in detail. In addition, we present experimental results and analysis that illustrate the dataset shift and demonstrate the effectiveness of the proposed method.
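A minimal sketch of what a "close-up of maximum activation" could look like, assuming a per-frame activation map is available from the recognition backbone; the function name, crop fraction and OpenCV-based implementation are assumptions, not the authors' code.

import numpy as np
import cv2

def close_up_of_max_activation(frame, activation, crop_frac=0.5):
    # Crop the region around the most activated location and rescale it
    # back to the full frame size, magnifying the action region.
    h, w = frame.shape[:2]
    act = cv2.resize(activation.astype(np.float32), (w, h))
    cy, cx = np.unravel_index(np.argmax(act), act.shape)
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y0 = int(np.clip(cy - ch // 2, 0, h - ch))
    x0 = int(np.clip(cx - cw // 2, 0, w - cw))
    crop = frame[y0:y0 + ch, x0:x0 + cw]
    # The magnified crop replaces the full frame as the network input.
    return cv2.resize(crop, (w, h))

Feeding the magnified crop instead of the full frame focuses the model on the acting person, which is one plausible way such a close-up could reduce sensitivity to the surrounding environment.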


Author(s): Madhura Prakash, Aishwarya S, Disha Maru, Naman Chandra, Varshini V, ...

There has been over the past few years, a very increased popularity for yoga. A lot of literatures have been published that claim yoga to be beneficial in improving the overall lifestyle and health especially in rehabilitation, mental health and more. Considering the fast-paced lives that individuals live, people usually prefer to exercise or work-out from the comfort of their homes and with that a need for an instructor arises. Hence why, we have developed a self-assisted system which can be used to detect and classify yoga asanas, which is discussed in-depth in this paper. Especially now when the pandemic has taken over the world, it is not feasible to attend physical classes or have an instructor over. Using the technology of Computer Vision, a computer-assisted system such as the one discussed, comes in very handy. The technologies such as ml5.js, PoseNet and Neural Networks are made use for the human pose estimation and classification. The proposed system uses the above-mentioned technologies to take in a real-time video input and analyze the pose of an individual, and classifies the poses into yoga asanas. It also displays the name of the yoga asana that is detected along with the confidence score.
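The paper's system runs in the browser on ml5.js and PoseNet; the sketch below illustrates the same pipeline shape in Python/Keras, a small classifier over pose keypoints that returns an asana label and a confidence score. The labels, layer sizes and function names are illustrative assumptions.

import numpy as np
import tensorflow as tf

NUM_KEYPOINTS = 17                        # PoseNet-style keypoints (x, y per joint)
ASANAS = ["tree", "warrior", "chair", "cobra", "triangle"]   # illustrative labels

# Small classifier over flattened, normalised keypoints.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_KEYPOINTS * 2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(len(ASANAS), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def classify_pose(keypoints_xy):
    # keypoints_xy: (17, 2) array of keypoints normalised to [0, 1] relative to
    # the frame, as a pose estimator such as PoseNet would provide per video frame.
    probs = model.predict(keypoints_xy.reshape(1, -1), verbose=0)[0]
    best = int(np.argmax(probs))
    return ASANAS[best], float(probs[best])   # asana name and confidence score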


Author(s): Priyanka Agrawal

The face is a key component of the human body, and humans use it to identify one another. Face detection in video refers to detecting a person's face in a video sequence, while face tracking refers to following that face throughout the video. Face detection and tracking have become widely researched topics due to applications such as video surveillance systems and identifying criminal activity. However, working with video is difficult because of problems such as poor illumination, low resolution and atypical poses, among others. A fair analysis of the various tracking and detection strategies is needed to fulfil the goal of video tracking and detection. Closed-circuit television (CCTV) technology has had a significant impact on how crimes are investigated and solved, with CCTV footage serving as the material used to review crime scenes. CCTV systems, however, only provide footage and have no ability to analyse it. In this research, we propose a system that can be integrated with CCTV footage or any other video input, such as a webcam, to detect, recognise and track a person of interest. Our system follows people as they move through a space and can detect and recognise human faces. It enables video analytics, allowing existing cameras to be combined with a system that recognises individuals and tracks their activities over time. It can be used for remote surveillance and integrated as a component of video analytics software and CCTV security solutions, for example on college campuses, in offices and in shopping malls.
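A minimal sketch of the detection step on a webcam or CCTV-style video input, using OpenCV's bundled Haar cascade; recognising a specific person of interest would additionally require a face-embedding or recognition model, which is not shown here.

import cv2

# Haar cascade face detector shipped with OpenCV; any stream accepted by
# VideoCapture (webcam index, file path, RTSP URL) can serve as the video input.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)        # 0 = default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces in the current frame and draw bounding boxes.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()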


Author(s): Rashmi Jain, Prachi Tamgade, R. Swaroopa, Pranoti Bhure, Srushti Shahu, ...

Perceiving the surroundings accurately and quickly is one of the most essential and challenging tasks for systems such as self-driving cars. Video cameras give the car a view of its surroundings, making it more informed about the environment than a human driver. To build a fully virtual self-driving car, two things are needed: the self-driving software and the virtual car itself. The self-driving software does two things: based on video input of the road, it determines how to safely and effectively steer the car, and, also based on the video input of the road, it determines how to safely and effectively use the car's acceleration and braking mechanisms.
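A minimal sketch of the first of those two tasks, regressing a steering command from a single road frame; the architecture and names are illustrative, not the authors' model, and a throttle/brake head would be analogous.

import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    # Maps one road frame to a steering angle (regression), in the spirit of
    # behaviour-cloning pipelines for self-driving cars.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, frame):            # frame: (batch, 3, H, W), normalised RGB
        return self.head(self.features(frame))   # predicted steering angle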


2021
Author(s): Negar Taherian

The field of high dynamic range (HDR) imaging deals with capturing the luminance of a natural scene, which typically varies between 10⁻³ and 10⁵ cd/m², and displaying it on digital devices with a much lower dynamic range. Here we present a novel tone mapping algorithm based on K-means clustering. The algorithm takes the color information within a frame into account, uses k-means to cluster the intensities in an image, and shifts the values within each cluster to a displayable dynamic range. We also implement scene change detection to reduce running time by reusing the cluster information of the previous frame for frames within the same scene. To reduce flicker, we propose a method that applies a leaky integrator to the centroid values of the clustering results. The algorithm runs in O(N log K + K log K) for an image with N input luminance levels and K output levels. We also show how to extend the method to handle video input. Our algorithm gives results comparable to state-of-the-art tone mapping algorithms: we test it on a number of standard high dynamic range images and video sequences and provide qualitative and quantitative comparisons against several state-of-the-art video tone mapping algorithms.
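A simplified sketch of the idea, assuming a per-frame luminance array: 1D k-means over log-luminance, a leaky integrator over the centroids for flicker reduction, and a shift of each cluster into the display range. The brute-force nearest-centroid search here is O(NK) rather than the thesis's O(N log K + K log K), and all names and parameters are illustrative.

import numpy as np

def kmeans_tone_map(luminance, k=32, prev_centroids=None, alpha=0.9, iters=10):
    # luminance: HDR luminance per pixel (2D array). Returns a display-range
    # image in [0, 1] and the smoothed centroids to pass to the next frame.
    log_lum = np.log10(luminance.ravel() + 1e-6)
    # Initialise centroids evenly over the frame's dynamic range.
    centroids = np.linspace(log_lum.min(), log_lum.max(), k)
    for _ in range(iters):                                   # plain 1D k-means
        labels = np.abs(log_lum[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = log_lum[labels == j]
            if members.size:
                centroids[j] = members.mean()
    centroids.sort()
    if prev_centroids is not None:
        # Leaky integration of centroids across frames to suppress flicker.
        centroids = alpha * prev_centroids + (1.0 - alpha) * centroids
    # Shift each cluster to its slot in the displayable range [0, 1].
    display_levels = np.linspace(0.0, 1.0, k)
    labels = np.abs(log_lum[:, None] - centroids[None, :]).argmin(axis=1)
    ldr = display_levels[labels].reshape(luminance.shape)
    return ldr, centroids

For video, the returned centroids are fed back in as prev_centroids on the next frame, and a scene-change test (e.g. a large jump in the luminance histogram) would reset them, matching the behaviour the abstract describes.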

