optimal reward
Recently Published Documents


TOTAL DOCUMENTS: 28 (five years: 4)
H-INDEX: 7 (five years: 1)

The Light Detection and Ranging (LiDAR) sensor tracks each sensed obstruction at its location, together with its relative distance, speed, and direction; this information is forwarded to a cloud server to predict vehicle collisions, traffic congestion, and road damage. The system learns the behaviour of each state to produce an appropriate reward that serves as a recommendation for avoiding accidents. Deep Reinforcement Learning with a Q-network models the complexity and uncertainty of the environment to generate an optimal reward for each state, which in turn activates automatic emergency braking and safe parking assistance in the vehicles. The proposed work thus provides safer transport for pedestrians and autonomous vehicles. In experiments, the proposed system achieved a prediction accuracy of 92.15%, higher than recent methods. Finally, the proposed system saves human and animal lives from vehicle collisions, suggests rerouting to drivers to avoid unpredictable traffic, reduces fuel consumption, and cuts carbon emissions.
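As a concrete illustration of the reward-generation loop this abstract describes, here is a minimal tabular Q-learning sketch. The state discretization, action set, and reward values are illustrative assumptions, not the paper's actual design.

```python
# Minimal tabular Q-learning sketch for the braking decision described above.
# The state buckets, actions, and reward values are illustrative assumptions,
# not the paper's actual design.
import random
from collections import defaultdict

ACTIONS = ["cruise", "brake", "park_assist"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    # epsilon-greedy exploration over the discretized LiDAR state
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    # standard Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    best_next = max(q_table[next_state].values())
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])

# Example transition: obstacle close and closing fast -> braking is rewarded.
s, s_next = ("dist<10m", "closing_fast"), ("dist<10m", "stopped")
update(s, "brake", reward=+1.0, next_state=s_next)
print(choose_action(s))  # "brake" with probability 1 - EPSILON
```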


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Pengfei Wang ◽  
Chi Lin ◽  
Zhen Yu ◽  
Leyou Yang ◽  
Qiang Zhang

The number of smart devices deployed in the Industrial Internet of Things (IIoT) environment has been increasing rapidly. To improve communication efficiency, edge computing-enabled Industrial Internet of Things (E-IIoT) has gained attention recently. Nevertheless, E-IIoT still cannot meet the rapidly increasing communication demands when hundreds of millions of IIoT devices are connected at the same time. Considering the future 6G environment, where smart network-in-box (NIB) nodes are everywhere (e.g., deployed in vehicles, buses, backpacks, etc.), we propose a crowdsourcing-based recruitment framework that leverages the power of the crowd to provide extra communication resources and enhance communication capabilities. We treat NIB nodes as edge-layer devices, and CrowdBox is devised using a Stackelberg game in which the E-IIoT system is the leader and the NIB nodes are the followers. CrowdBox calculates the optimal reward that reaches the unique Stackelberg equilibrium, where the utility of E-IIoT is maximized and no NIB node can improve its utility by deviating from its strategy. Finally, we evaluate the performance of CrowdBox with extensive simulations under various settings, showing that CrowdBox outperforms the compared algorithms in improving system utility and attracting more NIB nodes.
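To make the leader-follower computation concrete, the sketch below grid-searches the leader's announced reward against the followers' best responses under a standard proportional-sharing utility model. The utility forms and cost values are assumptions for illustration, not necessarily CrowdBox's exact formulation.

```python
# Sketch of computing an optimal reward in a leader-follower (Stackelberg)
# setting. The proportional-sharing utilities and cost values are assumed
# for illustration; CrowdBox's exact model may differ.
import numpy as np

costs = np.array([0.3, 0.5, 0.8, 1.0])   # hypothetical NIB unit costs

def follower_contributions(R, iters=500):
    """Best-response iteration: node i picks t_i maximizing
    R * t_i / sum(t) - costs[i] * t_i, holding the others fixed."""
    t = np.full(len(costs), 0.1)
    for _ in range(iters):
        for i in range(len(costs)):
            others = max(t.sum() - t[i], 1e-9)   # guard against total collapse
            # first-order condition gives t_i = sqrt(R*others/c_i) - others
            t[i] = max(np.sqrt(R * others / costs[i]) - others, 0.0)
    return t

def leader_utility(R, value=5.0):
    # concave value of the total contribution, minus the announced reward
    return value * np.log1p(follower_contributions(R).sum()) - R

# Grid search over the announced reward R.
grid = np.linspace(0.1, 10, 100)
best_R = max(grid, key=leader_utility)
print(f"optimal reward ~ {best_R:.2f}")
```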


Author(s):  
Samuel J. Gershman ◽  
Lucy Lai

Action selection requires a policy that maps states of the world to a distribution over actions. The amount of memory needed to specify the policy (the policy complexity) increases with the state-dependence of the policy. If there is a capacity limit for policy complexity, then there will also be a trade-off between reward and complexity, since some reward will need to be sacrificed in order to satisfy the capacity constraint. This paper empirically characterizes the trade-off between reward and complexity for both schizophrenia patients and healthy controls. Schizophrenia patients adopt lower-complexity policies on average, and these policies are more strongly biased away from the optimal reward-complexity trade-off curve compared to healthy controls. However, healthy controls are also biased away from the optimal trade-off curve, and both groups appear to lie on the same empirical trade-off curve. We explain these findings using a cost-sensitive actor-critic model. Our empirical and theoretical results shed new light on cognitive effort abnormalities in schizophrenia.
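In this framework, policy complexity is the mutual information I(S; A) between states and actions. Below is a minimal sketch of that computation, with a toy two-state policy assumed for illustration.

```python
# Sketch: policy complexity measured as the mutual information I(S; A)
# between states and actions, following the paper's information-theoretic
# framing. The toy two-state policies are illustrative assumptions.
import numpy as np

def policy_complexity(p_state, policy):
    """policy[s, a] = P(a | s); returns I(S; A) in bits."""
    p_joint = p_state[:, None] * policy          # P(s, a)
    p_action = p_joint.sum(axis=0)               # marginal P(a)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_joint > 0, p_joint / (p_state[:, None] * p_action), 1.0)
    return float((p_joint * np.log2(ratio)).sum())

p_state = np.array([0.5, 0.5])
state_dependent = np.array([[0.9, 0.1],          # near-deterministic mapping
                            [0.1, 0.9]])
state_blind = np.array([[0.5, 0.5],              # same action mix in every state
                        [0.5, 0.5]])
print(policy_complexity(p_state, state_dependent))  # ~0.53 bits
print(policy_complexity(p_state, state_blind))      # 0 bits: zero-complexity policy
```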


2020 ◽  
Vol 45 (4) ◽  
pp. 1466-1497
Author(s):  
Junyu Cao ◽  
Mariana Olvera-Cravioto ◽  
Zuo-Jun (Max) Shen

We propose a model for optimizing the last-mile delivery of n packages from a distribution center to their final recipients, using a strategy that combines ride-sharing platforms (e.g., Uber or Lyft) with traditional in-house van delivery systems. The main objective is to compute the optimal reward offered to private drivers for each of the n packages such that the total expected cost of delivering all packages is minimized. Our technical approach is based on the formulation of a discrete sequential packing problem, in which bundles of packages are picked up from the warehouse at random times during the interval [0, T]. Our theoretical results include both exact and asymptotic (as n → ∞) expressions for the expected number of packages that are picked up by time T, and are closely related to the classical Rényi parking/packing problem. Our proposed framework is scalable with the number of packages.
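A short Monte Carlo sketch of this dynamic follows, under the simplifying assumption that each arriving driver requests a bundle of m consecutive packages and succeeds only if the whole bundle is still available; the parameter values are illustrative, not the paper's.

```python
# Monte Carlo sketch of the discrete sequential packing dynamic: drivers
# arrive at random times on [0, T] and each tries to pick up a bundle of
# m consecutive packages, succeeding only if the whole bundle is still
# available. Parameters (n, m, number of arrivals) are illustrative.
import random

def simulate(n=100, m=3, arrivals=200, trials=2000):
    total = 0
    for _ in range(trials):
        taken = [False] * n
        for _ in range(arrivals):                  # arrivals ordered in time
            i = random.randrange(n - m + 1)        # random bundle start
            if not any(taken[i:i + m]):
                for j in range(i, i + m):
                    taken[j] = True
        total += sum(taken)
    return total / trials

# Expected number of packages picked up; the remainder go to the in-house van.
print(f"mean packages picked: {simulate():.1f} of 100")
```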


2018 ◽  
Vol 10 (12) ◽  
pp. 4744 ◽  
Author(s):  
Yangke Ding ◽  
Lei Ma ◽  
Ye Zhang ◽  
Dingzhong Feng

The aim of this paper is to discuss the coopetition (cooperative competition) relationship between a manufacturer and a collector in the collection of waste mobile phones (WMPs), and to examine the evolution mechanism and the internal reward-penalty mechanism (RPM) for their collection strategies. A coopetition evolutionary game model based on evolutionary game theory was developed to obtain their common and evolving collection strategies. The pure-strategy Nash equilibria of this model reveal the strategy choices of perfect competition or cooperation, while the mixed-strategy Nash equilibrium reveals the evolutionary trends and laws. In addition, the optimal RPM was obtained through sensitivity analysis of the related parameters, and its simulation was examined using the example of WMPs in China. Results show that (i) although the manufacturer and the collector may switch between cooperation and competition over time, cooperation is their best choice for increasing payoffs; (ii) the optimal RPM promotes their tendency to cooperate and thereby increases their payoffs.
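A replicator-dynamics sketch of the two-population game is given below. The payoff numbers and the way the reward-penalty term enters are illustrative assumptions, not values calibrated to the paper's China WMP case; with these assumed payoffs, a sufficiently large RPM tips both populations toward cooperation, mirroring result (ii).

```python
# Replicator-dynamics sketch of the two-population evolutionary game between
# manufacturer and collector. Payoff numbers, including how the reward-penalty
# mechanism (rpm) enters, are illustrative assumptions.

def step(x, y, rpm, dt=0.01):
    """x, y: shares of cooperators among manufacturers / collectors.
    rpm: transfer added when cooperating, subtracted when defecting."""
    # hypothetical manufacturer payoffs against the collector's mixed strategy
    m_coop = y * (5 + rpm) + (1 - y) * (1 + rpm)
    m_defect = y * 6 + (1 - y) * 2 - rpm
    # hypothetical collector payoffs against the manufacturer's mixed strategy
    c_coop = x * (4 + rpm) + (1 - x) * (1 + rpm)
    c_defect = x * 5 + (1 - x) * 2 - rpm
    m_avg = x * m_coop + (1 - x) * m_defect
    c_avg = y * c_coop + (1 - y) * c_defect
    # replicator equations: a strategy's share grows when it beats the average
    x += dt * x * (m_coop - m_avg)
    y += dt * y * (c_coop - c_avg)
    return x, y

x, y = 0.3, 0.3
for _ in range(20000):
    x, y = step(x, y, rpm=1.0)   # with rpm=0, both populations drift to defection
print(f"cooperation shares after convergence: x={x:.2f}, y={y:.2f}")
```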


2018 ◽  
Vol 2018 ◽  
pp. 1-15
Author(s):  
Chao Li ◽  
Zhijian Qiu

We consider a dynamic principal-agent contract model with time-inconsistent preferences to study how time inconsistency influences the optimal effort and the optimal reward mechanism. We show that when both the principal and the agent are time-consistent, the optimal effort and the optimal reward are decreasing functions of the uncertainty factor. When the agent is time-inconsistent, the agent's impatience has a negative impact on the optimal contract: the higher the agent's discount rate, the less effort is provided, as impatient agents favour immediate gratification. In addition, when both the principal and the agent are time-inconsistent, their impatience can, in a special case, offset the impact of the uncertainty factor on the optimal contract; in turn, however, their impatience itself affects the contract.
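A toy worked example of the comparative static on impatience: with a linear reward and quadratic effort cost (both assumptions for illustration, not the paper's model), the agent's optimal effort falls as the discount rate rises.

```python
# Toy illustration of the abstract's comparative static: a more impatient
# agent (higher discount rate) exerts less effort. The linear reward and
# quadratic effort cost are assumptions for illustration only.
import math

def optimal_effort(wage, discount_rate):
    # agent maximizes e * wage * exp(-discount_rate) - e**2 / 2;
    # the first-order condition gives e* = wage * exp(-discount_rate)
    return wage * math.exp(-discount_rate)

for rho in (0.0, 0.5, 1.0):
    print(f"discount rate {rho:.1f}: effort {optimal_effort(2.0, rho):.2f}")
```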


2018 ◽  
Vol 33 (2) ◽  
pp. 22-36 ◽  
Author(s):  
Espen A. Sjoberg ◽  
Espen B. Johansen

2018 ◽  
Vol 28 (4) ◽  
pp. 501-520
Author(s):  
Dmitry Rokhlin ◽  
Anatoly Usov

We consider a manager who allocates a fixed total payment among N rational agents in order to maximize aggregate production. The profit of the i-th agent is the difference between the compensation (reward) obtained from the manager and the production cost. We compare (i) the normative compensation scheme, where the manager enforces the agents to follow an optimal cooperative strategy; (ii) the linear piece-rate compensation scheme, where the manager announces an optimal reward per unit of output; and (iii) the proportional compensation scheme, where an agent's reward is proportional to his contribution to total output. Denoting the corresponding total production levels by s*, ŝ, and s̄ respectively, where the last is associated with the unique Nash equilibrium, we examine the limits of the prices of anarchy A_N = s*/s̄ and A'_N = ŝ/s̄ as N → ∞. These limits are calculated for the case of identical convex costs with power asymptotics at the origin, and for power costs corresponding to the Cobb-Douglas and generalized CES production functions with decreasing returns to scale. Our results show that asymptotically no performance is lost in terms of A'_N, and in terms of A_N the loss does not exceed 31%.
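A numeric sketch of the price of anarchy A_N for identical power costs under the proportional scheme follows. The closed forms come from symmetric first-order conditions and a budget-exhausting cooperative benchmark, which is a simplified reading of the model rather than the paper's full derivation.

```python
# Numeric sketch of the price of anarchy A_N = s*/s_bar for identical power
# costs c(x) = x**p / p under a fixed total payment R shared proportionally.
# Closed forms below are derived under simplifying assumptions (symmetric
# equilibrium, budget exactly covering costs in the cooperative benchmark);
# the R, p, N values are illustrative.
def production_levels(R, p, N):
    # normative (cooperative) optimum: split the budget equally, x = (p*R/N)**(1/p)
    s_star = N * (p * R / N) ** (1 / p)
    # symmetric Nash of the proportional scheme: x**p = R*(N-1)/N**2
    s_bar = N * (R * (N - 1) / N ** 2) ** (1 / p)
    return s_star, s_bar

R, p = 10.0, 2.0
for N in (2, 10, 100, 10000):
    s_star, s_bar = production_levels(R, p, N)
    print(f"N={N:>5}: A_N = {s_star / s_bar:.4f}")
# A_N tends to p**(1/p); its maximum over p is e**(1/e) ~ 1.445, i.e. at
# most ~31% of production is lost to decentralization, matching the abstract.
```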

