With the continuous growth in the data requirements of scientific and commercial applications, access to remote and distributed data has become a major bottleneck for end-to-end application performance. Traditional distributed computing systems closely couple data access and computation, generally treating data access as a side effect of computation. The limitations of these systems, and of CPU-oriented scheduling and workflow management tools, in handling complex data management have motivated a newly emerging paradigm: data-aware distributed computing. In this chapter, the authors elaborate on how the most crucial distributed computing components, such as scheduling, workflow management, and end-to-end throughput optimization, can become “data-aware.” In this paradigm, data placement activities are represented as full-featured jobs in the end-to-end workflow; they are queued, managed, scheduled, and optimized by a specialized data-aware scheduler. As part of this paradigm, the authors present a set of tools for mitigating the data bottleneck in distributed computing systems, built on three main components: a data-aware scheduler, which provides planning, scheduling, resource reservation, job execution, and error recovery for data movement tasks; integration of these capabilities with the other layers of distributed computing, such as workflow planning; and further optimization of data movement tasks via dynamic tuning of the underlying protocols' transfer parameters.
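To make the scheduler's role concrete, the sketch below illustrates in skeletal Python what it means to treat a data placement activity as a full-featured job: it is submitted to a queue, executed, recovered from failure by requeuing, and carries a tunable transfer parameter. All names here (DataPlacementJob, DataAwareScheduler, fake_transfer) and the retry/back-off policy are hypothetical illustrations of the concept, not the authors' actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from queue import Queue


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class DataPlacementJob:
    """A data movement task modeled as a full-featured job, not a side effect."""
    src: str
    dest: str
    parallel_streams: int = 1   # tunable protocol transfer parameter
    max_retries: int = 3        # error-recovery policy
    attempts: int = 0
    state: JobState = JobState.QUEUED


class DataAwareScheduler:
    """Queues, executes, and recovers data placement jobs (illustrative only)."""

    def __init__(self, transfer_fn):
        self._queue: Queue = Queue()
        self._transfer_fn = transfer_fn  # pluggable transfer backend

    def submit(self, job: DataPlacementJob) -> None:
        job.state = JobState.QUEUED
        self._queue.put(job)

    def run(self) -> None:
        while not self._queue.empty():
            job = self._queue.get()
            job.state = JobState.RUNNING
            job.attempts += 1
            try:
                self._transfer_fn(job)
                job.state = JobState.DONE
            except OSError:
                if job.attempts < job.max_retries:
                    # Error recovery: reduce parallelism, then requeue the job.
                    job.parallel_streams = max(1, job.parallel_streams // 2)
                    self.submit(job)
                else:
                    job.state = JobState.FAILED


if __name__ == "__main__":
    def fake_transfer(job: DataPlacementJob) -> None:
        # Stand-in for a real protocol transfer (e.g., a GridFTP client call).
        print(f"{job.src} -> {job.dest} using {job.parallel_streams} streams")

    sched = DataAwareScheduler(fake_transfer)
    sched.submit(DataPlacementJob("gsiftp://siteA/data.tar",
                                  "gsiftp://siteB/data.tar",
                                  parallel_streams=8))
    sched.run()
```

In a real system, the transfer callback would invoke an actual protocol client, and the tuning of parameters such as the number of parallel streams would be driven by measured throughput rather than a fixed back-off, in line with the dynamic protocol tuning the chapter describes.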