The evolution of metropolitan structures and the development of urban systems have created various kinds of urban networks, two of which are of great importance to our daily life: transportation networks, which carry human mobility in the physical space, and communication networks, which support human interactions in the digital space. The rapid expansion in the scope and scale of these two networks raises a series of fundamental research questions on how to optimize them for their users, with major objectives including demand responsiveness, anomaly awareness, cost effectiveness, energy efficiency, and service quality. Despite their distinct design intentions and implementation technologies, transportation and communication networks share common fundamental structures and exhibit similar spatio-temporal dynamics. Correspondingly, the optimization of both networks faces an array of common key challenges, including network profiling, mobility prediction, traffic clustering, and resource allocation. To achieve these optimization objectives and address these challenges, various analytical models, optimization algorithms, and simulation systems have been proposed and extensively studied across multiple disciplines. However, these approaches are generally evaluated through simulation rather than in real-world networks, which may lead to sub-optimal results in deployment. With the emergence of ubiquitous sensing, communication, and computing paradigms, massive amounts of urban network data can now be collected, and recent advances in big data analytics techniques have given researchers great potential to understand these data. Motivated by this trend, we explore a new big data-driven network optimization paradigm, in which we address the above-mentioned research challenges by applying state-of-the-art data analytics methods to achieve network optimization goals.
Following this research direction, in this dissertation we propose two data-driven algorithms, one for network traffic clustering and one for user mobility prediction, and apply them to real-world optimization tasks in transportation and communication networks. First, by analyzing large-scale traffic datasets from both networks, we propose a graph-based traffic clustering algorithm to better understand traffic similarities and variations across different areas and times. On this basis, we apply the traffic clustering algorithm to the following two network optimization applications. 1. Dynamic traffic clustering for demand-responsive bikeshare networks. In this application, we dynamically cluster bike stations with similar usage patterns to obtain stable and predictable cluster-wise bike traffic demands, so as to foresee over-demand stations in the network and enable demand-responsive bike scheduling. Evaluation results using real-world data from New York City and Washington, D.C. show that our framework accurately foresees over-demand clusters (e.g., with 0.882 precision and 0.938 recall in NYC) and significantly outperforms other baseline methods. 2. Complementary traffic clustering for cost-effective cloud radio access networks (C-RAN). In this application, we cluster remote radio heads (RRHs) with complementary traffic patterns (e.g., an RRH in a residential area and an RRH in a business district) so that they can share, and thus reuse, the total capacity of the baseband units (BBUs), thereby reducing the overall deployment cost. We evaluate our framework with real-world network data collected from the city of Milan, Italy and the province of Trentino, Italy. Results show that our method reduces the overall deployment cost to 48.4\% and 51.7\% of that of the traditional RAN architecture on the two datasets, respectively, and consistently outperforms other baseline methods. Second, by analyzing large-scale user mobility datasets from both networks, we propose [...]
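The capacity-reuse idea behind complementary clustering can be illustrated with a small numerical sketch. It assumes, as a simplification not stated in the text above, that a BBU pool must be provisioned for the peak of the summed traffic of the RRHs it serves, and it pairs RRHs with a simple greedy heuristic. This heuristic is a hypothetical stand-in, not the clustering algorithm proposed in this dissertation; the function names `bbu_capacity` and `greedy_complementary_pairs` and the four-slot traffic profiles are illustrative only.

```python
import numpy as np


def bbu_capacity(traffic: np.ndarray) -> float:
    """Assumed cost model: a shared BBU pool is sized for the peak of the summed traffic."""
    return float(traffic.sum(axis=0).max())


def greedy_complementary_pairs(traffic: np.ndarray) -> list[tuple[int, ...]]:
    """Hypothetical greedy heuristic: repeatedly pair the two unpaired RRHs whose
    shared peak saves the most capacity compared with provisioning each RRH alone.

    traffic has shape (num_rrhs, num_time_slots).
    """
    unpaired = set(range(traffic.shape[0]))
    pairs: list[tuple[int, ...]] = []
    while len(unpaired) > 1:
        best = None
        for i in sorted(unpaired):
            for j in sorted(unpaired):
                if i < j:
                    # Saving = sum of individual peaks minus the peak of the combined traffic
                    saving = traffic[i].max() + traffic[j].max() - (traffic[i] + traffic[j]).max()
                    if best is None or saving > best[0]:
                        best = (saving, i, j)
        _, i, j = best
        pairs.append((i, j))
        unpaired -= {i, j}
    # An odd RRH left over gets its own BBU pool
    pairs.extend((k,) for k in unpaired)
    return pairs


# Hypothetical traffic over four time slots (night, morning, afternoon, evening)
traffic = np.array([
    [8, 2, 2, 8],  # RRH 0: residential-like profile
    [2, 8, 8, 2],  # RRH 1: business-like profile
    [7, 3, 3, 7],  # RRH 2: residential-like profile
    [3, 7, 7, 3],  # RRH 3: business-like profile
])

pairs = greedy_complementary_pairs(traffic)
separate = traffic.max(axis=1).sum()  # each RRH provisioned for its own peak
pooled = sum(bbu_capacity(traffic[list(p)]) for p in pairs)
print(pairs, separate, pooled)  # [(0, 1), (2, 3)] 30 20.0
```

Because the residential-like and business-like profiles peak at different times, pairing them lowers the provisioned capacity from 30 to 20 units in this toy example; real traffic data and an actual deployment cost model are considerably richer than this sketch.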