Storage systems rely on write dependency to achieve atomicity and consistency. However, enforcing write dependency comes at the expense of performance: it concatenates multiple hardware queues into a single logical queue, disables the concurrency of flash storage, and serializes access to isolated devices. Such serialization prevents the storage system from taking full advantage of high-performance drives (e.g., NVMe SSDs) and storage arrays. In this paper, we propose a new IO stack called Horae to alleviate the write dependency overhead for high-performance drives. Horae separates dependency control from the data flow and uses a dedicated interface to maintain the write dependency. Further, Horae introduces joint flush to enable parallel FLUSH commands on individual devices, and write redirection to handle dependency loops and parallelize in-place updates. We implement Horae in the Linux kernel and demonstrate its effectiveness through a wide variety of workloads. Evaluations show that Horae brings up to 1.8× and 2.1× performance gains in MySQL and BlueStore, respectively.
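To make the idea of separating dependency control from the data flow concrete, the following is a minimal user-space sketch in Python. It is a toy model under stated assumptions, not Horae's kernel implementation. The names `Device`, `OrderingLog`, and `ordered_group_write` are hypothetical and introduced only for illustration. The sketch records the intended write order on a small control path, dispatches the data writes to all devices concurrently, and then flushes the touched devices in parallel, loosely mirroring the joint flush described above. Write redirection is not modeled, so the example uses distinct block addresses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class Device:
    """Toy block device: writes sit in a volatile cache until flush()."""
    name: str
    cache: dict = field(default_factory=dict)
    durable: dict = field(default_factory=dict)
    lock: object = field(default_factory=Lock)

    def write(self, lba, data):
        with self.lock:
            self.cache[lba] = data

    def flush(self):
        # Move everything in the volatile cache to durable storage.
        with self.lock:
            self.durable.update(self.cache)
            self.cache.clear()


class OrderingLog:
    """Hypothetical control path: holds only (device, lba) ordering records."""

    def __init__(self):
        self.entries = []
        self.committed = 0  # number of entries whose data is known durable

    def append(self, dev_name, lba):
        self.entries.append((dev_name, lba))


def ordered_group_write(devices, log, writes):
    """Persist a group of (dev_name, lba, data) writes with a recorded order."""
    # 1. Control path: serialize only the small ordering records.
    for dev_name, lba, _ in writes:
        log.append(dev_name, lba)

    # 2. Data path: issue the data writes to all devices concurrently.
    #    (Assumes distinct LBAs; concurrent in-place updates are not modeled.)
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda w: devices[w[0]].write(w[1], w[2]), writes))

    # 3. Flush every touched device in parallel instead of one after another.
    touched = {dev_name for dev_name, _, _ in writes}
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda name: devices[name].flush(), touched))

    # 4. Only now mark the recorded order as committed.
    log.committed = len(log.entries)


# Usage: three writes spread over two toy devices.
devices = {"nvme0": Device("nvme0"), "nvme1": Device("nvme1")}
log = OrderingLog()
ordered_group_write(
    devices,
    log,
    [("nvme0", 0, b"A"), ("nvme1", 8, b"B"), ("nvme0", 1, b"C")],
)
```

In this sketch, only the ordering records are serialized. The data writes and the flushes proceed in parallel across devices, which is the property the abstract attributes to decoupling dependency control from the data path.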