Concept of ZFS transaction semantics

Hi,

Can anyone explain the concept behind ZFS transactional semantics (a transaction is either entirely committed or not committed at all), and how this reduces data and disk failures?

Transactional Semantics

ZFS is a transactional file system, which means that the file system state is always consistent on disk. Traditional file systems overwrite data in place, which means that if the machine loses power, for example, between the time a data block is allocated and when it is linked into a directory, the file system will be left in an inconsistent state. Historically, this problem was solved through the use of the fsck command. This command was responsible for going through and verifying file system state, making an attempt to repair any inconsistencies in the process. This problem caused great pain to administrators and was never guaranteed to fix all possible problems. More recently, file systems have introduced the concept of journaling. The journaling process records actions in a separate journal, which can then be replayed safely if a system crash occurs. This process introduces unnecessary overhead, because the data needs to be written twice, and often results in a new set of problems, such as when the journal can't be replayed properly.
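To make the "written twice" point concrete, here is a minimal toy sketch of a write-ahead journal in Python. The class and method names (`JournaledDisk`, `write_transaction`, `checkpoint`, `recover`) are hypothetical and do not reflect the on-disk journal format of any real file system; a power cut is simulated with a flag.

```python
from dataclasses import dataclass, field


@dataclass
class JournaledDisk:
    blocks: dict = field(default_factory=dict)    # home locations, overwritten in place
    journal: list = field(default_factory=list)   # separate log of pending updates
    committed: bool = False                       # commit record for the journal
    block_writes: int = 0                         # counts every block written

    def write_transaction(self, updates, crash_before_checkpoint=False):
        # First write: log each update in the journal, then write the commit record.
        for addr, data in updates.items():
            self.journal.append((addr, data))
            self.block_writes += 1
        self.committed = True
        if crash_before_checkpoint:
            return  # simulated power loss: home locations not yet updated
        self.checkpoint()

    def checkpoint(self):
        # Second write: copy each logged update to its home location.
        for addr, data in self.journal:
            self.blocks[addr] = data
            self.block_writes += 1
        self.journal.clear()
        self.committed = False

    def recover(self):
        # On reboot, replay a committed journal; discard an uncommitted one.
        if self.committed:
            self.checkpoint()
        else:
            self.journal.clear()


disk = JournaledDisk(blocks={0: b"old"})
disk.write_transaction({0: b"new", 1: b"dir entry"}, crash_before_checkpoint=True)
print(disk.blocks)        # {0: b'old'}
disk.recover()
print(disk.blocks)        # {0: b'new', 1: b'dir entry'}
print(disk.block_writes)  # 4: two blocks, each written twice
```

Replay after the crash restores consistency, but every block in the transaction costs two writes: once into the journal and once into its home location.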

With a transactional file system, data is managed using copy on write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. This mechanism means that the file system can never be corrupted through accidental loss of power or a system crash. As a result, no fsck equivalent is needed. While the most recently written pieces of data might be lost, the file system itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost.
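As a sketch of how an application asks for the synchronous behaviour mentioned above, the Python snippet below opens a file with `os.O_DSYNC`, so each `os.write` call returns only after the data has reached stable storage (assuming the device honours flush requests). `durable_write` is a hypothetical helper name; the flag itself is only defined on POSIX platforms that support it, not on Windows.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data so that each write() returns only once the data is durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "important.dat")
        print(durable_write(target, b"must not be lost"))  # 16
```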

In short, ZFS transactional semantics are intended to keep your data intact, safe, and in a consistent state.

Hi,

Thanks for the reply. But I need to know: what is the exact script behind this scenario that makes it work like this?

The uberblock write is what you call the "exact script behind this scenario".

Thanks

ZFS first writes all new changes to disk, into new blocks rather than over the old ones. In the very last step, ZFS updates the uberblock to point to the new changes, and only then do all the new changes become active. While the writes are still in progress, the uberblock keeps pointing to the old data. Only when the writes are finished is the uberblock repointed.

If power is cut during the writes, the uberblock still points to the old, valid data. This means writes are "all-or-nothing": a power failure cannot leave half of the writes on disk and the other half missing. Either everything is written, or nothing happened. This is called "transactional writes" (or similar wording).

The last step, repointing the uberblock, is what determines whether the new data or the old data is valid.
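The commit order described above can be sketched as a small in-memory Python model. Everything here (`Pool`, `Uberblock`, `commit`, `PowerLoss`, the dict-based block store) is a hypothetical simplification: real ZFS has a much deeper block tree, checksums, and writes each new uberblock into a ring of slots rather than one variable. The point is only the ordering: new blocks go to fresh addresses first, and the uberblock is repointed last.

```python
from dataclasses import dataclass


class PowerLoss(Exception):
    """Simulated power cut in the middle of a commit."""


@dataclass(frozen=True)
class Uberblock:
    txg: int        # transaction group number
    root_addr: int  # address of the current root block


class Pool:
    def __init__(self):
        self.blocks = {}   # address -> block contents; never overwritten
        self.next_addr = 0
        empty_root = self._write_block({})
        self.uberblock = Uberblock(txg=0, root_addr=empty_root)

    def _write_block(self, contents):
        # Copy on write: every write goes to a fresh, unused address.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = contents
        return addr

    def read(self, name):
        root = self.blocks[self.uberblock.root_addr]
        return self.blocks[root[name]]

    def listing(self):
        root = self.blocks[self.uberblock.root_addr]
        return {name: self.blocks[addr] for name, addr in root.items()}

    def commit(self, changes, crash_before_uberblock=False):
        # Step 1: write the new data blocks to fresh locations.
        new_root = dict(self.blocks[self.uberblock.root_addr])  # old root untouched
        for name, data in changes.items():
            new_root[name] = self._write_block(data)
        # Step 2: write a new root block that points at the new data.
        new_root_addr = self._write_block(new_root)
        if crash_before_uberblock:
            raise PowerLoss("power cut before the uberblock was updated")
        # Step 3 (the very last step): repoint the uberblock. This single
        # update makes the whole transaction visible at once.
        self.uberblock = Uberblock(self.uberblock.txg + 1, new_root_addr)


pool = Pool()
pool.commit({"a.txt": b"v1", "b.txt": b"v1"})
try:
    pool.commit({"a.txt": b"v2", "b.txt": b"v2"}, crash_before_uberblock=True)
except PowerLoss:
    pass
print(pool.listing())      # {'a.txt': b'v1', 'b.txt': b'v1'}
pool.commit({"a.txt": b"v2", "b.txt": b"v2"})
print(pool.listing())      # {'a.txt': b'v2', 'b.txt': b'v2'}
print(pool.uberblock.txg)  # 2
```

Because the interrupted commit never reached step 3, the blocks it wrote are simply unreachable in this model, and the pool still shows the last fully committed state: never a mix of `v1` and `v2`.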