Saturday, July 2, 2011

Committing a journal transaction in jbd

A journal transaction commit consists of 8 phases. The transaction's state transitions are noted below within each phase.

The main function that performs the commit is journal_commit_transaction(). When we decide to commit, the transaction is in the running state (T_RUNNING).
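To keep the state transitions in view while reading the phases, here is a minimal sketch of them as a small state machine. The state names are the ones used in this post; the enum ordering and the helpers next_state() and state_name() are assumptions made for illustration, not code from jbd.

```c
#include <assert.h>

/* Simplified model of the transaction states named in this post. */
enum t_state {
    T_RUNNING,        /* accepting new handles */
    T_LOCKED,         /* no new updates, waiting for handles to finish */
    T_FLUSH,          /* data and revoke records being written */
    T_COMMIT,         /* metadata being written to the journal */
    T_COMMIT_RECORD,  /* commit record being written */
    T_FINISHED        /* commit is complete */
};

/* Hypothetical helper: the state that follows `s` during a commit.
 * T_FINISHED is terminal and maps to itself. */
enum t_state next_state(enum t_state s)
{
    return (s == T_FINISHED) ? T_FINISHED : (enum t_state)(s + 1);
}

/* Hypothetical helper: printable name of a state, handy for tracing. */
const char *state_name(enum t_state s)
{
    static const char *const names[] = {
        "T_RUNNING", "T_LOCKED", "T_FLUSH",
        "T_COMMIT", "T_COMMIT_RECORD", "T_FINISHED"
    };

    assert((int)s >= 0 && (int)s <= (int)T_FINISHED);
    return names[s];
}
```

Calling next_state() repeatedly from T_RUNNING visits the states in the same order as the phases described in this post.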

Lock the transaction for new updates. ===> T_LOCKED
---> Wait for any existing handles in the transaction to complete the updates.
---> Discard buffers from the reserved list (t_reserved_list). If a buffer is also part of the next transaction, it is transferred to the appropriate list of that transaction; otherwise it is dropped from the journal's lists.
---> Drop written-back buffers from the checkpoint list (t_checkpoint_list). When this empties a transaction's checkpoint list, that transaction is freed as well, unless it is the running or committing transaction.
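The reserved-list step above can be sketched with a toy model. The structures below are deliberately simplified stand-ins, not the real journal_head or transaction_t, and the names jbuf, txn, kept and discard_reserved() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct txn;

/* Toy stand-in for a journalled buffer. */
struct jbuf {
    struct txn  *next_txn;  /* non-NULL if the next transaction also wants it */
    struct txn  *owner;     /* transaction whose list currently holds it */
    struct jbuf *next;      /* singly linked list link */
};

/* Toy stand-in for a transaction. */
struct txn {
    struct jbuf *reserved;  /* models t_reserved_list */
    struct jbuf *kept;      /* hypothetical destination list in the next txn */
};

/* Empty t->reserved: refile each buffer to the next transaction if that
 * transaction claims it, otherwise drop it. Returns the number dropped. */
int discard_reserved(struct txn *t)
{
    int dropped = 0;

    while (t->reserved != NULL) {
        struct jbuf *b = t->reserved;

        assert(b->owner == t);
        t->reserved = b->next;

        if (b->next_txn != NULL) {
            struct txn *n = b->next_txn;

            b->owner = n;           /* hand over to the next transaction */
            b->next_txn = NULL;
            b->next = n->kept;
            n->kept = b;
        } else {
            b->owner = NULL;        /* no longer tracked by the journal */
            b->next = NULL;
            dropped++;
        }
    }
    return dropped;
}
```

After the call, the committing transaction's reserved list is empty, which is the condition the later phases rely on.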

Phase 1 start

---> Change transaction state to T_FLUSH
---> Switch the revoke tables.
---> At this point there is no running transaction: the former running transaction has become the committing transaction.
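The hand-over from running to committing transaction in Phase 1 can be sketched as follows. This is a toy model: the structures carry only the fields needed here, and start_flush() is a hypothetical helper rather than a jbd function. The revoke table switch is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Only the states needed for this step. */
enum t_state { T_RUNNING, T_LOCKED, T_FLUSH };

struct transaction {
    enum t_state t_state;
};

struct journal {
    struct transaction *j_running_transaction;
    struct transaction *j_committing_transaction;
};

/* Hypothetical helper: mark the locked running transaction as T_FLUSH
 * and move it into the committing slot. */
struct transaction *start_flush(struct journal *j)
{
    struct transaction *t = j->j_running_transaction;

    assert(t != NULL && t->t_state == T_LOCKED);
    assert(j->j_committing_transaction == NULL);

    t->t_state = T_FLUSH;
    j->j_committing_transaction = t;
    j->j_running_transaction = NULL;  /* new handles need a fresh transaction */
    return t;
}
```

Clearing the running slot is what allows a new transaction to be started for incoming updates while the old one is being written out.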

Phase 2 start

/* Flushing starts now */
---> Data buffers are flushed first. (t_sync_datalist)
---> Write out the revoke records from the revoke hash list into descriptor blocks in the journal.
---> Change transaction state to T_COMMIT

Phase 3 start

---> Flush metadata buffers (present on t_buffers list). See journal_write_metadata_buffer()

Phase 4 start

---> Wait for the IO submitted above to complete, that is, wait on the metadata buffers present on t_iobuf_list. The dummy buffer heads created for the metadata buffers are released. Each original metadata buffer, which was put on the shadow list, is released from it and put on the t_forget list.

Phase 5 start

---> Wait for the submitted revoke record and descriptor buffers to complete and be written out. This is done by waiting on the buffers on t_log_list.

Phase 6 start

---> Change transaction state to T_COMMIT_RECORD
---> IO for data is complete now. Write the commit record in the journal.

Phase 7 start

---> Walk the transaction's t_forget list, getting rid of buffers until none remain on it. As each buffer is examined, we check whether it was on the checkpoint IO list of a previous transaction. If it is, it is removed, and if required (that is, if the buffer is dirty) it is transferred to the checkpoint list of the committing transaction. See __journal_insert_checkpoint()
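A rough sketch of the forget-list walk, under simplifying assumptions: checkpoint membership is modelled with a back-pointer and a per-transaction counter instead of real linked lists, and jbuf, txn, cp_count and process_forget_list() are hypothetical names, not jbd identifiers.

```c
#include <assert.h>
#include <stddef.h>

struct txn;

/* Toy stand-in for a journalled buffer. */
struct jbuf {
    int          dirty;   /* still has to reach its home location on disk */
    struct txn  *cp_txn;  /* transaction checkpointing this buffer, or NULL */
    struct jbuf *next;    /* forget-list link */
};

/* Toy stand-in for a transaction. */
struct txn {
    struct jbuf *forget;    /* models t_forget */
    int          cp_count;  /* number of buffers on its checkpoint list */
};

/* Drain the forget list of the committing transaction. */
void process_forget_list(struct txn *commit)
{
    while (commit->forget != NULL) {
        struct jbuf *b = commit->forget;

        commit->forget = b->next;
        b->next = NULL;

        if (b->cp_txn != NULL) {
            /* Still checkpointed by an older transaction: take it off. */
            assert(b->cp_txn != commit);
            b->cp_txn->cp_count--;
            b->cp_txn = NULL;
        }
        if (b->dirty) {
            /* Dirty buffers are checkpointed by the committing transaction. */
            b->cp_txn = commit;
            commit->cp_count++;
        }
    }
}
```

In this model a clean buffer simply leaves the journal's bookkeeping, while a dirty one ends up owned by the committing transaction for later checkpointing.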

Phase 8 start

---> We are done committing the transaction now.
---> Change transaction state to T_FINISHED
---> Set committing transaction = NULL.
---> Calculate average commit time for future use.
---> Set up the checkpointing transaction.
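One way the average commit time from Phase 8 could be maintained is as a weighted running average. The sketch below assumes a weighting of one part new sample to three parts old average; that ratio and the function name update_average_commit_time() are assumptions for illustration and are not taken from this post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold the latest commit time into a running average.
 * Times are in arbitrary units (for example nanoseconds). */
uint64_t update_average_commit_time(uint64_t average, uint64_t commit_time)
{
    /* Guard against overflow in the multiplication below. */
    assert(average <= UINT64_MAX / 4 && commit_time <= UINT64_MAX / 4);

    if (average == 0)
        return commit_time;          /* first sample seeds the average */

    /* Old average counts three times, new sample once. */
    return (commit_time + average * 3) / 4;
}
```

Weighting the old average more heavily keeps a single unusually slow or fast commit from swinging the estimate too far.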

1 comment:

Ming said...

Recently I have been reading the ext3 source code, focusing on the "ordered" mode. I found two places where the kernel does "submit_io" to the block device layer. One is in the kthread's "journal_submit_commit_record" --> "journal_submit_data_buffers"; the other one is "ext3_ordered_writepage" --> "block_write_full_page". These two paths are executed by different kernel threads. But when is the data actually written to the disk?