Sunday, June 26, 2011

Journal recovery in jbd

Journal recovery: jbd/recovery.c
--------------------------------

   Journal recovery is quite simple. It consists of the steps below.

a) Read ahead journal blocks into memory.

b) Do the first pass (PASS_SCAN) to see whether a recovery is needed at all. If it is, this pass also works out which transactions have to be replayed and runs sanity checks such as whether the journal is valid. After the scan pass, an in-core data structure describing the journal (struct recovery_info) is populated with the information the later passes need.

c) Do the second pass (PASS_REVOKE). This walks all the revoke blocks in the journal and builds an in-core hash of the block numbers that have been revoked. This ensures that we don't replay ops for these blocks when we do the actual replay.

d) Do the third and final pass (PASS_REPLAY), which actually replays the journal and copies the data from the journal to the real filesystem. Replaying an op simply consists of getting a buffer for the corresponding filesystem block number, copying the contents from the journal into that buffer and marking the buffer dirty, so that it gets written back to its actual location in the filesystem.

NB: Steps (b), (c) and (d) are all done through a common function, do_one_pass().

e) Once the replay is complete, throw away the in-memory revoke hash.

f) Sync the block device.

g) Once the recovery is done, journal_reset() is called to set up the in-memory fields of the journal, and the journal is ready for business again.
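
To make steps (c) to (e) a bit more concrete, here is a minimal userspace sketch of a revoke hash. It is not the jbd code: the names revoke_set(), revoke_test() and revoke_clear(), the table size and the plain integer comparison of transaction ids are all assumptions made for illustration. The sketch also assumes that each revoke record remembers the transaction that issued it, so that a copy of the block logged by a later transaction would still be replayed.

```c
#include <assert.h>
#include <stdlib.h>

#define REVOKE_HASH_SIZE 16   /* toy size, assumed to be a power of two */

/* One revoked block, tagged with the transaction that revoked it. */
struct revoke_record {
    unsigned long blocknr;
    unsigned int sequence;
    struct revoke_record *next;   /* chain within one hash bucket */
};

static struct revoke_record *revoke_hash[REVOKE_HASH_SIZE];

static unsigned int hash_block(unsigned long blocknr)
{
    return (unsigned int)(blocknr & (REVOKE_HASH_SIZE - 1));
}

static struct revoke_record *revoke_find(unsigned long blocknr)
{
    struct revoke_record *r = revoke_hash[hash_block(blocknr)];

    while (r && r->blocknr != blocknr)
        r = r->next;
    return r;
}

/* PASS_REVOKE side: remember the latest transaction that revoked blocknr. */
static int revoke_set(unsigned long blocknr, unsigned int sequence)
{
    struct revoke_record *r = revoke_find(blocknr);
    unsigned int h = hash_block(blocknr);

    if (r) {
        if (sequence > r->sequence)
            r->sequence = sequence;
        return 0;
    }
    r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->blocknr = blocknr;
    r->sequence = sequence;
    r->next = revoke_hash[h];
    revoke_hash[h] = r;
    return 0;
}

/* PASS_REPLAY side: nonzero means "do not replay this copy of the block". */
static int revoke_test(unsigned long blocknr, unsigned int sequence)
{
    struct revoke_record *r = revoke_find(blocknr);

    /* Assumption: plain comparison, no handling of sequence wraparound. */
    return r != NULL && sequence <= r->sequence;
}

/* Step (e): throw the whole table away once replay is finished. */
static void revoke_clear(void)
{
    for (int i = 0; i < REVOKE_HASH_SIZE; i++) {
        while (revoke_hash[i]) {
            struct revoke_record *r = revoke_hash[i];

            revoke_hash[i] = r->next;
            free(r);
        }
    }
}
```

In this sketch the revoke pass would call revoke_set() for every block number it finds in a revoke block, the replay pass would call revoke_test() before copying a block, and revoke_clear() corresponds to dropping the hash after replay.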

journal_recover()
|
--> do_one_pass(PASS_SCAN)
|
--> do_one_pass(PASS_REVOKE)
|
--> do_one_pass(PASS_REPLAY)
|
--> journal_clear_revoke()
|
--> sync_blockdev() and journal_reset()
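
The call flow above can be mimicked with a toy model, roughly as follows. This is only a sketch under loud assumptions: the journal is a flat in-memory array of hypothetical toy_record entries (the real on-disk format is different), the revoke "hash" is a small linear array, and toy_do_one_pass() and toy_journal_recover() are made-up names that only mirror the shape of do_one_pass() and journal_recover().

```c
#include <assert.h>
#include <string.h>

enum passtype { PASS_SCAN, PASS_REVOKE, PASS_REPLAY };
enum rec_type { REC_DATA, REC_REVOKE, REC_COMMIT };

#define BLOCK_SIZE  8
#define FS_BLOCKS   16
#define MAX_REVOKED 16

/* Hypothetical flattened journal record, for illustration only. */
struct toy_record {
    enum rec_type type;
    unsigned int tid;        /* transaction the record belongs to */
    unsigned int blocknr;    /* target filesystem block (DATA and REVOKE) */
    char data[BLOCK_SIZE];   /* journalled contents (DATA only) */
};

/* Filled in by the scan pass, used by the later passes. */
struct toy_recovery_info {
    unsigned int start_tid;  /* first transaction to replay */
    unsigned int end_tid;    /* first transaction NOT to replay */
    int nr_replays;
    int nr_revokes;
};

/* Linear array standing in for the in-core revoke hash. */
static struct { unsigned int blocknr, tid; } revoked[MAX_REVOKED];
static int nr_revoked;

static int is_revoked(unsigned int blocknr, unsigned int tid)
{
    for (int i = 0; i < nr_revoked; i++)
        if (revoked[i].blocknr == blocknr && tid <= revoked[i].tid)
            return 1;
    return 0;
}

static void toy_do_one_pass(const struct toy_record *log, int n,
                            struct toy_recovery_info *info,
                            enum passtype pass, char fs[][BLOCK_SIZE])
{
    unsigned int next_tid = info->start_tid;

    for (int i = 0; i < n; i++) {
        const struct toy_record *r = &log[i];

        if (pass == PASS_SCAN) {
            if (r->tid != next_tid)
                break;               /* sequence gap: the valid log ends here */
            if (r->type == REC_COMMIT)
                next_tid++;          /* this transaction is complete */
            continue;
        }
        if (r->tid >= info->end_tid)
            break;                   /* past the last committed transaction */

        if (pass == PASS_REVOKE && r->type == REC_REVOKE &&
            nr_revoked < MAX_REVOKED) {
            revoked[nr_revoked].blocknr = r->blocknr;
            revoked[nr_revoked].tid = r->tid;
            nr_revoked++;
            info->nr_revokes++;
        } else if (pass == PASS_REPLAY && r->type == REC_DATA &&
                   r->blocknr < FS_BLOCKS &&
                   !is_revoked(r->blocknr, r->tid)) {
            /* Copy journal contents over the "filesystem" block. */
            memcpy(fs[r->blocknr], r->data, BLOCK_SIZE);
            info->nr_replays++;
        }
    }
    if (pass == PASS_SCAN)
        info->end_tid = next_tid;
}

/* Runs the three passes in order and returns the number of blocks replayed. */
static int toy_journal_recover(const struct toy_record *log, int n,
                               unsigned int start_tid, char fs[][BLOCK_SIZE])
{
    struct toy_recovery_info info = { .start_tid = start_tid };

    nr_revoked = 0;
    toy_do_one_pass(log, n, &info, PASS_SCAN, fs);
    toy_do_one_pass(log, n, &info, PASS_REVOKE, fs);
    toy_do_one_pass(log, n, &info, PASS_REPLAY, fs);
    nr_revoked = 0;                  /* stands in for journal_clear_revoke() */
    return info.nr_replays;
}
```

Calling toy_journal_recover() with a log that contains two committed transactions and a trailing uncommitted one replays only the committed, non-revoked blocks. The sync and reset steps are left out because the toy model has no block device or journal superblock to update.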