The Linux Virtual File System


“Everything is a file in Linux” is an oft-repeated statement. Basically, it means that all devices (hard disks, CD-ROMs, floppy disks, USB sticks, etc.) are treated as files.

To explain this a bit more, consider the C function write(fd, buf, len). You have probably used this function to write len bytes of data contained in buf to the file whose file descriptor is fd. Linux lets you use the same function to write to a floppy disk (in the simplest of cases) as well, because everything capable of input and output is treated as a file in Linux. This is one of the two major abstractions in Linux, the other being the process.
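Here is a minimal user-space sketch of this idea. The hypothetical helper write_message() uses the same open(), write() and close() calls whether the path names a regular file or a device node. It assumes a POSIX system on which /dev/null exists and is writable by everyone. The loop around write() is there because write() may transfer fewer bytes than requested.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole string msg to the file named by path.
 * The same code works for a regular file and for a device node such as
 * /dev/null; only the kernel code behind the descriptor differs.
 * Returns 0 on success, -1 on error. */
int write_message(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    size_t done = 0;
    while (done < len) {                 /* handle short writes */
        ssize_t n = write(fd, msg + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);                    /* 0 on success, -1 on error */
}
```

For example, write_message("/dev/null", "hello\n") and write_message("/tmp/notes.txt", "hello\n") should both return 0. The calling code is identical in both cases.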

A filesystem is the set of methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk. The word is also used to refer to a partition or disk that stores files, or to the type of the filesystem. Thus one might say "I have two filesystems", meaning one has two partitions on which one stores files, or that one is "using the extended filesystem", meaning the type of the filesystem.

Linux Filesystems

One of the most important features of Linux is its support for many different filesystems. This makes it very flexible and well able to coexist with many other operating systems. ext2fs, ext3fs, ReiserFS, vfat and minix are among the major supported filesystems. You can have a different filesystem on each partition of your hard disk, and they all work together seamlessly. For example, you could install Linux on an ext3 partition and store all your data on a vfat partition.

To support multiple filesystems, the Linux kernel must have support built in to recognize each of them. This in turn implies that different mechanisms are needed to write to a file on a hard disk and to a file on a floppy disk, since they use different filesystems. However, we mentioned at the beginning that the same function can be used for both. Does this mean there is a different implementation of the function for each filesystem? Fortunately, no. That would make the kernel bigger and make it harder to extend the kernel to support newer filesystems. There is only a single implementation of the function, so the same function can be used to write data to files stored on different filesystems. This is made possible by the Virtual Filesystem.

The Virtual Filesystem

The Virtual Filesystem, henceforth referred to as the VFS, is the subsystem of the Linux kernel that implements the filesystem-related interfaces provided to user-space programs. It is the VFS that allows the different filesystems to interoperate.

Figure 1
As Figure 1 shows, requests from user-space application programs are received by the VFS, which then interacts with the low-level filesystems such as ext2 and ext3. The VFS therefore provides a layer of abstraction over the low-level filesystem interfaces. This is possible because the VFS provides a common file model that can represent the general features and behavior of any conceivable filesystem.

Implementation of the VFS

The VFS is object-oriented. This might surprise you, as it did me: the Linux kernel is written in C, so how can the VFS be object-oriented? It does not mean the VFS is coded using classes, objects and other OOP features. The VFS is object-oriented in the sense that its whole implementation is built around the concept of objects. Since C does not support OOP constructs such as classes, the objects are represented as C structures that contain both data and pointers to the functions (methods) that operate on that data.
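Here is a user-space sketch of this pattern; all names in it are hypothetical. A demo_file "object" carries a pointer (f_op) to a table of methods. Two toy "filesystems" supply different write methods. A single generic demo_vfs_write() dispatches through the table without knowing which filesystem it is talking to. This loosely mirrors how the kernel's generic write path calls file->f_op->write, but it is an illustration, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the VFS object pattern; none of these names
 * exist in the kernel. */
struct demo_file;

/* The "methods" table: one function pointer per operation. */
struct demo_file_ops {
    long (*write)(struct demo_file *f, const char *buf, size_t len);
};

/* The "object": per-file state plus a pointer to its methods. */
struct demo_file {
    const struct demo_file_ops *f_op; /* shared by all files of this type */
    char data[64];                    /* storage used by "memfs" */
    size_t pos;                       /* current write offset */
};

/* "memfs": copies bytes into the object's buffer, stopping when full. */
static long memfs_write(struct demo_file *f, const char *buf, size_t len)
{
    size_t room = sizeof f->data - f->pos;
    size_t n = len < room ? len : room;
    memcpy(f->data + f->pos, buf, n);
    f->pos += n;
    return (long)n;
}

/* "nullfs": accepts and discards everything, like /dev/null. */
static long nullfs_write(struct demo_file *f, const char *buf, size_t len)
{
    (void)f;
    (void)buf;
    return (long)len;
}

const struct demo_file_ops memfs_ops = { .write = memfs_write };
const struct demo_file_ops nullfs_ops = { .write = nullfs_write };

/* The single generic entry point: it only calls whatever write method
 * the object's table provides. */
long demo_vfs_write(struct demo_file *f, const char *buf, size_t len)
{
    if (f->f_op == NULL || f->f_op->write == NULL)
        return -1; /* operation not supported by this "filesystem" */
    return f->f_op->write(f, buf, len);
}
```

Supporting a new toy filesystem means writing one more methods table. demo_vfs_write() itself never changes.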
The four primary object types of the VFS are:

  • The superblock object which represents a specific mounted filesystem
  • The inode object which represents a specific file
  • The dentry object which represents a directory entry, a single component of a path
  • The file object which represents an open file as associated with a process

Each of these primary objects contains an operations object, which defines the methods that the kernel invokes on it. Specifically, these are:

  • The super_operations object is a structure which is used by the kernel to read and write inodes, write superblock information back to disk and collect file system statistics as required by the system calls like fstatfs and statfs
  • The inode_operations object contains methods which the kernel can invoke on a specific file such as create( ) and link( )
  • The dentry_operations object contains the methods that the kernel can invoke on a specific directory entry, such as d_compare( ) and d_delete( )
  • The file_operations object contains the methods that a process can invoke on an open file, such as read( ) and write( )

The Superblock Object and Superblock Operations

The superblock object represents a specific mounted filesystem. It usually corresponds to the filesystem superblock or filesystem control block stored on disk. A superblock object is created and initialized when the filesystem is mounted.

The superblock object is represented by struct super_block, defined in <linux/fs.h>. The important fields are reproduced here (Listing 1).

Listing 1
——————————————–CODE—————————————-
struct super_block {
…
unsigned long s_blocksize; /* block size in bytes */
unsigned long long s_maxbytes; /* max file size */
struct file_system_type *s_type; /* filesystem type */
struct super_operations *s_op; /* superblock methods */
struct dentry *s_root; /* directory mount point */
struct list_head s_dirty; /* list of dirty inodes */
struct list_head s_files; /* list of assigned files */
void *s_fs_info; /* filesystem-specific info */
…
};
——————————————–CODE—————————————-

As you can see above, the superblock contains a field struct super_operations *s_op. It points to a struct super_operations, defined in <linux/fs.h>, which is the superblock operations table. Listing 2 reproduces its important fields:

Listing 2
——————————————–CODE—————————————-
struct super_operations {
…
struct inode *(*alloc_inode) (struct super_block *sb); /* create and initialize a new inode object */
void (*read_inode) (struct inode *inode); /* read an inode from disk */
void (*dirty_inode) (struct inode *inode); /* invoked by the VFS when the inode is dirtied (modified) */
void (*write_inode) (struct inode *inode, int wait); /* write the given inode to disk */
int (*sync_fs) (struct super_block *sb, int wait); /* synchronize filesystem metadata with the on-disk metadata */
int (*statfs) (struct super_block *sb, struct statfs *statfs); /* statfs() system call */
int (*remount_fs) (struct super_block *sb, int *flags, char *data); /* called by the VFS when the filesystem is remounted */
…
};
——————————————–CODE—————————————-
Each item in the above structure is a pointer to a function that operates on a superblock object.
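From user space, the statfs() system call is a convenient way to see a superblock method at work. The kernel answers it by calling the ->statfs operation of the superblock for the filesystem that contains the given path. This is a minimal sketch, assuming Linux and glibc's <sys/vfs.h>. fs_block_size() is a hypothetical helper name.

```c
#include <sys/vfs.h>

/* Hypothetical helper: ask the filesystem containing path for its
 * statistics (served by the superblock's ->statfs method) and return
 * the reported block size in bytes, or -1 on error. */
long fs_block_size(const char *path)
{
    struct statfs st;
    if (statfs(path, &st) != 0)
        return -1;
    return (long)st.f_bsize;
}
```

The same struct statfs also reports free and total blocks (f_bfree, f_blocks), which is what tools such as df display.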

The Inode Object and Inode Operations

The inode object represents all the information the kernel needs to manipulate a file or directory; it holds the metadata about files and directories on disk.
It is represented by struct inode, defined in <linux/fs.h>. The important fields are reproduced below (Listing 3).
Listing 3
——————————————–CODE—————————————-
struct inode{
…
unsigned long i_ino; /* inode number */
umode_t i_mode; /* access permissions */
unsigned int i_nlink; /* number of hard links */
uid_t i_uid; /* user id of owner */
loff_t i_size; /* file size in bytes */
struct inode_operations *i_op; /* inode operations table */
struct file_operations *i_fop; /* default file operations */
struct super_block *i_sb; /* associated superblock */
union{ void *generic_ip;} u; /* fs-specific info */
…
struct list_head i_devices; /* list of block Devices */
struct pipe_inode_info *i_pipe; /* pipe information */
struct block_device *i_bdev; /* block device driver */
struct cdev *i_cdev; /* character device driver */

};
——————————————–CODE—————————————-
An inode represents information about each file on a filesystem, and inode objects are constructed in memory as files are accessed. Note also the last few fields in the listing above. Recall that devices and pipes are also represented as files, so they must have corresponding inodes; the following fields serve that purpose (Table 1):

Table 1

Inode entry                        Special file
struct pipe_inode_info *i_pipe;    Pipe
struct block_device *i_bdev;       Block device driver
struct cdev *i_cdev;               Character device driver
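Every file, including a device or a pipe, has an inode. User space can therefore read the file type stored in the inode's mode bits (i_mode) through lstat(), which reports it as st_mode. The sketch below assumes a POSIX system. The enum and function names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Illustrative classification of the file types an inode can describe. */
enum file_kind {
    KIND_ERROR = -1,
    KIND_REGULAR,
    KIND_DIRECTORY,
    KIND_CHAR_DEVICE,   /* e.g. /dev/null; cf. i_cdev in Table 1 */
    KIND_BLOCK_DEVICE,  /* e.g. a disk partition; cf. i_bdev */
    KIND_FIFO,          /* named pipe; cf. i_pipe */
    KIND_SYMLINK,
    KIND_SOCKET,
    KIND_OTHER
};

/* Decode the file-type bits of a mode value (the inode's i_mode,
 * as reported to user space in st_mode). */
enum file_kind kind_of_mode(mode_t m)
{
    if (S_ISREG(m))  return KIND_REGULAR;
    if (S_ISDIR(m))  return KIND_DIRECTORY;
    if (S_ISCHR(m))  return KIND_CHAR_DEVICE;
    if (S_ISBLK(m))  return KIND_BLOCK_DEVICE;
    if (S_ISFIFO(m)) return KIND_FIFO;
    if (S_ISLNK(m))  return KIND_SYMLINK;
    if (S_ISSOCK(m)) return KIND_SOCKET;
    return KIND_OTHER;
}

/* lstat() rather than stat(), so a symbolic link is reported as itself
 * instead of as the file it points to. */
enum file_kind kind_of_path(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return KIND_ERROR;
    return kind_of_mode(st.st_mode);
}
```

On a typical Linux system, kind_of_path("/dev/null") yields KIND_CHAR_DEVICE and kind_of_path("/") yields KIND_DIRECTORY.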

The inode operations are defined in struct inode_operations in <linux/fs.h>. The common fields are reproduced below (Listing 4).
Listing 4
——————————————–CODE—————————————-
struct inode_operations{
…
int (*create) (struct inode *dir, struct dentry *dentry, int mode); /* create a new inode; called for the creat() and open() system calls */
struct dentry *(*lookup) (struct inode *dir, struct dentry *dentry); /* search a directory for an inode */
int (*link) (struct dentry *old_dentry, struct inode *dir, struct dentry *dentry); /* called by the link() system call */
int (*unlink) (struct inode *dir, struct dentry *dentry); /* called by the unlink() system call */
int (*mkdir) (struct inode *, struct dentry *, int); /* called by the mkdir() system call */
int (*rmdir) (struct inode *,struct dentry *); /* called by the rmdir( ) system call */
int (*mknod) (struct inode *,struct dentry *,int,dev_t); /* called by the mknod( ) system call */
…
};
——————————————–CODE—————————————-
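Each of these methods is reached from an ordinary system call. The hypothetical hard_link_demo() below illustrates this. It creates a scratch directory, assumed to live under /tmp, then creates a file and gives it a second name with link(). Finally it removes everything again. Both names refer to one inode, so after link() the inode's link count (i_nlink, reported as st_nlink) should be 2. Which filesystem's inode_operations actually run depends on what is mounted at /tmp.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo of system calls the VFS routes to inode operations:
 *   mkdir()        -> ->mkdir   (mkdtemp() creates the directory with mkdir())
 *   open(O_CREAT)  -> ->create
 *   link()         -> ->link
 *   unlink()       -> ->unlink
 *   rmdir()        -> ->rmdir
 * Returns the hard-link count seen after link(), or -1 on error. */
int hard_link_demo(void)
{
    char dir[] = "/tmp/vfsdemoXXXXXX";   /* assumed scratch location */
    char a[64], b[64];
    struct stat st;
    int nlink = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);
        if (link(a, b) == 0 && stat(a, &st) == 0)
            nlink = (int)st.st_nlink;    /* two names, one inode */
    }

    /* Clean up; failures here are ignored in this sketch. */
    unlink(b);
    unlink(a);
    rmdir(dir);
    return nlink;
}
```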

The Dentry Object and Dentry Operations

A dentry object represents a specific component of a path and usually corresponds to some inode. Dentries are created on the fly from the string representation of a path name. They do not correspond to any on-disk data structure, and they are created and used in directory-specific operations such as path name lookup. The important fields of struct dentry are reproduced in Listing 5.
Listing 5
——————————————–CODE—————————————-
struct dentry{
…
struct inode *d_inode; /* associated inode */
struct list_head d_subdirs; /* subdirectories */
struct dentry_operations *d_op; /* dentry operations table */
struct super_block *d_sb; /*superblock of file */
void *d_fsdata; /* fs-specific data */
struct dentry *d_parent; /* dentry of parent dir */
struct qstr d_name; /* dentry name */
…
};

——————————————–CODE—————————————-
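When the kernel resolves a path such as /usr/lib/libc.so, it walks the path one component at a time. Below the root dentry, it finds or creates a dentry for each of usr, lib and libc.so. The splitting itself is simple string work, and the sketch below shows only that step. It uses a hypothetical split_path() helper and a MAX_NAME limit assumed to be 255, the same value as Linux's NAME_MAX. It is not the kernel's own path-walk code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 255  /* assumed component length limit */

/* Hypothetical helper: split path into its components, the units that
 * dentries represent. Repeated '/' separators are skipped. At most max
 * names are stored in out; over-long names are truncated to MAX_NAME.
 * Returns the total number of components found. */
size_t split_path(const char *path, char out[][MAX_NAME + 1], size_t max)
{
    size_t count = 0;
    const char *p = path;

    while (*p != '\0') {
        while (*p == '/')                 /* skip separators */
            p++;
        if (*p == '\0')
            break;

        const char *start = p;            /* start of one component */
        while (*p != '\0' && *p != '/')
            p++;

        size_t len = (size_t)(p - start);
        if (len > MAX_NAME)
            len = MAX_NAME;
        if (count < max) {
            memcpy(out[count], start, len);
            out[count][len] = '\0';
        }
        count++;
    }
    return count;
}
```

For "/usr//lib/libc.so" the function returns 3, with names "usr", "lib" and "libc.so". Each name would be looked up in its parent's directory, the parent being what d_parent records.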
The dentry operations are declared in struct dentry_operations (Listing 6). Both struct dentry and struct dentry_operations are defined in <linux/dcache.h>.

Listing 6
——————————————–CODE—————————————-
struct dentry_operations{
…
int (*d_revalidate) (struct dentry *, int);
int (*d_hash) (struct dentry *, struct qstr *);
int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
…
};

——————————————–CODE—————————————-
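A filesystem overrides these methods when the default byte-for-byte name comparison is wrong for it. A case-insensitive filesystem such as vfat, for example, must treat README.TXT and readme.txt as the same name. The sketch below is a user-space model with hypothetical names ci_hash() and ci_compare(), restricted to ASCII case folding. It shows the key constraint: names that compare equal must also hash equally, or a lookup in the dentry cache would miss. It follows the d_compare convention of returning 0 when the names match.

```c
#include <ctype.h>
#include <stddef.h>

/* d_hash-like (hypothetical): hash the name after folding ASCII case,
 * so names differing only in case land in the same hash bucket. */
unsigned long ci_hash(const char *name, size_t len)
{
    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31 + (unsigned long)tolower((unsigned char)name[i]);
    return h;
}

/* d_compare-like (hypothetical): return 0 when the names match
 * ignoring ASCII case, nonzero otherwise. */
int ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 1;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 1;
    return 0;
}
```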

The File object and File operations

Finally, we reach the highest level of abstraction: the file. Every file opened by a process has a corresponding file object. It is defined as struct file in <linux/fs.h>. The important fields are reproduced here (Listing 7).
Listing 7
——————————————–CODE—————————————-
struct file{
…
struct dentry *f_dentry; /* associated dentry object */
struct file_operations *f_op; /* file operations table */
unsigned int f_flags; /* flags specified on open */
loff_t f_pos; /* file offset(file pointer)*/
void *private_data;
…
};
——————————————–CODE—————————————-
When a file is opened with the open( ) system call, a new file object is created. It is destroyed when the last reference to it is released, normally by close( ). The operations on an open file are specified in struct file_operations in <linux/fs.h> (Listing 8).
Listing 8
——————————————–CODE—————————————-
struct file_operations{
…
loff_t (*llseek) (struct file *, loff_t, int); /* llseek() system call */
ssize_t (*read) (struct file *, char *, size_t, loff_t *); /* read() system call */
ssize_t (*write) (struct file *, const char *, size_t, loff_t *); /* write() system call */
int (*open) (struct inode *, struct file *); /* open() system call */
int (*readdir) (struct file *, void *dirent, filldir_t); /* readdir() system call */
…
};
——————————————–CODE—————————————-
As you can see, each of the most common file-related system calls has a corresponding method here.
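The file object is distinct from the underlying file, and the difference can be observed from user space. Each open() creates a new struct file with its own offset (f_pos). dup(), by contrast, returns a second descriptor that refers to the same struct file. The hypothetical offset_demo() below assumes a file that begins with the bytes "AB". The two independent opens both read 'A', while the duplicated descriptor continues from the shared offset and reads 'B'.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: fd1 and fd2 come from separate open() calls, so they
 * have separate file objects and offsets; fd3 = dup(fd1) shares fd1's
 * file object. Reads one byte through each descriptor in that order into
 * out[0], out[1], out[2]. Returns 0 on success, -1 on error. */
int offset_demo(const char *path, char out[3])
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);          /* second, independent file object */
    int fd3 = fd1 >= 0 ? dup(fd1) : -1;      /* same file object as fd1 */

    int ok = fd1 >= 0 && fd2 >= 0 && fd3 >= 0
          && read(fd1, &out[0], 1) == 1      /* fd1's offset moves 0 -> 1 */
          && read(fd2, &out[1], 1) == 1      /* own offset: reads byte 0 */
          && read(fd3, &out[2], 1) == 1;     /* shared offset: reads byte 1 */

    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);                /* last reference: object freed */
    return ok ? 0 : -1;
}
```

This also explains why the shared file object is not destroyed at the first close(): another descriptor may still refer to it.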
After this short tour of the VFS, let us now look at how a filesystem is mounted.

Mounting a filesystem

Note: Portions of this section are adapted from the document at http://www.tldp.org/LDP/tlk/fs/filesystem.html, as its licence permits.
When the user attempts to mount a filesystem, the Linux kernel must first validate the arguments passed in the system call. Although mount does some basic checking, it does not know which filesystems this kernel has been built to support, or whether the proposed mount point actually exists. Consider the following mount command:
$ mount -t vfat /dev/hda5 /mnt/winc
Once this command is given, the first task of the VFS is to find the type of filesystem to be mounted (vfat in this example). The VFS does this by searching the list of supported filesystems, examining each file_system_type data structure (see below) in turn. If a match is found, the VFS knows that this filesystem type is supported by the kernel, and it has the address of the filesystem-specific routine for reading this filesystem's superblock. If no matching filesystem name is found, a request is passed to the kernel daemon to load the appropriate module (this is possible only if the kernel supports module loading).
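On Linux, the list of filesystem types registered by the running kernel can be read from /proc/filesystems. This makes it easy to check from user space whether a command such as mount -t vfat would find its type. Each line there is either "nodev", a tab and a name, or a tab and a name. This is a minimal sketch, assuming procfs is mounted at /proc. fs_type_registered() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if a filesystem type called name is listed
 * in /proc/filesystems, 0 if it is not, -1 if the file cannot be read. */
int fs_type_registered(const char *name)
{
    FILE *fp = fopen("/proc/filesystems", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int found = 0;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';        /* drop the newline */
        char *tab = strrchr(line, '\t');         /* name follows the last tab */
        const char *type = tab ? tab + 1 : line;
        if (strcmp(type, name) == 0)
            found = 1;
    }
    fclose(fp);
    return found;
}
```

A result of 0 does not rule out a successful mount: as described above, the kernel may still be able to load a module for that type on demand.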
The next task is to find the VFS inode of the directory (/mnt/winc) which is to be the new file system’s mount point. Once the inode has been found (either in the inode cache or a block device) it is checked to see that it is a directory and that there is not already some other file system mounted there.
At this point a VFS superblock must be allocated, and the mount information passed to the superblock read routine for this filesystem. All of the system's VFS superblocks are kept in the super_blocks vector of super_block data structures, and one must be allocated for this mount. The superblock read routine must fill in the VFS superblock fields based on information that it reads from the physical device. Note that not all superblock fields are used by every filesystem; some filesystems simply leave unused fields as NULL.
Each mounted filesystem is described by a vfsmount data structure (defined as struct vfsmount in <linux/mount.h>, Listing 9). These are queued on a list pointed at by vfsmntlist (defined in fs/super.c).

Listing 9
——————————————–CODE—————————————-
struct vfsmount {
struct list_head mnt_hash;
struct vfsmount *mnt_parent; /* fs we are mounted on */
struct dentry *mnt_mountpoint; /* dentry of mountpoint */
struct dentry *mnt_root; /* root of the mounted tree */
struct super_block *mnt_sb; /* pointer to superblock */
struct list_head mnt_mounts; /* list of children, anchored here */
struct list_head mnt_child; /* and going through their mnt_child */
atomic_t mnt_count;
int mnt_flags;
int mnt_expiry_mark; /* true if marked for expiry */
char *mnt_devname; /* Name of device e.g. /dev/dsk/hda1 */
struct list_head mnt_list;
struct list_head mnt_expire; /* link in fs-specific expiry list */
struct list_head mnt_share; /* circular list of shared mounts */
struct list_head mnt_slave_list; /* list of slave mounts */
struct list_head mnt_slave; /* slave list entry */
struct vfsmount *mnt_master; /* slave is on master->mnt_slave_list */
struct namespace *mnt_namespace; /* containing namespace */
int mnt_pinned;
};

——————————————–CODE—————————————-
Another pointer, vfsmnttail (fs/super.c), points at the last entry in the list, and the mru_vfsmnt pointer (fs/super.c) points at the most recently used filesystem. Each vfsmount structure contains the device number of the block device holding the filesystem, the directory where this filesystem is mounted, and a pointer to the VFS superblock allocated when this filesystem was mounted. In turn, the VFS superblock points at the file_system_type data structure (see below) for this sort of filesystem and at the root inode for this filesystem. This inode is kept resident in the VFS inode cache for as long as this filesystem is mounted.
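The kernel exports the list of mounts visible to the current process as /proc/self/mounts. It has roughly one line per mounted filesystem, giving the device name, mount point and filesystem type. The hypothetical mount_info() below assumes glibc's <mntent.h> interface (setmntent(), getmntent(), endmntent()). It reports whether a directory is a mount point and which filesystem type is mounted there. It keeps the last match, because a later mount on the same directory hides earlier ones.

```c
#define _DEFAULT_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if dir appears as a mount point in
 * /proc/self/mounts, 0 if not, -1 on error. On a match, copies the
 * filesystem type of the last matching entry into type (typelen bytes). */
int mount_info(const char *dir, char *type, size_t typelen)
{
    FILE *fp = setmntent("/proc/self/mounts", "r");
    if (fp == NULL)
        return -1;

    int found = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            found = 1;
            snprintf(type, typelen, "%s", m->mnt_type);
        }
    }
    endmntent(fp);
    return found;
}
```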

The file_system_type structure

Each filesystem type is represented by a struct file_system_type, defined in <linux/fs.h> (Listing 10).
Listing 10
——————————————–CODE—————————————-
struct file_system_type {
const char *name; /* filesystem's name */
int fs_flags; /* filesystem type flags */
struct super_block *(*get_sb) (struct file_system_type *, int, const char *, void *);
/* read the superblock off the disk */
void (*kill_sb) (struct super_block *); /* terminate access to the superblock */
struct module *owner; /* module owning the filesystem */
struct file_system_type * next; /* next filesystem in list*/
struct list_head fs_supers; /* list of superblock objects */
};

——————————————–CODE—————————————-
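The lookup described in the mounting section can be modelled in a few lines: the VFS walks the registered file_system_type structures through their next pointers until one has a matching name. The sketch below is a hypothetical user-space model in the spirit of the kernel's register_filesystem(). It uses a cut-down demo_fs_type with only a name and the next pointer. demo_register() and demo_find() are illustrative names, not kernel APIs. A duplicate name is simply rejected with -1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct file_system_type. */
struct demo_fs_type {
    const char *name;
    struct demo_fs_type *next;   /* next filesystem type in the list */
};

static struct demo_fs_type *demo_fs_list = NULL;  /* head of the list */

/* Append fs to the list. Returns 0, or -1 if the name is already taken. */
int demo_register(struct demo_fs_type *fs)
{
    struct demo_fs_type **p = &demo_fs_list;
    while (*p != NULL) {
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;           /* duplicate name */
        p = &(*p)->next;
    }
    fs->next = NULL;
    *p = fs;                     /* link in at the tail */
    return 0;
}

/* Walk the list comparing names, as the mount path does for "-t vfat".
 * Returns the matching entry, or NULL if the type is not registered. */
struct demo_fs_type *demo_find(const char *name)
{
    for (struct demo_fs_type *p = demo_fs_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

A NULL result from demo_find() corresponds to the point where the kernel would try to load a module for the requested type.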

Footnotes

So that was a whirlwind tour of the Linux VFS, winding up with the internals of mounting a filesystem. How a filesystem is implemented and how data actually lands on the disk are unexplored territories that we will explore later. Till then, happy hacking!
All code samples in this article are from Linux kernel 2.6.15.4.
References
• Linux Kernel Development, Robert Love
• Linux Kernel Source Code

Copyright @ Amit Kumar Saha

The author is a third-year computer engineering undergraduate student at Haldia Institute of Technology, Haldia, India. His interests include operating systems, network protocols, network security and mobile computing. He is passionate about programming, and Python is the most recent addition to his kitty of languages.
Homepage: http://amitsaha.in.googlepages.com
