summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)Author
2014-04-07Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - the rest of MM - zram updates - zswap updates - exit - procfs - exec - wait - crash dump - lib/idr - rapidio - adfs, affs, bfs, ufs - cris - Kconfig things - initramfs - small amount of IPC material - percpu enhancements - early ioremap support - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits) MAINTAINERS: update Intel C600 SAS driver maintainers fs/ufs: remove unused ufs_super_block_third pointer fs/ufs: remove unused ufs_super_block_second pointer fs/ufs: remove unused ufs_super_block_first pointer fs/ufs/super.c: add __init to init_inodecache() doc/kernel-parameters.txt: add early_ioremap_debug arm64: add early_ioremap support arm64: initialize pgprot info earlier in boot x86: use generic early_ioremap mm: create generic early_ioremap() support x86/mm: sparse warning fix for early_memremap lglock: map to spinlock when !CONFIG_SMP percpu: add preemption checks to __this_cpu ops vmstat: use raw_cpu_ops to avoid false positives on preemption checks slub: use raw_cpu_inc for incrementing statistics net: replace __this_cpu_inc in route.c with raw_cpu_inc modules: use raw_cpu_write for initialization of per cpu refcount. mm: use raw_cpu ops for determining current NUMA node percpu: add raw_cpu_ops slub: fix leak of 'name' in sysfs_slab_add ...
2014-04-07mm: implement ->map_pages for page cacheKirill A. Shutemov
filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07Merge tag 'for-f2fs-3.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce large directory support - introduce f2fs_issue_flush to merge redundant flush commands - merge write IOs as much as possible aligned to the segment - add sysfs entries to tune the f2fs configuration - use radix_tree for the free_nid_list to reduce in-memory operations - remove costly bit operations in f2fs_find_entry - enhance the readahead flow for CP/NAT/SIT/SSA blocks The other bug fixes are as follows: - recover xattr node blocks correctly after sudden-power-cut - fix to calculate the maximum number of node ids - enhance to handle many error cases And, there are a bunch of cleanups" * tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits) f2fs: fix wrong statistics of inline data f2fs: check the acl's validity before setting f2fs: introduce f2fs_issue_flush to avoid redundant flush issue f2fs: fix to cover io->bio with io_rwsem f2fs: fix error path when fail to read inline data f2fs: use list_for_each_entry{_safe} for simplyfying code f2fs: avoid free slab cache under spinlock f2fs: avoid unneeded lookup when xattr name length is too long f2fs: avoid unnecessary bio submit when wait page writeback f2fs: return -EIO when node id is not matched f2fs: avoid RECLAIM_FS-ON-W warning f2fs: skip unnecessary node writes during fsync f2fs: introduce fi->i_sem to protect fi's info f2fs: change reclaim rate in percentage f2fs: add missing documentation for dir_level f2fs: remove unnecessary threshold f2fs: throttle the memory footprint with a sysfs entry f2fs: avoid to drop nat entries due to the negative nr_shrink f2fs: call f2fs_wait_on_page_writeback instead of native function f2fs: introduce nr_pages_to_write for segment alignment ...
2014-04-07f2fs: fix wrong statistics of inline dataChao Yu
If we remove a file that has inline data after mount, our statistics turns to inaccurate. cat /sys/kernel/debug/f2fs/status - Inline_data Inode: 4294967295 Let's add stat_inc_inline_inode() to stat inline info of the file when lookup. Change log from v1: o stat in f2fs_lookup() instead of in do_read_inode() for excluding wrong stat. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07f2fs: check the acl's validity before settingZhangZhen
Before setting the acl, call posix_acl_valid() to check if it is valid or not. Signed-off-by: zhangzhen <zhenzhang.zhang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07f2fs: introduce f2fs_issue_flush to avoid redundant flush issueJaegeuk Kim
Some storage devices show relatively high latencies to complete cache_flush commands, even though their normal IO speed is prettry much high. In such the case, it needs to merge cache_flush commands as much as possible to avoid issuing them redundantly. So, this patch introduces a mount option, "-o flush_merge", to mitigate such the overhead. If this option is enabled by user, F2FS merges the cache_flush commands and then issues just one cache_flush on behalf of them. Once the single command is finished, F2FS sends a completion signal to all the pending threads. Note that, this option can be used under a workload consisting of very intensive concurrent fsync calls, while the storage handles cache_flush commands slowly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-04Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...
2014-04-03mm + fs: store shadow entries in page cacheJohannes Weiner
Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-02f2fs: fix to cover io->bio with io_rwsemJaegeuk Kim
In the f2fs_wait_on_page_writeback, io->bio should be covered by io_rwsem. Otherwise, the bio pointer can become a dangling pointer due to data races. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02f2fs: fix error path when fail to read inline dataChao Yu
We should unlock page in ->readpage() path and also should unlock & release page in error path of ->write_begin() to avoid deadlock or memory leak. So let's add release code to fix the problem when we fail to read inline data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02f2fs: use list_for_each_entry{_safe} for simplyfying codeChao Yu
This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for simplfying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-02f2fs: avoid free slab cache under spinlockChao Yu
Move kmem_cache_free out of spinlock protection region for better performance. Change log from v1: o remove spinlock protection for kmem_cache_free in destroy_node_manager suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01f2fs: avoid unneeded lookup when xattr name length is too longChao Yu
In f2fs_setxattr we have limit this attribute name length, so we should also check it in f2fs_getxattr to avoid useless lookup caused by invalid name length. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01f2fs: avoid unnecessary bio submit when wait page writebackChao Yu
This patch introduce is_merged_page() to check whether current page is merged in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache, resulting in having more chance to merge pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-01f2fs: return -EIO when node id is not matchedJaegeuk Kim
During the cleaing of node segments, F2FS can get errored node blocks due to data race between node page lock and its valid bitmap operations. In that case, it needs to return an error to skip such the obsolete block copy. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: avoid RECLAIM_FS-ON-W warningJaegeuk Kim
This patch should resolve the following possible bug. RECLAIM_FS-ON-W at: mark_held_locks+0xb9/0x140 lockdep_trace_alloc+0x85/0xf0 __kmalloc+0x53/0x1d0 read_all_xattrs+0x3d1/0x3f0 [f2fs] f2fs_getxattr+0x4f/0x100 [f2fs] f2fs_get_acl+0x4c/0x290 [f2fs] get_acl+0x4f/0x80 posix_acl_create+0x72/0x180 f2fs_init_acl+0x29/0xcc [f2fs] __f2fs_add_link+0x259/0x710 [f2fs] f2fs_create+0xad/0x1c0 [f2fs] vfs_create+0xed/0x150 do_last+0xd36/0xed0 path_openat+0xc5/0x680 do_filp_open+0x43/0xa0 do_sys_open+0x13c/0x230 SyS_creat+0x1e/0x20 system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: skip unnecessary node writes during fsyncJaegeuk Kim
If multiple redundant fsync calls are triggered, we don't need to write its node pages with fsync mark continuously. So, this patch adds FI_NEED_FSYNC to track whether the latest node block is written with the fsync mark or not. If the mark was set, a new fsync doesn't need to write a node block. Otherwise, we should do a new node block with the mark for roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: introduce fi->i_sem to protect fi's infoJaegeuk Kim
This patch introduces fi->i_sem to protect fi's info that includes xattr_ver, pino, i_nlink. This enables to remove i_mutex during f2fs_sync_file, resulting in performance improvement when a number of fsync calls are triggered from many concurrent threads. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: change reclaim rate in percentageJaegeuk Kim
It is more reasonable to determine the reclaiming rate of prefree segments according to the volume size, which is set to 5% by default. For example, if the volume is 128GB, the prefree segments are reclaimed when the number reaches to 6.4GB. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: remove unnecessary thresholdJaegeuk Kim
The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis of the memory footprint. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: throttle the memory footprint with a sysfs entryJaegeuk Kim
This patch introduces ram_thresh, a sysfs entry, which controls the memory footprint used by the free nid list and the nat cache. Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat cache was managed by NM_WOUT_THRESHOLD. However, this approach cannot be applied dynamically according to the system. So, this patch adds ram_thresh that users can specify the threshold, which is in order of 1 / 1024. For example, if the total ram size is 4GB and the value is set to 10 by default, f2fs tries to control the number of free nids and nat caches not to consume over 10 * (4GB / 1024) = 10MB. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: avoid to drop nat entries due to the negative nr_shrinkJaegeuk Kim
The try_to_free_nats should not receive the negative nr_shrink. Otherwise, it can drop all the nat entries by the while loop. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20f2fs: call f2fs_wait_on_page_writeback instead of native functionJaegeuk Kim
If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: introduce nr_pages_to_write for segment alignmentJaegeuk Kim
This patch introduces nr_pages_to_write to align page writes to the segment or other operational unit size, which can be tuned according to the system environment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: increase pages_skipped when skipping writepagesJaegeuk Kim
This patch increases pages_skipped when skipping writepages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: avoid small data writes by skipping writepagesJaegeuk Kim
This patch introduces nr_pages_to_skip(sbi, type) to determine writepages can be skipped. The dentry, node, and meta pages can be conrolled by F2FS without breaking the FS consistency. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: introduce get_dirty_dents for readabilityJaegeuk Kim
The get_dirty_dents gives us the number of dirty dentry pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: fix incorrect parsing with option stringChao Yu
Previously 'background_gc={on***,off***}' is being parsed as correct option, with this patch we cloud fix the trivial bug in mount process. Change log from v1: o need to check length of parameter suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: avoid to return incorrect errno of read_normal_summariesChao Yu
We should return error number of read_normal_summaries instead of -EINVAL when read_normal_summaries failed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: introduce f2fs_has_xattr_block for better readabilityChao Yu
This patch introduces a help function f2fs_has_xattr_block for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-18f2fs: print type for each segment in segment_info's showChao Yu
The original segment_info's show looks out-of-format: cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 512 512 512 512 512 512 512 512 0 0 512 348 0 263 0 0 512 0 0 512 512 512 512 0 512 512 512 512 512 512 512 512 512 511 328 512 512 512 512 512 512 512 512 512 512 512 512 512 0 0 175 Let's fix this and show type for each segment. cat /proc/fs/f2fs/loop0/segment_info format: segment_type|valid_blocks segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN) 0 2|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 10 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 20 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 30 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 40 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 50 3|0 3|0 3|0 3|0 3|0 3|0 3|0 0|0 3|0 3|0 60 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|512 70 3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0 3|0 3|512 80 3|0 3|0 3|0 3|0 3|0 3|512 3|0 3|0 3|512 3|512 90 3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512 100 3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512 110 0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0 4|0 3|512 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-13fs: push sync_filesystem() down to the file system's remount_fs()Theodore Ts'o
Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
2014-03-12f2fs: check upper bound of ino value in f2fs_nfs_get_inodeChao Yu
Upper bound checking of ino should be added to f2fs_nfs_get_inode, so unneeded process before do_read_inode in f2fs_iget could be avoided when ino is invalid. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-12f2fs: introduce f2fs_has_inline_xattr for better readabilityChao Yu
This patch introduces a help function f2fs_has_inline_xattr for better readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-11f2fs: recover inline xattr data in roll-forward processChao Yu
Previously we do not recover inline xattr data of inode after power-cut, so inline xattr data may be lost. We should recover the data during the roll-forward process. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10f2fs: optimize restore_node_summary slightlyGu Zheng
Previously, we ra_sum_pages to pre-read contiguous pages as more as possible, and if we fail to alloc more pages, an ENOMEM error will be reported upstream, even though we have alloced some pages yet. In fact, we can use the available pages to do the job partly, and continue the rest in the following circle. Only reporting ENOMEM upstream if we really can not alloc any available page. And another fix is ignoring dealing with the following pages if an EIO occurs when reading page from page_list. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: modify the flow for better neat code] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10f2fs: format segment_info's show for better legibilityGu Zheng
The original segment_info's show is a bit out-of-format: [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 [root@guz Demoes]# so we fix it here for better legibility. [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 [root@guz Demoes]# Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()Gu Zheng
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-10f2fs: update start nid only once each circleGu Zheng
Integrated a couple of minor changes for better readability suggested by Chao Yu. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-05f2fs: fix wrong kernel coding styleJaegeuk Kim
This patch includes a simple fix to adjust coding style. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-03f2fs: fix to write node pages with WRITE_SYNCJaegeuk Kim
This patch fixes performance regression of dbench reported by Alex <hbx7d@yandex.com>. This issue was revealed by Phoronix tests results: http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2 It turns out that we need to assign WRITE_SYNC to the node writes, if fsync is triggered. The performance numbers are like below, which is measured by Alex. 1. 355MB/s ext4 2. 225MB/s f2fs : WRITE for node writes 3. 525MB/s f2fs : WRITE_SYNC for node writes Reported-And-Tested-by: Alex <hbx7d@yandex.com>. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-28f2fs: fix dirty page accounting when redirtyChao Yu
We should de-account dirty counters for page when redirty in ->writepage(). Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75': "writeback: fix dirtied pages accounting on redirty De-account the accumulative dirty counters on page redirty. Page redirties (very common in ext4) will introduce mismatch between counters (a) and (b) a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied b) NR_WRITTEN, BDI_WRITTEN This will introduce systematic errors in balanced_rate and result in dirty page position errors (ie. the dirty pages are no longer balanced around the global/bdi setpoints)." Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27f2fs: use existing macro to clean up some codesChao Yu
This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27f2fs: readahead contiguous SSA blocks for f2fs_gcChao Yu
If there are multi segments in one section, we will read those SSA blocks which have contiguous address one by one in f2fs_gc. It may lost performance, let's read ahead SSA blocks by merge multi read request. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27f2fs: add an sysfs entry to control the directory levelJaegeuk Kim
This patch adds an sysfs entry to control dir_level used by the large directory. The description of this entry is: dir_level This parameter controls the directory level to support large directory. If a directory has a number of files, it can reduce the file lookup latency by increasing this dir_level value. Otherwise, it needs to decrease this value to reduce the space overhead. The default value is 0. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27f2fs: introduce large directory supportJaegeuk Kim
This patch introduces an i_dir_level field to support large directory. Previously, f2fs maintains multi-level hash tables to find a dentry quickly from a bunch of chiild dentries in a directory, and the hash tables consist of the following tree structure as below. In Documentation/filesystems/f2fs.txt, ---------------------- A : bucket B : block N : MAX_DIR_HASH_DEPTH ---------------------- level #0 | A(2B) | level #1 | A(2B) - A(2B) | level #2 | A(2B) - A(2B) - A(2B) - A(2B) . | . . . . level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . | . . . . level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B) But, if we can guess that a directory will handle a number of child files, we don't need to traverse the tree from level #0 to #N all the time. Since the lower level tables contain relatively small number of dentries, the miss ratio of the target dentry is likely to be high. In order to avoid that, we can configure the hash tables sparsely from level #0 like this. level #0 | A(2B) - A(2B) - A(2B) - A(2B) level #1 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . | . . . . level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B) . | . . . . level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B) With this structure, we can skip the ineffective tree searches in lower level hash tables. This patch adds just a facility for this by introducing i_dir_level in f2fs_inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27f2fs: remove costly bit operations for f2fs_find_entryJaegeuk Kim
It turns out that a bit operation like find_next_bit is not always fast enough for f2fs_find_entry. Instead, it is pretty much simple and fast to traverse each dentries. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24f2fs: implement a lock-free stat_showJaegeuk Kim
The stat_show is just to show the current status of f2fs. So, we can remove all the there-in locks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24f2fs: introduce a radix_tree for the free_nid listJaegeuk Kim
This patch introduces a radix tree for the list of free_nids, which enhances the performance on free nid management. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-24f2fs: introduce help macro on_build_free_nids()Gu Zheng
Introduce help macro on_build_free_nids() which just uses build_lock to judge whether the building free nid is going, so that we can remove the on_build_free_nids field from f2fs_sb_info. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: remove an unnecessary white line removal] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>