Solaris -- ZFS


ZFS is a transactional file system developed by Sun Microsystems that includes numerous extensions for use in server and data center environments. Among these are the enormous maximum file system size, the simple administration of even complex configurations, the integrated RAID functionality, the volume management, and the checksum-based protection against data transfer errors. The name ZFS originally stood for Zettabyte File System, but it has since become a pseudo-acronym, and the long form is no longer in common use (cf. "You say zeta, I say zetta").



Creating and manipulating zpools (zfs)

For pooling devices, a zpool can be (example creation commands for each layout follow the list):

  • a mirror
  • a RAIDz with single or double parity
  • a concatenated/striped storage
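
The tutorial below uses a RAIDz pool, but as a rough sketch, assuming a hypothetical pool name "mypool" and the device names reported by "format" on your system, the layouts would be created like this:

 vidar/# zpool create mypool mirror c2t0d0 c2t1d0                # two-way mirror
 vidar/# zpool create mypool raidz c2t0d0 c2t1d0 c2t2d0          # RAIDz, single parity
 vidar/# zpool create mypool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0  # RAIDz, double parity
 vidar/# zpool create mypool c2t0d0 c2t1d0                       # plain stripe/concatenation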

First, let's look up the disks accessible to our system:

 vidar/# format
 Searching for disks...done
 
 AVAILABLE DISK SELECTIONS:
        0. c1t0d0 <DEFAULT cyl 10440 alt 2 hd 255 sec 63>
           /pci@0,0/pci15ad,1976@10/sd@0,0
        1. c1t1d0 <DEFAULT cyl 10440 alt 2 hd 255 sec 63>
           /pci@0,0/pci15ad,1976@10/sd@1,0
        2. c2t0d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@0,0
        3. c2t1d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@1,0
        4. c2t2d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@2,0
        5. c2t3d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@3,0
        6. c2t4d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@4,0
        7. c2t5d0 <VMware,-VMware Virtual S-1.0-1.00GB>
           /pci@0,0/pci15ad,790@11/pci15ad,1976@2/sd@5,0
 Specify disk (enter its number): ^C

Type CTRL-C to quit "format".


If your disks do not show up, use 'devfsadm'.
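
A minimal sketch of that recovery step (the "-c disk" option restricts devfsadm to the disk device class; adjust as needed):

 vidar/# devfsadm -c disk      # (re)build /dev and /devices entries for disk devices
 vidar/# format                # the new disks should now show up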

Let's create our first pool by putting the six 1 GB disks together as a single-parity RAIDz (c1t0d0 holds our root partition and c1t1d0 the '/var' filesystem, so those two are not usable for our example):

 vidar/# zpool create iscsi1 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0


That's it. You have just created a zpool named "iscsi1" containing all six disks in one RAIDz vdev. "zpool list" reports the raw size of all six disks; since one disk's worth of space is used for parity, the usable space (see "zfs list" below) is roughly the sum of five disks.

 vidar/# zpool list
 NAME     SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
 iscsi1  5.91G   167K  5.91G     0%  ONLINE  -


Use "zpool status" to get detailed status information of the components of your zpool.

 vidar/# zpool status
   pool: iscsi1
  state: ONLINE
  scrub: none requested
 config:
 
         NAME        STATE     READ WRITE CKSUM
         iscsi1      ONLINE       0     0     0
           raidz1-0  ONLINE       0     0     0
             c2t0d0  ONLINE       0     0     0
             c2t1d0  ONLINE       0     0     0
             c2t2d0  ONLINE       0     0     0
             c2t3d0  ONLINE       0     0     0
             c2t4d0  ONLINE       0     0     0
             c2t5d0  ONLINE       0     0     0
 
 errors: No known data errors


To destroy a pool, use "zpool destroy" (do not run this here, as we keep using "iscsi1" in the examples below):

 vidar/# zpool destroy iscsi1

Using zfs (basics)

The zpool "iscsi1" already contains one ZFS filesystem. To manipulate filesystems there is the "zfs" command. So keep in mind: "zpool" manages the pool storage, "zfs" manages filesystem creation and options.

 vidar/# zfs list
 NAME     USED  AVAIL  REFER  MOUNTPOINT
 iscsi1   107K  4.83G  34.9K  /iscsi1


As you can see, the pool "iscsi1" also has a filesystem on it, mounted automatically at the mountpoint /iscsi1. You may create a new filesystem by using "zfs create".

 vidar/# zfs create iscsi1/affe
 vidar/# zfs list
 NAME          USED  AVAIL  REFER  MOUNTPOINT
 iscsi1        157K  4.83G  34.9K  /iscsi1
 iscsi1/affe  34.9K  4.83G  34.9K  /iscsi1/affe


New filesystems within a pool are always named "poolname/filesystemname". Without any additional options, they are also mounted automatically at "/poolname/filesystemname".
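
As a sketch of such an additional option (the target path /export/affe is just an example and is not applied in the listings below), the mountpoint of a filesystem can be overridden; ZFS remounts it at the new location automatically:

 vidar/# zfs set mountpoint=/export/affe iscsi1/affe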

 vidar/# zfs create iscsi1/elefant
 vidar/# zfs list
 NAME             USED  AVAIL  REFER  MOUNTPOINT
 iscsi1           201K  4.83G  36.5K  /iscsi1
 iscsi1/affe     34.9K  4.83G  34.9K  /iscsi1/affe
 iscsi1/elefant  34.9K  4.83G  34.9K  /iscsi1/elefant


We see some differences between old-fashioned filesystems and ZFS: usable storage is shared among all filesystems in a pool. "iscsi1/affe" has 4.83G available, "iscsi1/elefant" as well, as does the master pool filesystem "iscsi1". So why create filesystems at all? Couldn't we just use subdirectories in our master pool filesystem "iscsi1" (mounted on /iscsi1)? The "trick" about ZFS filesystems is the possibility to assign options to them, so they can be treated differently. We will see that later. First, let's push some throwaway data onto our newly created filesystem.

 vidar/iscsi1/affe# mkfile 1g /iscsi1/affe/randomfile


This command creates a 1 GB file "randomfile" in the directory /iscsi1/affe. That's big enough for our purpose. "zfs list" now reads:

 vidar/# zfs list
 NAME             USED  AVAIL  REFER  MOUNTPOINT
 iscsi1          1023M  3.83G  38.2K  /iscsi1
 iscsi1/affe     1023M  3.83G  1023M  /iscsi1/affe
 iscsi1/elefant  34.9K  3.83G  34.9K  /iscsi1/elefant

1023 megabytes are used by the filesystem iscsi1/affe, as expected. Notice also that every other filesystem in that pool can now only allocate 3.83G, as 1023M are taken (compare with the 4.83G above, before creating that big file).

You CAN also look up free space in your ZFS filesystems with "df" (e.g. "df -h"), but I wouldn't recommend it: you won't see snapshots and the numbers can be very big.

 vidar/# df -h
 Filesystem             size   used  avail capacity  Mounted on
 /dev/dsk/c1t0d0s0       77G   6.6G    69G     9%    /
 /devices                 0K     0K     0K     0%    /devices
 ctfs                     0K     0K     0K     0%    /system/contract
 proc                     0K     0K     0K     0%    /proc
 mnttab                   0K     0K     0K     0%    /etc/mnttab
 swap                   2.0G   980K   2.0G     1%    /etc/svc/volatile
 objfs                    0K     0K     0K     0%    /system/object
 sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
 fd                       0K     0K     0K     0%    /dev/fd
 /dev/dsk/c1t1d0s7       79G   3.7G    74G     5%    /var
 swap                   2.0G     8K   2.0G     1%    /tmp
 swap                   2.0G    32K   2.0G     1%    /var/run
 /vol/dev/dsk/c0t0d0/sol_10_910_sparc
                        2.1G   2.1G     0K   100%    /cdrom/sol_10_910_sparc
 /hgfs                   16G   4.0M    16G     1%    /hgfs
 iscsi1                 4.8G    38K   3.8G     1%    /iscsi1
 iscsi1/affe            4.8G  1023M   3.8G    21%    /iscsi1/affe
 iscsi1/elefant         4.8G    35K   3.8G     1%    /iscsi1/elefant


So let's try our first option: "quota". As you can imagine, "quota" limits storage. You know this because nearly every mailbox provider imposes a quota on your storage, as do file space providers. To set and get options, you use "zfs set" and "zfs get", respectively. So here we define a quota on 'iscsi1/elefant':

 vidar/# zfs set quota=1G iscsi1/elefant
 vidar/# zfs list
 NAME             USED  AVAIL  REFER  MOUNTPOINT
 iscsi1          1023M  3.83G  38.2K  /iscsi1
 iscsi1/affe     1023M  3.83G  1023M  /iscsi1/affe
 iscsi1/elefant  34.9K  1024M  34.9K  /iscsi1/elefant

Only 1G is left to use at mountpoint /iscsi1/elefant. Note that you may still gobble up 3.83G in /iscsi1/affe, making it impossible to put 1G in /iscsi1/elefant afterwards. So a quota does not guarantee any storage, it only limits it. To guarantee a certain amount of storage, use the option "reservation":

 vidar/# zfs set reservation=1G iscsi1/elefant
 vidar/# zfs list
 NAME             USED  AVAIL  REFER  MOUNTPOINT
 iscsi1          2.00G  2.83G  38.2K  /iscsi1
 iscsi1/affe     1023M  2.83G  1023M  /iscsi1/affe
 iscsi1/elefant  34.9K  1024M  34.9K  /iscsi1/elefant

Now we have simulated a classical "partition": we reserved the same amount of storage as the quota implies, 1G. The other filesystems only have 2.83G left, as 1G is really reserved for iscsi1/elefant.
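
To inspect such properties afterwards, use "zfs get"; the output should look roughly like this:

 vidar/# zfs get quota,reservation iscsi1/elefant
 NAME            PROPERTY     VALUE  SOURCE
 iscsi1/elefant  quota        1G     local
 iscsi1/elefant  reservation  1G     local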

Now, let's try another nice option: compression. Perhaps you are thinking about compression nightmares on Windows systems, like DoubleSpace, Stacker and all those other parasitic programs which killed performance rather than saving storage. Forget them! ZFS compression IS reliable and fast. With today's CPU power, compressing and decompressing data does not significantly harm your overall performance; it can even boost performance, since compression reduces the amount of I/O. As with many other ZFS options, changing the compression setting only affects newly written blocks; uncompressed blocks can still be read. It is transparent to the application: fseek() et al. do not even notice that files are compressed.

 vidar/# zfs set compression=on iscsi1/elefant
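
To check that compression is active and how well it compresses, you can query the "compression" and the read-only "compressratio" properties (a sketch):

 vidar/# zfs get compression,compressratio iscsi1/elefant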

Logical Volumes

A logical volume is exported as a raw or block device. This type of dataset should only be used under special circumstances; file systems are typically used in most environments. The volume is exported as a block device in /dev/zvol/{dsk,rdsk}/path, where path is the name of the volume in the ZFS namespace. The size represents the logical size as exported by the device. By default, a reservation of equal size is created. The size is automatically rounded up to the nearest 128 KB to ensure that the volume has an integral number of blocks regardless of blocksize.

 vidar/dev# zfs create -V 4G iscsi1/volume1
 vidar/dev# zfs list
 NAME             USED  AVAIL  REFER  MOUNTPOINT
 iscsi1          4.13G   716M  34.9K  /iscsi1
 iscsi1/volume1  4.13G  4.83G  26.6K  -
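
A sketch of how such a volume can be used as an ordinary block device, e.g. by putting a UFS filesystem on it (the mount directory /mnt/vol1 is just an example):

 vidar/# newfs /dev/zvol/rdsk/iscsi1/volume1
 vidar/# mkdir -p /mnt/vol1
 vidar/# mount /dev/zvol/dsk/iscsi1/volume1 /mnt/vol1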

Enabling iSCSI

Enabling iSCSI on a zfs volume is pretty easy.

 vidar/# zfs set shareiscsi=on iscsi1/volume1

If you set 'shareiscsi=on' on the pool 'iscsi1', all volumes below it inherit the property and will be made available as iSCSI targets.
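
Assuming the Solaris iSCSI target packages are installed, a sketch of enabling the target service and listing the exported targets would be:

 vidar/# svcadm enable iscsitgt
 vidar/# iscsitadm list target -v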