Bad block HOWTO for smartmontools


Bruce Allen

Douglas Gilbert

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

For an online copy of the license see www.fsf.org/copyleft/fdl.html.

2007-01-23

Revision History
Revision 1.1 2007-01-23 dpg
add sections on ReiserFS and partition table damage
Revision 1.0 2006-11-14 dpg
merge BadBlockHowTo.txt and BadBlockSCSIHowTo.txt

Abstract

This article describes what actions might be taken when smartmontools detects a bad block on a disk. It demonstrates how to identify the file associated with an unreadable disk sector, and how to force that sector to reallocate.


Table of Contents

Introduction
Repairs in a file system
    ext2/ext3 first example
    ext2/ext3 second example
    Unassigned sectors
    ReiserFS example
Repairs at the disk level
    Partition table problems
    LVM repairs
    Bad block reassignment

Introduction

Handling bad blocks is a difficult problem as it often involves decisions about losing information. Modern storage devices tend to handle the simple cases automatically, for example by writing a disk sector that was read with difficulty to another area on the media. Even though such a remapping can be done by a disk drive transparently, there is still a lingering worry about media deterioration and the disk running out of spare sectors to remap.

Can smartmontools help? As the SMART acronym [1] suggests, the smartctl command and the smartd daemon concentrate on monitoring and analysis. So apart from changing some reporting settings, smartmontools will not modify the raw data in a device. Also, smartmontools works only with physical devices; it does not know about partitions and file systems, so other tools are needed. The job of smartmontools is to alert the user that something is wrong and that user intervention may be required.

When a bad block is reported, one approach is to work out the mapping between the logical block address used by a storage device and a file or some other component of a file system using that device. Note that there may be no such mapping, which indicates that the bad block lies at a location not currently used by the file system. A user may want to do this analysis to localize and minimize the number of replacement files that must be retrieved from some backup store. This approach requires knowledge of the file system involved, and this document uses the Linux ext2/ext3 and ReiserFS file systems for examples. The type of content may also come into play. For example, if an area storing video has a corrupted sector, it may be easiest to accept that a frame or two might be corrupted and instruct the disk not to retry, as retrying may have the visual effect of turning a momentary blank into a 1 second pause (while the disk retries the faulty sector, often accompanied by a telltale clicking sound).

Another approach is to ignore the upper level consequences (e.g. corrupting a file or, worse, damaging a file system) and use the facilities offered by a storage device to repair the damage. The SCSI disk command set is used to elaborate on this low level approach.

Repairs in a file system

This section contains examples of what to do at the file system level when smartmontools reports a bad block. These examples assume the Linux operating system and either the ext2/ext3 or ReiserFS file system. The various Linux commands shown have man pages and the reader is encouraged to examine these. Of note is the dd command which is often used in repair work [2] and has a unique command line syntax.

The authors would like to thank Sergey Vlasov, Theodore Ts’o, Michael Bendzick, and others for explaining this approach. The authors would like to add text showing how to do this for other file systems, in particular XFS, and JFS: please email if you can provide this information.

ext2/ext3 first example

In this example, the disk is failing self-tests at Logical Block Address LBA = 0x016561e9 = 23421417. The LBA counts sectors in units of 512 bytes, and starts at zero.
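The hexadecimal-to-decimal conversion quoted above can be checked in the shell. This is only a minimal sketch, assuming a shell whose arithmetic expansion accepts the 0x prefix (bash and most POSIX shells do); the variable names are illustrative and not part of smartmontools:

```shell
# LBA as it appears in the LBA_of_first_error column of the self-test log
lba_hex=0x016561e9

# Arithmetic expansion understands the 0x prefix and yields decimal
lba_dec=$(($lba_hex))

echo "$lba_dec"    # prints 23421417
```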

root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       217         0x016561e9

Note that other signs that there is a bad sector on the disk can be found in the non-zero value of the Current Pending Sector count:

root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1

First Step: We need to locate the partition on which this sector of the disk lives:

root]# fdisk -lu /dev/hda
Disk /dev/hda: 123.5 GB, 123522416640 bytes
255 heads, 63 sectors/track, 15017 cylinders, total 241254720 sectors
Units = sectors of 1 * 512 = 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *        63   4209029   2104483+  83  Linux
/dev/hda2       4209030   5269319    530145   82  Linux swap
/dev/hda3       5269320 238227884 116479282+  83  Linux
/dev/hda4     238227885 241248104   1510110   83  Linux

The partition /dev/hda3 starts at LBA 5269320 and extends past the ‘problem’ LBA. The ‘problem’ LBA is offset 23421417 - 5269320 = 18152097 sectors into the partition /dev/hda3.
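The partition lookup can also be done mechanically. The sketch below is an illustration only: find_partition is a hypothetical helper, and it assumes the start and end sectors have already been copied by hand from the fdisk -lu output into "device start end" lines:

```shell
# Print the partition that contains a given LBA, followed by the
# sector offset of that LBA from the start of the partition.
# Standard input: one "device start end" line per partition.
find_partition() {
    awk -v lba="$1" '$2 <= lba + 0 && lba + 0 <= $3 { print $1, lba - $2 }'
}

# Start/end sectors taken from the fdisk listing above
find_partition 23421417 <<'EOF'
/dev/hda1 63 4209029
/dev/hda2 4209030 5269319
/dev/hda3 5269320 238227884
/dev/hda4 238227885 241248104
EOF
# prints: /dev/hda3 18152097
```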

To verify the type of the file system and the mount point, look in /etc/fstab:

root]# grep hda3 /etc/fstab
/dev/hda3 /data ext2 defaults 1 2

You can see that this is an ext2 file system, mounted at /data.

Second Step: we need to find the block size of the file system (normally 4096 bytes for ext2):

root]# tune2fs -l /dev/hda3 | grep Block
Block count:              29119820
Block size:               4096

In this case the block size is 4096 bytes.

Third Step: we need to determine which File System Block contains this LBA. The formula is:

  b = (int)((L-S)*512/B)

where:

  b = File System block number
  B = File system block size in bytes
  L = LBA of bad sector
  S = Starting sector of partition as shown by fdisk -lu

and (int) denotes the integer part.

In our example, L=23421417, S=5269320, and B=4096. Hence the ‘problem’ LBA is in block number

   b = (int)(18152097*512/4096) = (int)2269012.125

so b=2269012.

Note: the fractional part of 0.125 indicates that this problem LBA is actually the second of the eight sectors that make up this file system block.
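The same arithmetic can be done with shell integer arithmetic, which truncates just as (int) does. This is a minimal sketch; lba_to_fs_block and sector_in_block are hypothetical helper names, and 512-byte sectors are assumed as in the rest of this example:

```shell
# File system block that contains a given LBA.
# Arguments: L (bad LBA), S (partition start sector), B (fs block size in bytes)
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Zero-based index of the bad sector inside that file system block.
sector_in_block() {
    echo $(( ($1 - $2) % ($3 / 512) ))
}

lba_to_fs_block 23421417 5269320 4096    # prints 2269012
sector_in_block 23421417 5269320 4096    # prints 1 (the second of eight sectors)
```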

Fourth Step: we use debugfs to locate the inode that owns this block, and the file that corresponds to that inode:

root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs:  open /dev/hda3
debugfs:  icheck 2269012
Block   Inode number
2269012 41032
debugfs:  ncheck 41032
Inode   Pathname
41032   /S1/R/H/714197568-714203359/H-R-714202192-16.gwf

In this example, you can see that the problematic file (with the mount point included in the path) is: /data/S1/R/H/714197568-714203359/H-R-714202192-16.gwf

To force the disk to reallocate this bad block we’ll write zeros to the bad block, and sync the disk:

root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=2269012
root]# sync

NOTE: This last step has permanently and irretrievably destroyed some of the data that was in this file. Don’t do this unless you don’t need the file or you can replace it with a fresh or correct version.

Now everything is back to normal: the sector has been reallocated. Compare the output just below to similar output near the top of this article:

root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1

Note: for some disks it may be necessary to update the SMART Attribute values by using smartctl -t offline /dev/hda

The disk now passes its self-tests again:

root]# smartctl -t long /dev/hda
[wait until test completes, then]
root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       239         -
# 2  Extended offline    Completed: read failure       90%       217         0x016561e9
# 3  Extended offline    Completed: read failure       90%       212         0x016561e9
# 4  Extended offline    Completed: read failure       90%       181         0x016561e9
# 5  Extended offline    Completed without error       00%        14         -
# 6  Extended offline    Completed without error       00%         4         -

and no longer shows any offline uncorrectable sectors:

root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0

ext2/ext3 second example

On this drive, the first sign of trouble was this email from smartd:

    To: ballen
    Subject: SMART error (selftest) detected on host: medusa-slave166.medusa.phys.uwm.edu

    This email was generated by the smartd daemon running on host:
    medusa-slave166.medusa.phys.uwm.edu in the domain: master001-nis

    The following warning/error was logged by the smartd daemon:
    Device: /dev/hda, Self-Test Log error count increased from 0 to 1

Running smartctl -a /dev/hda confirmed the problem:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%       682         0x021d9f44

Note that the failing LBA reported is 0x021d9f44 (base 16) = 35495748 (base 10)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       3

and one can see above that there are 3 sectors on the list of pending sectors that the disk can’t read but would like to reallocate.

The device also shows errors in the SMART error log:

Error 212 occurred at disk power-on lifetime: 690 hours
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 12 46 9f 1d e2  Error: UNC 18 sectors at LBA = 0x021d9f46 = 35495750

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
  -- -- -- -- -- -- -- --   ---------  --------------------
  25 00 12 46 9f 1d e0 00 2485545.000  READ DMA EXT

Signs of trouble at this LBA may also be found in SYSLOG:

[root]# grep LBA /var/log/messages | awk '{print $12}' | sort | uniq
 LBAsect=35495748
 LBAsect=35495750

So I decide to do a quick check to see how many bad sectors there really are. Using the bash shell I check 70 sectors around the trouble area:

[root]# export i=35495730
[root]# while [ $i -lt 35495800 ]
        > do echo $i
        > dd if=/dev/hda of=/dev/null bs=512 count=1 skip=$i
        > let i+=1
        > done

<SNIP>

35495734
1+0 records in
1+0 records out
35495735
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out

<SNIP>

35495751
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out
35495752
1+0 records in
1+0 records out

<SNIP>

which shows that the seventeen sectors 35495735-35495751 (inclusive) are not readable.

Next, we identify the files at those locations. The partitioning information on this disk is identical to the first example above, and as in that case the problem sectors are on the third partition /dev/hda3. So we have:

     L=35495735 to 35495751
     S=5269320
     B=4096

so that b=3778301 to 3778303 are the three bad blocks in the file system.
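For a run of bad sectors, the first and last file system blocks and the block count can be worked out the same way. The following is only a sketch using the numbers of this example; the variable names are illustrative:

```shell
first_lba=35495735      # first unreadable sector found by the dd loop
last_lba=35495751       # last unreadable sector found by the dd loop
part_start=5269320      # S: start of /dev/hda3 from fdisk -lu
fs_block_size=4096      # B: file system block size from tune2fs

# Integer division truncates, like (int) in the formula
first_block=$(( (first_lba - part_start) * 512 / fs_block_size ))
last_block=$(( (last_lba - part_start) * 512 / fs_block_size ))
block_count=$(( last_block - first_block + 1 ))

echo "$first_block $last_block $block_count"    # prints 3778301 3778303 3
```

The first block and the count are the values one would pass to dd as seek= and count= when bs= is the file system block size.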

[root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs:  open /dev/hda3
debugfs:  icheck 3778301
Block   Inode number
3778301 45192
debugfs:  icheck 3778302
Block   Inode number
3778302 45192
debugfs:  icheck 3778303
Block   Inode number
3778303 45192
debugfs:  ncheck 45192
Inode   Pathname
45192   /S1/R/H/714979488-714985279/H-R-714979984-16.gwf
debugfs:  quit

And finally, just to confirm that this is really the damaged file:

[root]# md5sum /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf
md5sum: /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf: Input/output error

Finally we force the disk to reallocate the three bad blocks:

[root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=3 seek=3778301
[root]# sync

We could also probably use:

[root]# dd if=/dev/zero of=/dev/hda bs=512 count=17 seek=35495735

At this point we now have:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0

which is encouraging, since the pending sectors count is now zero. Note that the drive reallocation count has not yet increased: the drive may now have confidence in these sectors and have decided not to reallocate them.

A device self test:

[root]# smartctl -t long /dev/hda

(then wait about an hour) shows no unreadable sectors or errors:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       692         -
# 2  Extended offline    Completed: read failure       80%       682         0x021d9f44

Unassigned sectors

This section was written by Kay Diederichs. Even though this section assumes Linux and the ext2/ext3 file system, the strategy should be more generally applicable.

I read your badblocks-howto and greatly benefited from it. One thing that’s (maybe) missing is that often the smartctl -t long scan finds a bad sector which is not assigned to any file. In that case it does not help to run debugfs, or rather debugfs reports the fact that no file owns that sector. Furthermore, it is somewhat laborious to come up with the correct numbers for debugfs, and debugfs is slow …

So what I suggest when Current_Pending_Sector/Offline_Uncorrectable errors are present is to create a huge file on that file system.

  dd if=/dev/zero of=/some/mount/point/zerofile bs=4k

creates the file (the name zerofile is arbitrary; the output must be a file below the mount point, not the mount point directory itself). Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not belong to a file. Check the smartctl -a output after that and make sure that the sectors are reallocated, then delete the file to free the space again. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on.

ReiserFS example

This section was written by Joachim Jautz with additions from Manfred Schwarb.

The following problems were reported during a scheduled test:

smartd[575]: Device: /dev/hda, starting scheduled Offline Immediate Test.
[... 1 hour later ...]
smartd[575]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors

[Step 0] The SMART selftest/error log (see smartctl -l selftest) indicated there was a problem with block address (i.e. the 512 byte sector at) 58656333. The partition table (e.g. see sfdisk -luS /dev/hda or fdisk -ul /dev/hda) indicated that this block was in the /dev/hda3 partition which contained a ReiserFS file system. That partition started at block address 54781650.

While doing the initial analysis it may also be useful to take a copy of the disk attributes returned by smartctl -A /dev/hda, specifically the values of the “Reallocated_Sector_Ct” and “Reallocated_Event_Count” attributes for ATA disks, or the grown list (GLIST) length for SCSI disks. If these have increased at the end of the procedure, the disk has re-allocated one or more sectors.

[Step 1] Get the file system’s block size:

# debugreiserfs /dev/hda3 | grep '^Blocksize'
Blocksize: 4096

[Step 2] Calculate the block number:

# echo "(58656333-54781650)*512/4096" | bc -l
484335.37500000000000000000

It is reassuring that the calculated 4 KB damaged block address in /dev/hda3 is less than the “Count of blocks on the device” value reported by debugreiserfs.

[Step 3] Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. If we do not get the `Cannot read the block’ error we should check if our calculation in [Step 2] was correct ;)

# debugreiserfs -1 484335 /dev/hda3
debugreiserfs 3.6.19 (2003 http://www.namesys.com)

484335 is free in ondisk bitmap
The problem has occurred looks like a hardware problem.

If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight, the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to risk your time and data on it. If you don’t want to follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly.

bread: Cannot read the block (484335): (Input/output error).  Aborted

So it looks like we have the right (i.e. faulty) block address.

[Step 4] Try then to find the affected file [3]:

tar -cO /mydir | cat >/dev/null

If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.

[Step 5] Try your luck: bang the affected block with badblocks -n (non-destructive read-write mode, do unmount first), if you are very lucky the failure is transient and you can provoke reallocation [4]:

# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`

[5]

Check success with debugreiserfs -1 484335 /dev/hda3. Otherwise:

[Step 6] Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:

# dd if=/dev/zero of=/dev/hda3 count=1 bs=4096 seek=484335
1+0 records in
1+0 records out
4096 bytes transferred in 0.007770 seconds (527153 bytes/sec)

[Step 7] If you can’t rule out the bad block being in metadata, do a file system check:

reiserfsck --check

This could take a long time, so you had probably better go for lunch …

[Step 8] Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.

Repairs at the disk level

This section first looks at a damaged partition table. Then it ignores the upper level impact of a bad block and just repairs the underlying sector so that defective sector will not cause problems in the future.

Partition table problems

Some software failures can lead to zeroes or random data being written on the first block of a disk. For disks that use a DOS-based partitioning scheme this will overwrite the partition table, which is found at the end of the first block. This is a single point of failure: after the damage, tools like fdisk have no alternate data to use, so they report no partitions or a damaged partition table.

One utility that may help is testdisk which can scan a disk looking for partitions and recreate a partition table if requested. [6]

Programs that create DOS partitions often place the first partition at logical block address 63. In Linux a loop back mount can be attempted at the appropriate offset of a disk with a damaged partition table. This approach may involve placing the disk with the damaged partition table in a working computer or perhaps an external USB enclosure. Assume the disk with the damaged partition table is /dev/hdb. Then the following read-only loop back mount could be tried:

# mount -r /dev/hdb -o loop,offset=32256 /mnt

The offset is in bytes so the number given is (63 * 512). If the file system cannot be identified then a ‘-t <fs_type>’ may be needed (although this is not a good sign). If this mount is successful, a backup procedure is advised.
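If the first partition is suspected to start at some other sector, the byte offset can be recomputed in the shell. A minimal sketch, assuming 512-byte sectors; sector_to_byte_offset is a hypothetical helper name:

```shell
# Convert a starting sector (in 512-byte units) into the byte offset
# expected by the 'offset=' option of a loop back mount.
sector_to_byte_offset() {
    echo $(( $1 * 512 ))
}

sector_to_byte_offset 63    # prints 32256
```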

Only the primary DOS partitions are recorded in the first block of a disk. The extended DOS partition table is placed elsewhere on a disk. Again there is only one copy of it so it represents another single point of failure. All DOS partition information can be read in a form that can be used to recreate the tables with the sfdisk command. Obviously this needs to be done beforehand and the file put on other media. Here is how to fetch the partition table information:

# sfdisk -dx /dev/hda > my_disk_partition_info.txt

Then my_disk_partition_info.txt should be placed on other media. If disaster strikes, then the disk with the damaged partition table(s) can be placed in a working system, let us say the damaged disk is now at /dev/hdc, and the following command restores the partition table(s):

# sfdisk -x -O part_block_prior.img /dev/hdc < my_disk_partition_info.txt

Since the above command is potentially destructive it takes a copy of the block(s) holding the partition table(s) and puts it in part_block_prior.img prior to any changes. Then it changes the partition tables as indicated by my_disk_partition_info.txt. For what it is worth the author did test this on his system! [7]

For creating, destroying, resizing, checking and copying partitions, and the file systems on them, GNU’s parted is worth examining. The Large Disk HOWTO is also a useful resource.

LVM repairs

This section was written by Frederic BOITEUX. It was titled: “HOW TO LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME”.

Smartd reports an error in a short test:

# smartctl -a /dev/hdb
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%        66         37383668

So the disk has a bad block located at LBA 37383668.

In which physical partition is the bad block?

# sfdisk -luS /dev/hdb  # or 'fdisk -ul /dev/hdb'
Disk /dev/hdb: 9729 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/hdb1            63    996029     995967  82  Linux swap / Solaris
/dev/hdb2   *    996030   1188809     192780  83  Linux
/dev/hdb3       1188810 156296384  155107575  8e  Linux LVM
/dev/hdb4             0         -          0   0  Empty

It’s in the /dev/hdb3 partition, an LVM2 partition. From the beginning of the LVM2 partition, the bad block has an offset of

(37383668 - 1188810) = 36194858

We have to find which LVM2 logical partition the block belongs to.

In which logical partition is the bad block?

IMPORTANT: LVM2 can use different schemes for dividing its physical partitions into logical ones: linear, striped, contiguous or not… The following example assumes that allocation is linear!

The physical partition used by LVM2 is divided into PE (Physical Extent) units of the same size, starting at ‘pe_start’ 512-byte blocks from the beginning of the physical partition.

The ‘pvdisplay’ command gives the size of the PE (in KB) of the LVM partition:

#  part=/dev/hdb3 ; pvdisplay -c $part | awk -F: '{print $8}'
4096

To get its size in LBA blocks (512 bytes or 0.5 KB), we multiply this number by 2: 4096 * 2 = 8192 blocks for each PE.

Finding the offset from the beginning of the physical partition is a bit more difficult. If you have a recent LVM2 version, try:

# pvs -o+pe_start $part

Alternatively, you can look in /etc/lvm/backup:

# grep pe_start $(grep -l $part /etc/lvm/backup/*)
                        pe_start = 384

Then we work out which PE contains the bad block by dividing the bad block’s offset in the physical partition by the PE size: physical partition’s bad block number / sizeof(PE) =

36194858 / 8192 = 4418.3176

Strictly, pe_start should be subtracted from the offset first; here (36194858 - 384) / 8192 = 4418.2708, which gives the same PE.

So we have to find which LVM2 logical partition uses PE number 4418 (count starts from 0):

# lvdisplay --maps |egrep 'Physical|LV Name|Type'
  LV Name                /dev/WDC80Go/racine
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    0 to 127
  LV Name                /dev/WDC80Go/usr
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    128 to 1407
  LV Name                /dev/WDC80Go/var
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    1408 to 1663
  LV Name                /dev/WDC80Go/tmp
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    1664 to 1791
  LV Name                /dev/WDC80Go/home
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    1792 to 3071
  LV Name                /dev/WDC80Go/ext1
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    3072 to 10751
  LV Name                /dev/WDC80Go/ext2
    Type                linear
    Physical volume     /dev/hdb3
    Physical extents    10752 to 18932

So the PE #4418 is in the /dev/WDC80Go/ext1 LVM logical partition.

Size of logical block of file system on /dev/WDC80Go/ext1:

It’s an ext3 fs, so I get it like this:

# dumpe2fs /dev/WDC80Go/ext1 | grep 'Block size'
dumpe2fs 1.37 (21-Mar-2005)
Block size:               4096

Bad block number for the file system:

The logical partition begins on PE 3072, that is, at

 (# PE's start of partition * sizeof(PE)) + partition offset[pe_start] = (3072 * 8192) + 384 = 25166208

512-byte blocks from the start of the physical partition, so the bad block number for the file system is:

(36194858 - 25166208) / (sizeof(fs block) / 512) = 11028650 / (4096 / 512) = 1378581.25
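The whole chain of LVM arithmetic can be collected into a few lines of shell. This is a sketch only, valid for linear allocation and 512-byte sectors; the variable names are illustrative, and subtracting pe_start before dividing by the PE size is an assumption of this sketch rather than part of the text above (it gives the same PE here):

```shell
bad_lba=37383668      # LBA_of_first_error from smartctl
part_start=1188810    # start sector of /dev/hdb3 from sfdisk
pe_size_kb=4096       # PE size in KB from pvdisplay
pe_start=384          # pe_start in 512-byte sectors
lv_first_pe=3072      # first PE of /dev/WDC80Go/ext1 from lvdisplay --maps
fs_block_size=4096    # block size from dumpe2fs

offset=$(( bad_lba - part_start ))       # sectors into the physical partition
pe_sectors=$(( pe_size_kb * 2 ))         # PE size in 512-byte sectors

# Assumption of this sketch: remove pe_start before dividing
pe_index=$(( (offset - pe_start) / pe_sectors ))

# First sector of the logical volume inside the physical partition
lv_start=$(( lv_first_pe * pe_sectors + pe_start ))
fs_block=$(( (offset - lv_start) / (fs_block_size / 512) ))

echo "PE $pe_index, file system block $fs_block"
# prints: PE 4418, file system block 1378581
```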

Test of the fs bad block:

dd if=/dev/WDC80Go/ext1 of=block1378581 bs=4096 count=1 skip=1378581

If this dd command succeeds, without any error message in console or syslog, then the block number calculation is probably wrong! *Don’t* go further: re-check it, and if you don’t find the error, please give up!

Search / correction follows the same scheme as for simple partitions:

  • find possibly impacted files with debugfs (icheck <fs block nb>, then ncheck <icheck nb>).
  • reallocate the bad block by writing zeros to it, *using the fs block size*:
dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581

Et voilà !

Bad block reassignment

The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be remapped irrespective of the operating system, file system or software RAID being used.

The terms block and sector are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a logical block.

When a SCSI disk is formatted, defective sectors identified during the manufacturing process (the so called primary list: PLIST), those found during the format itself (the certification list: CLIST), those given explicitly to the format command (the DLIST) and optionally the previous grown list (GLIST) are not used in the logical block map. The number (and low level addresses) of the unmapped sectors can be found with the READ DEFECT DATA SCSI command.

SCSI disks tend to be divided into zones which have spare sectors and perhaps spare tracks, to support the logical block address mapping process. The idea is that if a logical block is remapped, the heads do not have to move a long way to access the replacement sector. Note that spare sectors are a scarce resource.

Once a SCSI disk format has completed successfully, other problems may appear over time. These fall into two categories:

  • recoverable: the Error Correction Codes (ECC) detect a problem but it is small enough to be corrected. Optionally other strategies such as retrying the access may retrieve the data.
  • unrecoverable: try as it may, the disk logic and ECC algorithms cannot recover the data. This is often reported as a medium error.

Other things can go wrong, typically associated with the transport and they will be reported using a term other than medium error. For example a disk may decide a read operation was successful but a computer’s host bus adapter (HBA) checking the incoming data detects a CRC error due to a bad cable or termination.

Depending on the disk vendor, recoverable errors can be ignored. After all, some disks have up to 68 bytes of ECC above the payload size of 512 bytes so why use up spare sectors which are limited in number [8] ? If the disk can recover the data and does decide to re-allocate (reassign) a sector, then first it checks the settings of the ARRE and AWRE bits in the read-write error recovery mode page. Usually these bits are set [9] enabling automatic (read or write) re-allocation. The automatic re-allocation may also fail if the zone (or disk) has run out of spare sectors.

Another consideration with RAIDs, and applications that require a high data rate without pauses, is that the controller logic may not want a disk to spend too long trying to recover an error.

Unrecoverable errors will cause a medium error sense key, perhaps with some useful additional sense information. If the extended background self test includes a full disk read scan, one would expect the self test log to list the bad block, as shown in the section called “Repairs in a file system”. Recent SCSI disks with a periodic background scan should also list unrecoverable read errors (and some recoverable errors as well). The advantage of the background scan is that it runs to completion while self tests will often terminate at the first serious error.

SCSI disks expect unrecoverable errors to be fixed manually using the REASSIGN BLOCKS SCSI command since loss of data is involved. It is possible that an operating system or a file system could issue the REASSIGN BLOCKS command itself but the authors are unaware of any examples. The REASSIGN BLOCKS command will reassign one or more blocks, attempting to (partially ?) recover the data (a forlorn hope at this stage), fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name “grown” list). The contents of the GLIST may not be that interesting but smartctl prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life.

Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach.

Example

Given a “bad block”, it may still be useful to run the fdisk command (if the disk has multiple partitions) to find out which partition is involved, and then use debugfs (or a similar tool for the file system in question) to find out which file or other part of the file system, if any, may have been damaged. This is discussed in the section called “Repairs in a file system”.
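The partition lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition table is typed in by hand (for example from `fdisk -lu` output), the layout shown is hypothetical, sectors are 512 bytes, and the 4096-byte file system block size is an assumed value that should be checked on the real file system.

```python
SECTOR_SIZE = 512  # bytes per logical sector, as used elsewhere in this HOWTO


def locate_lba(lba, partitions, fs_block_size=4096):
    """Return (partition name, file system block number) for a disk LBA.

    partitions is an iterable of (name, start_sector, sector_count) tuples.
    Returns None when the LBA lies outside every listed partition.
    """
    for name, start, count in partitions:
        if start <= lba < start + count:
            offset_bytes = (lba - start) * SECTOR_SIZE
            return name, offset_bytes // fs_block_size
    return None


# Hypothetical partition layout, for illustration only.
table = [("/dev/sdb1", 63, 1000000), ("/dev/sdb2", 1000063, 4000000)]
print(locate_lba(1193046, table))
```

The block number returned is relative to the start of the partition, which is the form tools such as debugfs expect.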

Then a program that can execute the REASSIGN BLOCKS SCSI command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows the author’s sg_reassign utility in the sg3_utils package can be used. Also found in that package is sg_verify which can be used to check that a block is readable.

Assume that logical block address 1193046 (which is 123456 in hex) is corrupt [10] on the disk at /dev/sdb. A long selftest command like smartctl -t long /dev/sdb may result in log results like this:

# smartctl -l selftest /dev/sdb
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Self-test log
Num  Test              Status            segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                         number   (hours)
# 1  Background long   Failed in segment      -     354           1193046 [0x3 0x11 0x0]
# 2  Background short  Completed              -     323                 - [-   -    -]
# 3  Background short  Completed              -     194                 - [-   -    -]

The sg_verify utility can be used to confirm that there is a problem at that address:

# sg_verify --lba=1193046 /dev/sdb
verify (10):  Fixed format, current;  Sense key: Medium Error
 Additional sense: Unrecovered read error
  Info fld=0x123456 [1193046]
  Field replaceable unit code: 228
  Actual retry count: 0x008b
medium or hardware error, reported lba=0x123456

Now the GLIST length is checked before the block reassignment:

# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 0

And now for the actual reassignment followed by another check of the GLIST length:

# sg_reassign --address=1193046 /dev/sdb

# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 1

The GLIST length has grown by one, as expected. If the disk was unable to recover any data, then the “new” block at LBA 0x123456 has vendor-specific data in it. The sg_reassign utility can also do bulk reassigns; see man sg_reassign for more information.
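Since the logs and utilities in this example switch between decimal and hexadecimal block addresses, a small conversion helper may be handy. The sketch below is not part of smartmontools or sg3_utils; the function name is made up for illustration and a 512-byte sector size is assumed.

```python
def describe_lba(lba, sector_size=512):
    """Return the decimal, hex and byte-offset forms of a logical block address."""
    return {
        "decimal": lba,
        "hex": hex(lba),
        "byte_offset": lba * sector_size,
    }


# The address used in this example, written in hex as sg_verify reports it.
info = describe_lba(int("0x123456", 16))
print(info)
```

The byte offset is where a tool working in bytes rather than 512-byte blocks would find the sector.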

The dd command could be used to read the contents of the “new” block:

# dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1

and a hex editor [11] used to view and potentially change the blk.img file. An altered blk.img file (or /dev/zero) could be written back with:

# dd if=blk.img of=/dev/sdb seek=1193046 oflag=direct bs=512 count=1

More work may be needed at the file system level, especially if the reassigned block held critical file system information such as a superblock or a directory.

Even if a full backup of the disk is available, or the disk has been “ejected” from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem (or simply format the disk (see sg_format in the sg3_utils package)) and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used).

CVS $Id: badblockhowto.xml,v 1.5 2008/07/17 18:24:14 chrfranke Exp $


[1] Self-Monitoring, Analysis and Reporting Technology -> SMART

[2] Starting with GNU coreutils release 5.3.0, the dd command in Linux includes the options ‘iflag=direct’ and ‘oflag=direct’. Using these with the dd commands should be helpful, because adding these flags should avoid any interaction with the block buffering IO layer in Linux and permit direct reads/writes from the raw device. Use dd --help to see if your version of dd supports these options. If not, the latest code for dd can be found at alpha.gnu.org/gnu/coreutils.

[3] Do not use tar -c -f /dev/null or tar -cO /mydir >/dev/null. GNU tar does not actually read the files if /dev/null is used as archive path or as standard output, see info tar.

[4] Important: the size of the block range to test is arbitrary, but do not test only a single block, as bad blocks are often social (they tend to occur in groups). Do not make the range too large either, as this test is probably not risk-free.

[5] The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced with $((484335+100)) if the bash shell is being used. Similarly the last argument can become $((484335-100)) .

[6] testdisk scans the media for the beginning of file systems that it recognizes. It can be tricked by data that looks like the beginning of a file system or an old file system from a previous partitioning of the media (disk). So care should be taken. Note that file systems should not overlap apart from the fact that extended partitions lie wholly within an extended partition table allocation. Also if the root partition of a Linux/Unix installation can be found then the /etc/fstab file is a useful resource for finding the partition numbers of other partitions.

[7] Thanks to Manfred Schwarb for the information about storing partition table(s) beforehand.

[8] Detecting and fixing an error with ECC “on the fly” and not going the further step and reassigning the block in question may explain why some disks have large numbers in their read error counter log. Various worried users have reported large numbers in the “errors corrected without substantial delay” counter field which is in the “Errors corrected by ECC fast” column in the smartctl -l error output.

[9] Often disks inside a hardware RAID have the ARRE and AWRE bits cleared (disabled) so the RAID controller can do things manually or flag the disk for replacement.

[10] In this case the corruption was manufactured by using the WRITE LONG SCSI command. See sg_write_long in sg3_utils.

[11] Most window managers have a handy calculator that will do hex to decimal conversions.



strace command usage

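Synopsis:
strace [ -dffhiqrtttTvxx ] [ -acolumn ] [ -eexpr ] ...
[ -ofile ] [ -ppid ] ... [ -sstrsize ] [ -uusername ] [ command [ arg ... ] ]

strace -c [ -eexpr ] ... [ -Ooverhead ] [ -Ssortby ] [ command [ arg ... ] ]
Description:
strace traces the system calls made by a program and the signals it receives. In the usual case strace runs the given command until it exits,
and prints the name, arguments and return value of each system call to standard error, or to the file given with -o.
strace is a powerful debugging, analysis and diagnostic tool. It is especially helpful when the source code of a program is not available or cannot be recompiled.
It makes it easy to learn how a program uses system calls to do its work, and programmers can see how system calls and signals cross the boundary between user space and the kernel.
Each line of strace output contains the system call name, followed by its arguments and its return value. For example:
strace cat /dev/null
produces output that includes:
open("/dev/null", O_RDONLY) = 3
When a call fails it usually returns -1, and the errno symbol and error string are appended:
open("/foor/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
Signals are printed as the signal symbol and a description. Tracing and then interrupting the command "sleep 600" gives:
sigsuspend([] <unfinished ...>
--- SIGINT (Interrupt) ---
+++ killed by SIGINT +++
Arguments are printed in symbolic form wherever possible. For the shell redirection ">>tmp", the output is:
open("tmp", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
Structure pointers are dereferenced and their members displayed as appropriate. For "ls -l /dev/null":
lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
Compare the declaration of "struct stat" with this output. The first argument of lstat is an input argument, and the second is an output argument.
When "ls -l" is run on a file that does not exist, the output is:
lstat("/foot/ball", 0xb004) = -1 ENOENT (No such file or directory)
char* arguments are dereferenced and printed as C strings. Non-printing characters are shown as C escape codes, and only the first strsize bytes (32 by default) are printed.
Longer strings are truncated and marked with "...". For example, "ls -l" makes a getpwuid call that reads the password file:
read(3, "root::0:0:System Administrator:/"..., 1024) = 422
Simple pointers and arrays are printed in square brackets, with commas separating the elements:
getgroups(4, [0, 2, 4, 5]) = 4
Bit sets are also printed in square brackets, with the elements separated by spaces:
sigprocmask(SIG_BLOCK, [CHLD TTOU], []) = 0
Here the second argument is a set of two signals, SIGCHLD and SIGTTOU. When every bit in the set is on, the output is:
sigprocmask(SIG_UNBLOCK, ~[], NULL) = 0
Here the second argument has every bit set.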
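The line format described above (call name, arguments, return value, optional errno) can be split mechanically. The following is a minimal sketch, not part of strace: the regular expression and the function name are assumptions made for illustration, and it does not try to handle every form strace can print, such as unfinished or resumed calls.

```python
import re

# syscall(args) = retval [ERRNO (description)], with an optional PID prefix
LINE_RE = re.compile(
    r'^(?:(?P<pid>\d+)\s+)?'                      # PID column, present with -f
    r'(?P<name>\w+)\((?P<args>.*)\)'              # call name and raw argument text
    r'\s*=\s*(?P<ret>0x[0-9a-f]+|-?\d+|\?)'       # return value
    r'(?:\s+(?P<errno>[A-Z][A-Z0-9]*)\s+\((?P<msg>[^)]*)\))?'  # errno and message
)


def parse_strace_line(line):
    """Split one strace output line into its parts, or return None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # signal lines, exit notices, unfinished calls
    ret = m.group("ret")
    return {
        "pid": int(m.group("pid")) if m.group("pid") else None,
        "name": m.group("name"),
        "args": m.group("args"),
        "ret": None if ret == "?" else int(ret, 0),
        "errno": m.group("errno"),
    }


print(parse_strace_line('open("/dev/null", O_RDONLY) = 3'))
```

Lines that do not describe a completed call, such as the signal lines shown earlier, simply return None.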

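Options:
-c Count time, calls and errors for each system call and print a summary.
-d Print debugging output of strace itself on standard error.
-f Trace child processes created by fork.
-ff With -o filename, write each process's trace to filename.pid, where pid is the process ID.
-F Attempt to follow vfork. With -f alone, vfork is not followed.
-h Print a short help summary.
-i Print the instruction pointer at the time of each system call.
-q Suppress messages about attaching and detaching.
-r Print a relative timestamp on entry to each system call.
-t Prefix each line of output with the time of day.
-tt Prefix each line with the time of day, including microseconds.
-ttt Print microseconds, with the time expressed in seconds since the epoch.
-T Show the time spent in each system call.
-v Print unabbreviated versions of environment, stat, termios and similar structures. Because these calls are so frequent, their output is abbreviated by default.
-V Print the strace version number.
-x Print non-ASCII strings in hexadecimal.
-xx Print all strings in hexadecimal.
-a column
Align return values in the given column. The default is 40.
-e expr
A qualifying expression that controls what is traced and how. The format is:
[qualifier=][!]value1[,value2]...
qualifier must be one of trace, abbrev, verbose, raw, signal, read or write. value is a qualifier-dependent symbol or number. The default qualifier is trace. An exclamation mark negates the set of values. For example:
-eopen is equivalent to -e trace=open and traces only the open call, while -e trace=!open traces every call except open. There are two special values, all and none.
Note that some shells use ! for history expansion, so it may need to be escaped with a backslash.
-e trace=set
Trace only the listed system calls. For example, -e trace=open,close,read,write traces only those four calls. The default is trace=all.
-e trace=file
Trace only system calls that operate on files.
-e trace=process
Trace only system calls related to process management.
-e trace=network
Trace all network-related system calls.
-e trace=signal
Trace all signal-related system calls.
-e trace=ipc
Trace all system calls related to inter-process communication.
-e abbrev=set
Abbreviate the output of large structures for the listed system calls. -v is equivalent to abbrev=none. The default is abbrev=all.
-e raw=set
Print the arguments of the listed system calls undecoded, in hexadecimal.
-e signal=set
Trace only the listed signals. The default is all. For example, signal=!SIGIO (or signal=!io) means SIGIO is not traced.
-e read=set
Dump the data read from the listed file descriptors. For example:
-e read=3,5
-e write=set
Dump the data written to the listed file descriptors.
-o filename
Write the strace output to the file filename.
-p pid
Attach to and trace the process with the given pid.
-s strsize
Set the maximum length of printed strings. The default is 32. File names are always printed in full.
-u username
Run the traced command with the UID and GID of username.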
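Debugging programs with strace

In an ideal world, whenever a program could not do its job it would print a helpful error message with enough clues to fix the problem. Unfortunately we do not live in an ideal world, at least not all the time. Sometimes a program fails and you cannot find out why.

This is where debugging tools come in. strace is an indispensable one: it monitors system calls. It can trace a newly started program, and it can also attach to a program that is already running by binding to its PID.

Let us start with a real example:

[BOLD]A problem starting KDE[/BOLD]

Some time ago KDE failed to start for me, and its error messages gave me no useful clue.

Code: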

_KDE_IceTransSocketCreateListener: failed to bind listener
_KDE_IceTransSocketUNIXCreateListener: …SocketCreateListener() failed
_KDE_IceTransMakeAllCOTSServerListeners: failed to create listener for local

Cannot establish any listening sockets DCOPServer self-test failed.

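The message meant little to me, beyond telling me that a program responsible for inter-process communication, which KDE depends on, could not start. I could also tell that the error was related to the ICE protocol (Inter Client Exchange), but I had no idea what actually caused KDE to fail.

I decided to use strace to see what dcopserver was doing when it started:

Code: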

strace -f -F -o ~/dcop-strace.txt dcopserver

Here the -f and -F options tell strace to follow processes created by fork and vfork, the -o option writes all strace output to ~/dcop-strace.txt, and dcopserver is the program to start and trace.

After the error occurred again, I looked through dcop-strace.txt, which contained a long record of system calls. The entries just before the failure were:

Code:

27207 mkdir("/tmp/.ICE-unix", 0777) = -1 EEXIST (File exists)
27207 lstat64("/tmp/.ICE-unix", {st_mode=S_IFDIR|S_ISVTX|0755, st_size=4096, ...}) = 0
27207 unlink("/tmp/.ICE-unix/dcop27207-1066844596") = -1 ENOENT (No such file or directory)
27207 bind(3, {sin_family=AF_UNIX, path="/tmp/.ICE-unix/dcop27207-1066844596"}, 38) = -1 EACCES (Permission denied)
27207 write(2, "_KDE_IceTrans", 13) = 13
27207 write(2, "SocketCreateListener: failed to "..., 46) = 46
27207 close(3) = 0
27207 write(2, "_KDE_IceTrans", 13) = 13
27207 write(2, "SocketUNIXCreateListener: ...Soc"..., 59) = 59
27207 umask(0) = 0
27207 write(2, "_KDE_IceTrans", 13) = 13
27207 write(2, "MakeAllCOTSServerListeners: fail"..., 64) = 64
27207 write(2, "Cannot establish any listening s"..., 39) = 39

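The first line shows the program trying to create the directory /tmp/.ICE-unix with mode 0777; the call fails because the directory already exists. The second system call (lstat64) checks the directory's status and reports its mode as 0755. This is the first clue: the program wanted a directory with mode 0777, but a directory with mode 0755 was already there. The third system call (unlink) tries to delete a file that does not exist. That is not surprising, since the call only removes a stale file that might have been left behind.

The fourth line, however, pinpoints the error. The program tries to bind to /tmp/.ICE-unix/dcop27207-1066844596 and is refused with a permission error. The .ICE-unix directory is owned by user and group root, and only the owner has write permission, so a non-root user cannot create files in it. If the directory mode were 0777, the bind could succeed, and that is exactly the mode the failed first call asked for.

So I ran chmod 0777 /tmp/.ICE-unix, and KDE started normally. Problem solved: tracing with strace took only a few minutes to run the program and then read and analyse the output file.

Note: chmod 0777 here was only a test. A directory should generally not be made readable and writable by all users without also setting the sticky bit. The sticky bit on a directory stops a user from deleting other people's files in a world-writable directory, which is why /tmp normally has it set. Once KDE starts normally, run chmod +t /tmp/.ICE-unix to set the sticky bit on .ICE-unix.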
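The mode bits discussed here can be inspected with Python's standard stat module. The sketch below only decodes mode values; it does not touch any directory, and the helper name is made up for illustration.

```python
import stat


def describe_mode(mode):
    """Summarise the permission bits that matter for a shared directory."""
    return {
        "symbolic": stat.filemode(mode),
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
    }


# The mode reported by lstat64 in the trace, and the mode /tmp usually has.
print(describe_mode(stat.S_IFDIR | stat.S_ISVTX | 0o755))
print(describe_mode(stat.S_IFDIR | 0o1777))
```

A directory that is world-writable without the sticky bit is the combination the note above warns against.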

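[BOLD]Solving library dependency problems[/BOLD]

Another use of strace is to solve problems with shared libraries. Running ldd on an executable shows the shared libraries the program uses and where they were found. With an older glibc (2.2 or earlier), however, ldd may be buggy: it can report that a library was found in one directory while the dynamic linker (/lib/ld-linux.so.2) actually looks for it in another directory when the program runs. This usually happens because /etc/ld.so.conf and /etc/ld.so.cache are inconsistent, or because /etc/ld.so.cache is corrupted. The error does not appear with glibc 2.3.2, so this ld-linux bug may have been fixed.

Even so, ldd cannot list every library a program depends on. The dlopen function loads libraries on demand, and such libraries may not be listed by ldd. The NSS (Name Service Switch) libraries, which are part of glibc, are a typical example. One job of NSS is to tell applications where to find the system account database. Applications do not link against the NSS libraries directly; glibc loads them through dlopen. If such a library goes missing, nothing reports a dependency problem, but the program can no longer resolve user names to user IDs. Here is an example:

The whoami program prints your user name. It is handy in scripts that need to know which user is really running them. Sample output:
Code: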

# whoami
root

Suppose that, for some reason, the NSS library that maps user names to user IDs was lost during a glibc upgrade. We can simulate this by renaming the library:
Code:

# mv /lib/libnss_files.so.2 /lib/libnss_files.so.2.backup
# whoami
whoami: cannot find username for UID 0

whoami now fails, and the output of ldd offers no help:
Code:

# ldd /usr/bin/whoami
libc.so.6 => /lib/libc.so.6 (0x4001f000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

ldd only shows that whoami depends on libc.so.6 and ld-linux.so.2; it does not list the other library that whoami needs at run time. Here is the output from tracing whoami with strace:
Code:

strace -o whoami-strace.txt whoami

open("/lib/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/i686/mmx/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib/i686/mmx", 0xbffff190) = -1 ENOENT (No such file or directory)
open("/lib/i686/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib/i686", 0xbffff190) = -1 ENOENT (No such file or directory)
open("/lib/mmx/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib/mmx", 0xbffff190) = -1 ENOENT (No such file or directory)
open("/lib/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/lib", {st_mode=S_IFDIR|0755, st_size=2352, ...}) = 0
open("/usr/lib/i686/mmx/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/i686/mmx", 0xbffff190) = -1 ENOENT (No such file or directory)
open("/usr/lib/i686/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)

The trace shows attempts to find libnss_files.so.2 in several directories, all of which fail. Without a tool like strace it would be hard to discover that the error is caused by a missing shared library. Now all that is needed is to find libnss_files.so.2 and put it back in the right place.
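Scanning a long trace for this pattern by eye is tedious, so the search can be automated. The following is a rough sketch under stated assumptions: it treats any file whose name contains ".so" as a shared library, it also accepts the openat form that newer strace versions print (an addition not shown in the trace above), and the function name is hypothetical.

```python
import posixpath
import re

# Matches open("path", ...) = N and openat(AT_FDCWD, "path", ...) = N
OPEN_RE = re.compile(r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)')


def missing_libraries(trace_text):
    """Return names of shared libraries that were looked for but never opened."""
    failed, found = set(), set()
    for path, ret in OPEN_RE.findall(trace_text):
        name = posixpath.basename(path)
        if ".so" not in name:
            continue  # not a shared library
        if int(ret) < 0:
            failed.add(name)
        else:
            found.add(name)
    # A library that failed in one directory but opened in another is fine.
    return sorted(failed - found)


sample = (
    'open("/lib/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)\n'
    'open("/usr/lib/i686/libnss_files.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)\n'
    'open("/lib/libc.so.6", O_RDONLY) = 3\n'
)
print(missing_libraries(sample))
```

Libraries that the dynamic linker eventually finds in a later search directory are deliberately left out of the result.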

[BOLD]Limiting strace to specific system calls[/BOLD]

If you already know what you are looking for, you can tell strace to trace only certain system calls. For example, to see which programs a configure script runs, the system call to watch is execve. This command makes strace record only execve calls:

Code:

strace -f -o configure-strace.txt -e execve ./configure

Part of the output:
Code:

2720 execve("/usr/bin/expr", ["expr", "a", ":", "(a)"], [/* 31 vars */]) = 0
2725 execve("/bin/basename", ["basename", "./configure"], [/* 31 vars */]) = 0
2726 execve("/bin/chmod", ["chmod", "+x", "conftest.sh"], [/* 31 vars */]) = 0
2729 execve("/bin/rm", ["rm", "-f", "conftest.sh"], [/* 31 vars */]) = 0
2731 execve("/usr/bin/expr", ["expr", "99", "+", "1"], [/* 31 vars */]) = 0
2736 execve("/bin/ln", ["ln", "-s", "conf2693.file", "conf2693"], [/* 31 vars */]) = 0
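Output like this is easy to turn into a list of the commands that were run. The sketch below is an illustration only: the pattern and function name are assumptions, and arguments containing the sequence `],` would confuse it.

```python
import re

# Optional PID, the program path, then the argv array in square brackets.
EXECVE_RE = re.compile(r'^(?:(\d+)\s+)?execve\("([^"]+)",\s*\[(.*?)\],')
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def executed_programs(trace_text):
    """Return (pid, program path, argv list) for each execve line."""
    result = []
    for line in trace_text.splitlines():
        m = EXECVE_RE.match(line.strip())
        if m:
            pid = int(m.group(1)) if m.group(1) else None
            result.append((pid, m.group(2), ARG_RE.findall(m.group(3))))
    return result


sample = '2726 execve("/bin/chmod", ["chmod", "+x", "conftest.sh"], [/* 31 vars */]) = 0'
for pid, path, argv in executed_programs(sample):
    print(pid, path, " ".join(argv))
```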

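As you have seen, strace is not only for programmers; system administrators and ordinary users can also use it to debug system errors. Admittedly, strace output is not always easy to understand, but much of it does not matter for most purposes. With practice you learn to pick out what you need from the mass of output, such as permission errors and missing files, and strace becomes a powerful tool.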



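Clients receive a "Service Unavailable" error message when browsing the Web while many application pools run under separate identities in IIS


When you run Microsoft Internet Information Services (IIS) with multiple application pools under different custom identities, some worker processes do not initialize correctly. When this problem occurs, clients may receive a "Service Unavailable" error message when they try to access Web pages hosted in those application pools. In addition, the following warning may be recorded in the System log:

Event Type: Warning
Event Source: W3SVC
Event Category: None
Event ID: 1009
Description:
A process serving application pool 'poolname' terminated unexpectedly. The process id was 'processid'. The process exit code was '0x80'.

This problem does not occur when you use the three predefined identities: NetworkService, LocalService and LocalSystem.

Note: this problem occurs in both 32-bit and 64-bit versions of IIS.

When separate identities are used, the system creates a new desktop object for each worker process that IIS creates, allocating memory from the configured desktop heap. The problem occurs because IIS cannot create more worker processes once that heap is exhausted. Clients then receive the "Service Unavailable" error message in their Web browsers when they try to access sites hosted in those application pools.

Warning: serious problems may occur if you modify the registry incorrectly, whether with Registry Editor or by another method. These problems might require you to reinstall the operating system. Microsoft cannot guarantee that these problems can be solved. Modify the registry at your own risk.

To resolve this problem, add the UseSharedWPDesktop registry entry on the computer that runs IIS. This entry lets all worker processes run in one shared desktop, regardless of their worker process identity.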

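To add the UseSharedWPDesktop registry entry:

  1. Click Start, click Run, type regedit, and then click OK.
  2. Locate the following registry key:

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC

  3. Right-click Parameters, point to New, and then click DWORD Value.
  4. Type UseSharedWPDesktop.
  5. Set the value of the new entry to 1.
  6. Exit Registry Editor, and then restart IIS.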
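For repeated deployments, the same entry could be described in a .reg file instead of being typed in by hand. The sketch below only builds the text of such a file; it is an assumption that importing it is acceptable in your environment, the helper name is made up, and line endings or encoding may need adjusting before Registry Editor accepts the file. The registry warning above applies equally.

```python
def reg_dword_file(key_path, value_name, value):
    """Return the text of a .reg file that sets one DWORD value."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        f'"{value_name}"=dword:{value:08x}\n'
    )


# Key and value name taken from the manual steps above.
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3SVC\Parameters"
print(reg_dword_file(KEY, "UseSharedWPDesktop", 1))
```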


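Router initial setup parameters and command configuration

When a router initializes, it performs the following steps:

  1) Runs the power-on self-test from ROM, checking basic operation of the CPU, memory and interface circuitry.

  2) Runs the bootstrap from ROM and loads the operating system into main memory.

  3) The boot field of the configuration register determines whether the operating system is loaded from FLASH or over the network; the boot system commands in the configuration file then give its exact location.

  4) The operating system is loaded into low memory. Once loaded, it determines the router's hardware and software components and displays the result on the screen.

  5) The configuration file stored in NVRAM is loaded into main memory and executed. The configuration starts the routing processes, supplies interface addresses and sets media characteristics. If NVRAM holds no valid configuration file, the router enters the Setup dialog.

  6) The router then enters the system configuration dialog and displays configuration information, such as the configuration of each interface.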

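  II. The Setup dialog

  When NVRAM holds no valid configuration file, the router automatically enters the Setup dialog. Later, it can also be started by typing setup at the command line.

  The setup command is interactive. Every prompt has a default; press Enter to accept it. If the system has been configured before, the current values are shown; on a first-time configuration, the factory defaults are shown. When the screen shows "--More--", press the space bar to continue. To leave Setup, press Ctrl-C.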

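  1. Main Setup parameters:

  Setup configures the router's general parameters, including:

  Host name: hostname
  Privileged password: enable password
  Virtual terminal password: virtual terminal password
  SNMP network management: SNMP Network Management
  IP: IP
  IGRP routing protocol: IGRP Routing
  RIP routing protocol: RIP Routing
  DECnet: DECnet, and so on

  The console secret and password are set with:

  enable secret
  enable password

  The virtual terminal password is set with:

  line vty
  password

  The host name is set with:

  hostname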

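  2. Setup interface parameters:

  Setup then configures interface parameters for Ethernet, Token Ring, synchronous, asynchronous and other ports, including the IP address, subnet mask and Token Ring speed.

  3. Setup summary:

  After the parameters above are entered, Setup asks whether to use the configuration. Answering "yes" stores the configuration, and the system is ready for use.

  4. Commands related to Setup:

  show config
  write memory
  write erase
  reload
  setup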

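  5. Recovering a lost router password

  The following passwords can be recovered or replaced:

  enable secret password (applies to 10.3(2) or later)
  enable password
  console password

  Changing the configuration register (factory default 0x2102) makes the router ignore the stored passwords, so you can get into the router and read the enable password and console password. The enable secret password is encrypted and can only be replaced. The configuration register value that allows this access is 0x142.

  · Password recovery may take the system down for up to an hour and a half;
  · Connect a console terminal to the router's console port and make sure the terminal is set to 9600 bps, 8 data bits, no parity, 1 stop bit;
  · show version displays the configuration register as 0x2102;
  · Power the router off and on again and press Ctrl+Break to enter ROM monitor mode; the prompt is ">";
  · Type "> o/r 0x142" to change the configuration register to 0x142, so that the original passwords are ignored;
  · Type "> initialize" to initialize the router. After a while the router shows the following prompts:

  "System Configuration Dialog ..."
  Enter "no"
  At "Press RETURN to get started!", press Enter

  · Enter privileged mode

  Router>enable
  Router#show startup-config

  This reveals the passwords (enable and console passwords)

  · Change the passwords

  Router#config ter
  Router(config)# enable secret cisco
  Router(config)# enable password cisco1
  Router(config)# line con 0
  Router(config-line)# password cisco
  Router(config-line)# config-register 0x2102
  Ctrl+Z
  Router#copy running-config startup-config
  Router#reload

  Note that copy running-config startup-config overwrites the saved configuration. To keep the rest of the old configuration, load it first with copy startup-config running-config before changing the passwords.

  · Log in to privileged mode with the password cisco.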
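  III. Router configuration

  1) Router modes

  In a Cisco router the command interpreter is called EXEC. It interprets the commands the user types and carries out the corresponding operations. You must log in to the router before entering EXEC commands. For security, EXEC has two access levels, user and privileged; the commands available at user level are a subset of the privileged commands.

  From the privileged level the following modes can be used: configuration, interface, subinterface, line, router, route-map and others.

  2) Configuration mode

  The config command enters configuration mode. EXEC then asks which configuration source to use: terminal, NVRAM or network. The default is terminal.

  3) IP routing protocol mode

  Typing the router command in configuration mode enters IP routing protocol mode. The protocols usually available are the dynamic routing protocols bgp, egp, igrp, eigrp and rip, among others, plus static routing.

  4) Interface configuration mode

  Many characteristics can be set on each port. Interface configuration commands modify the operation of Ethernet, Token Ring, FDDI, synchronous, asynchronous and other ports.

  5) Password configuration

  Passwords restrict access to the router. A password can be set on an individual line or on privileged EXEC mode.

  line console 0 sets the password for the console terminal
  line vty 0 sets the password for Telnet virtual terminals
  enable password sets the password for privileged EXEC access

  6) Naming the router

  Use hostname in configuration mode, for example:

  hostname RouterA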

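  IV. User help

  1. Typing ? at the user prompt lists the commonly used commands, which usually include:

  connect      open a terminal connection
  disconnect   close an existing telnet session
  enable       enter the privileged level
  exit         leave EXEC
  help         description of the interactive help system
  lock         lock the terminal
  login        log in as a particular user
  logout       leave EXEC
  ping         send echo messages
  resume       resume an active telnet connection
  show         show running system information
  systat       display information about terminal lines
  telnet       open a telnet connection
  terminal     set terminal line parameters
  where        list active telnet connections

  2. Context-sensitive help

  Context-sensitive help includes:

  Syntax checking: reports an error when a command is mistyped;
  Keyword completion: typing part of a command keyword is enough;
  Command history: earlier commands can be recalled with " ";
  Command prompting: when you cannot remember a whole command, use "?" in place of the rest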

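Meaning of each bit of the router config-register, and its use together with a TFTP server

The config-register is 16 bits wide and is written as four hexadecimal digits.

Format: 0xABCD
Values range from 0x0 to 0xFFFF

0x2102: the standard default
0x2142: boot from FLASH, but do not use the configuration file in NVRAM (used for password recovery)
0x2101: boot from Boot ROM; used when updating the system image
0x2141: boot from Boot ROM, but do not use the configuration file in NVRAM
Bit 8 (0x0100, the lowest bit of digit B) disables the Break key when set to 1 and enables it when cleared.
0x141: Break key disabled, configuration file in NVRAM not used, and boot from the default system image in ROM.
0x0040: makes the router ignore the configuration file in NVRAM.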

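Table 1: Meaning of each bit in the config-register

Bit     Hex value        Meaning
00-03   0x0000-0x000F    Boot field
05      0x0020           Use extended console speeds (19200 bps and above)
06      0x0040           System software ignores the contents of NVRAM
07      0x0080           OEM bit enabled
08      0x0100           Break key disabled
10      0x0400           IP broadcast with all zeros
11-12   0x0800-0x1000    Console port speed
13      0x2000           Boot from ROM by default if network boot fails
14      0x4000           IP broadcasts without network numbers
15      0x8000           Enable diagnostic messages and ignore NVRAM contents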
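The table can be applied mechanically to any register value. The following Python sketch decodes a value using only the bits listed above; the function name and the wording of the flag descriptions are illustrative, and platform-specific meanings of individual bits are not covered.

```python
# Single-bit flags from the table above (bits 0-3 and 11-12 are multi-bit fields).
FLAGS = {
    0x0020: "extended console speed",
    0x0040: "ignore NVRAM contents",
    0x0080: "OEM bit enabled",
    0x0100: "Break disabled",
    0x0400: "IP broadcast with all zeros",
    0x2000: "boot from ROM if network boot fails",
    0x4000: "IP broadcasts without network numbers",
    0x8000: "diagnostic messages, ignore NVRAM",
}


def decode_config_register(value):
    """Split a 16-bit config-register value into its fields and flags."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("config-register is a 16-bit value")
    return {
        "boot_field": value & 0x000F,
        "console_speed_bits": (value >> 11) & 0x3,
        "flags": [text for mask, text in FLAGS.items() if value & mask],
    }


for reg in (0x2102, 0x2142, 0x141):
    print(hex(reg), decode_config_register(reg))
```

Comparing 0x2102 with 0x2142 shows that the only difference is the "ignore NVRAM contents" bit, which is why 0x2142 is used for password recovery.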

===============================================
Using config-register together with a TFTP server:
  I had one 2501 router with a damaged IOS (ciscoB) and another 2501 with a good IOS (ciscoA). I no longer had a copy of the good IOS on my PC, and setting up a TFTP server on the PC, copying the good image to the PC and then copying it to the damaged router would have taken too long. Then I thought of a way to configure the good router itself as the TFTP server.
The two routers were connected through their E0 ports. On the good router (ciscoA) I first gave E0 the IP address 192.168.10.90 (any address will do, as long as the two E0 ports can ping each other), then in configuration mode entered tftp-server c2500-js-l.122-7a.bin (the name of the IOS image file on the good router; use show version to find it).

With A configured, move on to B. First boot it into boot mode: in global configuration mode enter config-register 0x2101 and restart the router. Once it is in boot mode, configure its E0 port in the same way, then enter copy tftp flash; you are prompted for the TFTP server and then the file name. The detailed steps are:

-----------------------------------------------------------------------------
ciscoA#conf t
ciscoA(config)#tftp-server c2500-js-l.122-7a.bin
ciscoA(config)#int e0
ciscoA(config-if)#ip add 192.168.10.90 255.255.255.0
ciscoA(config-if)#no shut
ciscoB#conf t
ciscoB(config)#config-register 0x2101
ciscoB(config)#end
ciscoB#reload
System configuration has been modified. Save? [yes/no]: y
ciscoB(boot)>enable
ciscoB(boot)#conf terminal
ciscoB(boot)(config)#int e0
ciscoB(boot)(config-if)#ip add 192.168.10.80 255.255.255.0
ciscoB(boot)(config-if)#no shut
ciscoB(boot)(config-if)#end
ciscoB(boot)#erase flash
System flash directory:
File Length Name/status
1 15533612 c2500-js-l.122-7a.bin
[15533676 bytes used, 1243540 available, 16777216 total]
Erase flash device? [confirm]
Are you sure? [yes/no]: y
ciscoB(boot)#copy tftp flash
Address or name of remote host [192.168.10.80]? 192.168.10.90
Source file name? c2500-js-l.122-7a.bin
Destination file name [c2500-js-l.122-7a.bin]?
Erase flash device before writing? [confirm] (press Enter)
Copy 'c2500-js-l.122-7a.bin' from server
as 'c2500-js-l.122-7a.bin' into Flash WITH erase? [yes/no]y
After you enter y, the IOS is copied automatically. Once the copy succeeds, configure the following:
ciscoB(config)#config-register 0x2102
ciscoB(config)#end
ciscoB#reload
-----------------------------------------------------------------------------



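Getting started: Cisco configuration notes


Equipment on hand: Cisco 2620XM routers (4) and 2621XM routers (5), a 3750 layer-3 switch, a PIX-515E firewall, and Cisco 2950 layer-2 switches (9)
Key commands: commands and command combinations for security, control, monitoring, inspection and testing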

I. Layer-2 switch

1. Basic configuration
(1) Setting the IP address and mask of VLAN 1:

Configure:
switch#config terminal
(config)#interface vlan1    ! enter the interface that will get the IP address
(config-if)#ip address 10.1.10.253 255.255.255.0    ! IP address, then mask
Verify:
(config-if)#end
switch#show interface vlan1
Save the settings:
switch#copy running-config startup-config

(2) Creating and assigning VLANs
Configure:
switch#vlan database    ! create a VLAN (there is also another method)
switch(vlan)#vlan 2
switch(vlan)#exit
switch#config terminal
One port:
(config)#interface fastethernet0/0      ! enter the port to be assigned
(config-if)#switchport access vlan 2    ! assign it to a VLAN
Multiple ports:
(config)#interface range fastethernet0/0 - 7  ! enter the group of ports to be assigned
(config-if)#switchport access vlan 2     ! assign them to a VLAN
Verify:
switch#show vlan
Save:
switch#copy running-config startup-config

(3) Setting up a trunk
Configure:
switch#config terminal
(config)#interface gigabitethernet0/1  ! enter the interface that will become a trunk
(config-if)#switchport mode trunk    ! make it a trunk
Verify:
switch#show interface trunk
Save:
switch#copy running-config startup-config

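(4) Connecting to a router
If the switch carries several VLANs, the router interface it connects to needs several IP addresses, which are configured on subinterfaces. The switch interface that connects to the router must be set as a trunk and use a trunking encapsulation: ISL or 802.1Q.
Configure the switch:
switch#config terminal
(config)#interface gigabitethernet0/1
(config-if)#switchport mode trunk  ! make it a trunk; 802.1Q encapsulation is applied automatically
Configure the router:
router#config terminal
(config)#interface fastethernet0/0.2   ! enter subinterface 2
(config-subif)#encapsulation dot1q 2   ! map the subinterface to VLAN 2 with dot1q encapsulation
(config-subif)#ip address 10.1.20.1  255.255.255.0  ! gateway for the 10.1.20.0/24 subnet
Verify:
router#show interface fastethernet0/0
If hosts in different VLANs can ping each other, the configuration works.
Save the configuration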

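(5) Connecting to another switch
Network devices of the same type are connected with a crossover cable. When two switches are connected with a crossover cable, both ends are automatically set up as trunks.

2. VTP (VLAN Trunk Protocol)
(1) Purpose:
VTP lets you manage the configuration of the switches in a network centrally. It is a messaging protocol that manages the addition, deletion and renaming of VLANs across the whole network, keeping the VLAN configuration consistent.
(2) How it works
One switch is designated the VTP server.
The VLAN configuration can be changed on the server, which propagates it to all VTP clients in the network.
Once a switch is configured as a VTP client, its VLAN configuration can no longer be changed locally.
The only way its VLAN configuration changes is when it receives a VTP update from its VTP server.
When several VTP servers manage different VTP clients, a VTP domain must be specified; each server and its clients belong to their own domain.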

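II. Router
1. Basic configuration

(1) Ethernet port configuration
Note: use a crossover cable to connect a router Ethernet port directly to a host.
(2) Serial port configuration
(3) Configuring static routes
(4) Configuring dynamic routing protocols
(5) Configuring access control lists (ACLs)
(6) Interconnecting routers

2. Problems
(1) Static routes cannot be configured, and "Default gateway is not set ..... ICMP redirect cache is empty" appears
Cause: IP routing is disabled
Fix: (config)#ip routing

(2) The interface to another device shows "protocol down"
Possible cause: the wrong type of twisted-pair cable
Fix: swap the straight-through cable for a crossover cable, or vice versa.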

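III. Layer-3 switch
1. Basic configuration

(1) Configuring IP
Manual configuration:
(config)#interface vlan vlan-id
(config-if)#ip address ip-address subnet-mask
(config-if)#exit
(config)#ip default-gateway ip-address
Verify:
#show interface vlan vlan-id
#show ip redirects  ! confirm the default gateway setting
Save: #copy running-config startup-config
Configuration with DHCP

(2) Connecting different VLANs

(3) Configuring a port as a trunk
(config)#interface fastethernet1/0/23
(config-if)#switchport trunk encapsulation dot1q
(config-if)#switchport mode trunk

(4) Default route and routing protocol settings
Problems
(1) Several subnets are connected to the layer-3 switch, which holds the gateway address for each subnet and connects to other networks or areas through a router. Addresses in the same subnet on the layer-3 switch and on the router cannot all ping each other. For example, the 3750 has 192.168.8.254 (VLAN1), 192.168.16.254 (VLAN2) and 192.168.24.254 (VLAN3); the router has 192.168.8.1, 192.168.16.1 and 192.168.24.1.
    Symptom: 192.168.8.254 can ping 192.168.8.1, but the .16. and .24. subnets cannot be pinged.
    Cause: the interface that connects the router to the switch is not configured as a trunk.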

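IV. Firewall
The Cisco PIX series are stateful inspection firewalls.
Note: ASA (Adaptive Security Algorithm) allows one way (inside to outside) connections without an explicit configuration in memory.

1. Features
(1) Adaptive Security Algorithm (ASA)
ASA creates a stateful session flow table (state table) in which information about every connection is recorded.
ASA is a stateful, connection-oriented process. It keeps session information in the state table and applies the security policy to the state table to control all traffic through the firewall.
The connection state includes the source and destination IP addresses, the source and destination ports, TCP sequence information and additional TCP/UDP flags; a randomly generated TCP sequence number is applied. Together these are called the "session object".

In other words, unless an inside host first sends data that calls for a response, outside data cannot enter the inside network.
How ASA and stateful filtering work in the PIX:
a. An inside host starts a connection to an outside resource
b. The PIX writes a session (connection) object into the state table
c. The session object is compared with the security policy. If the connection is not allowed, the session object is deleted and the connection is dropped
d. If the security policy approves the connection, it continues on to the outside resource
e. The outside resource responds to the request
f. The response reaches the firewall and is compared with the session object. If it matches, the response is sent to the inside host; if not, the connection is dropped.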
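The six steps above can be modelled with a very small state table. This Python sketch is a toy illustration of the idea, not how the PIX implements ASA: the class and method names are invented, the session object holds only addresses and ports (no TCP sequence information or flags), and the addresses in the usage lines are hypothetical.

```python
from typing import NamedTuple


class Session(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class StatefulFilter:
    def __init__(self, policy):
        self.policy = policy  # callable taking a Session, returning True to allow
        self.table = set()    # the state table of session objects

    def outbound(self, session):
        """First four steps: record the session, then keep or delete it per policy."""
        self.table.add(session)
        if not self.policy(session):
            self.table.discard(session)
            return False
        return True

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Last two steps: a reply passes only if it mirrors a recorded session."""
        return Session(dst_ip, dst_port, src_ip, src_port) in self.table


# Hypothetical policy: inside hosts may only open connections to port 80.
fw = StatefulFilter(lambda s: s.dst_port == 80)
print(fw.outbound(Session("192.168.1.10", 40000, "222.20.16.5", 80)))
print(fw.inbound("222.20.16.5", 80, "192.168.1.10", 40000))
```

Unsolicited inbound traffic finds no matching session object and is refused, which is the behaviour the steps describe.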

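(2) Cut-through proxy
Authenticates and authorizes inbound and outbound connections on the firewall.
It authenticates the user at the application layer and checks authorization against the security policy, opening the connection when the policy allows it. Later traffic on that connection is no longer handled at the application layer but by stateful inspection.

(3) Redundancy

2. Basic configuration

After the basic parameters were configured, the PIX could ping addresses on both the inside and outside networks, but hosts on the inside and outside could not ping each other, and inside hosts could not ping the PIX outside interface. Inside hosts could still reach servers on the outside. (Possible cause: the PIX blocks ICMP replies by default?)

Basic configuration commands: interface, nameif, ip address, nat, global, route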
(1) Enabling the Ethernet ports
firewall#config terminal
(config)#interface ethernet0 auto
(config)#interface ethernet1 auto    ! the outside port must be enabled with this command

(2) Naming the ports and setting security levels
(config)#nameif  ethernet1  inside  security100
(config)#nameif  ethernet0  outside  security0

(3) Configuring the inside and outside interfaces
firewall#config terminal
(config)#ip address inside 192.168.1.1 255.255.255.0
(config)#ip address outside 222.20.16.1 255.255.255.0

(4) Configuring NAT and PAT
(config)#nat (inside) 1 0 0   ! all inside addresses
(config)#nat (inside) 2 192.168.8.0 255.255.255.0
(config)#global (outside) 2 10.1.30.150-10.1.30.160 netmask 255.255.0.0
Testing the configuration:
  ping
  debug

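(5) DMZ access

(6) Working with the translation table
show xlate    shows the contents of the translation table
clear xlate   run this each time the translation table is rebuilt, to clear the existing translation slots; otherwise the old entries are discarded only after they time out (3 hours)
show conn     used to troubleshoot connections; shows the number and state of all active TCP connections for the selected options
Commands that change the translation table:
nat, global, static, route, alias, conduit

(7) Configuring the Network Time Protocol (NTP)
 The relationship between the NTP server and the PIX

(8) Access configuration
Inbound access through the PIX

step1: static network address translation
  Static network address translation does not conserve allocated IP addresses
static [(prenat_interface,postnat_interface)] {mapped_address | interface} real_address [dns] [netmask mask] [norandomseq] [max_conns [emb_limit]]
   Map one inside address to one outside address:
   (config)#static (inside,outside) 211.70.96.10 10.1.100.10 netmask 255.255.255.255
   Or map an inside network to an outside network:
   (config)#static (inside,outside) 211.70.96.0 10.1.100.0 netmask 255.255.255.0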
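The network-to-network form keeps the host part of each address and swaps only the network part. The Python sketch below shows that arithmetic with the standard ipaddress module; it illustrates the mapping only and is not how the PIX performs translation, and the function name is made up.

```python
import ipaddress


def static_translate(real_ip, real_net, mapped_net):
    """Map a host in real_net to the same host offset in mapped_net."""
    real_net = ipaddress.ip_network(real_net)
    mapped_net = ipaddress.ip_network(mapped_net)
    if real_net.prefixlen != mapped_net.prefixlen:
        raise ValueError("both networks must use the same netmask")
    addr = ipaddress.ip_address(real_ip)
    if addr not in real_net:
        return None  # the static entry does not cover this host
    offset = int(addr) - int(real_net.network_address)
    return str(mapped_net.network_address + offset)


# The two networks from the static command above.
print(static_translate("10.1.100.10", "10.1.100.0/255.255.255.0", "211.70.96.0/255.255.255.0"))
```

With a 255.255.255.255 mask the same function describes the single-host form, where the offset is always zero.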
   Static port address translation does not support H.323 or multimedia application traffic
   static [(internal_if_name,external_if_name)] {tcp|udp} {global_ip | interface} global_port local_ip local_port [netmask mask] [max_conns [emb_limit [norandomseq]]]



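■ Cisco ASA and PIX Firewall Configuration Handbook, Chapter 1
I. Configuration basics
1.1 User interface
Cisco firewalls can be configured through:
Console, Telnet, SSH (1.x or 2.0; 2.0 is new in 7.x), PDM over HTTP (called ASDM from 7.x on) and the VMS Firewall Management Center.
ROM monitor mode is supported. Access is divided into user mode and privileged mode, and help, history, and searching and filtering of command output are supported.
Note: the FWSM in a Catalyst 6500 has no physical console interface and is reached with this CLI command:
Switch# session slot slot processor 1 (slot is the slot that holds the FWSM)
User mode:
Firewall> is user mode; type enable to reach privileged mode, Firewall#. Configuration mode is entered from privileged mode. In 6.x all configuration is done in a single global mode; from 7.x on there is a global configuration mode with submodes, as in IOS. exit and Ctrl-Z return to the previous mode.
Configuration features:
Prefixing a command with no removes it. show running-config or write terminal displays the current configuration; from 7.x on the output of show run can be searched and filtered. show running-config all displays the whole configuration, including defaults. Tab completes commands, and Ctrl-L redisplays the command being typed (useful when system output has scrambled a half-typed command). help and history work as in the IOS command set.
show commands support begin, include, exclude and grep with regular expressions to filter and search the output.
terminal width changes the display width of the terminal (default 80 characters), and pager changes the number of lines shown per screen (default 24). pager lines 0 turns paging off.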
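1.2 Firewall licenses
Firewalls are licensed in the following forms. The show version command shows which features a device supports:
Unrestricted (UR): limited only by the performance of the device itself; supports failover
Restricted (R): memory and the maximum number of interfaces are limited; failover is not supported
Failover (FO): cannot be used as a standalone firewall, only for failover
Failover-Active/Active (FO-AA): can only be paired with a UR firewall; supports active/active failover
Note: the FWSM has a built-in UR license.
The activation-key command upgrades a device's license. The key is tied to the device serial number (shown in show version output) and is 16 bytes long in 6.x and 20 bytes in 7.x.

1.3 Initial configuration
As on a router, setup can be used for an interactive basic configuration.
--------------------------------------------------------------------------------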
■ Cisco ASA and PIX Firewall Configuration Handbook, Chapter 2
II. Configuring connectivity
2.1 Configuring interfaces
Interface basics:
Every firewall interface must be given an interface name, an IP address and mask (IPv6 is supported from 7.x), and a security level. An interface can be physical or logical (VLAN). SPAN and trunks are supported from 6.3 on, but only with 802.1Q encapsulation; DTP negotiation is not supported.
Basic interface configuration:
Note: on the FWSM all interfaces are logical, and each is named vlan followed by the VLAN ID. For example, with the FWSM in slot 3 of a 6500 and three interfaces in VLANs 100, 200 and 300:
Switch(config)# firewall vlan-group 1 100,200,300
Switch(config)# firewall module 3 vlan-group 1
Switch(config)# exit
Switch# session slot 3 processor 1
This configuration produces three interfaces: vlan100, vlan200 and vlan300.
PIX 6.x
Firewall(config)# interface hardware-id [hardware-speed] [shutdown] (the hardware-id is shown by show version)
PIX 7.x
Firewall(config)# interface hardware-id
Firewall(config-if)# speed {auto | 10 | 100 | nonegotiate}
Firewall(config-if)# duplex {auto | full | half}
Firewall(config-if)# [no] shutdown
Naming interfaces
FWSM 2.x
Firewall(config)# nameif vlan-id if_name securitylevel
PIX 6.x
Firewall(config)# nameif {hardware-id | vlan-id} if_name securitylevel
PIX 7.x
Firewall(config)# interface hardware_id[.subinterface]
Firewall(config-if)# nameif if_name
Firewall(config-if)# security-level level
Note: PIX 7.x and FWSM 2.x allow different interfaces to share the same security level, provided same-security-traffic permit inter-interface is entered in global configuration mode.
Configuring IP addresses
Static address: Firewall(config)# ip address if_name ip_address [netmask]
Dynamic address: Firewall(config)# ip address outside dhcp [setroute] [retry retry_cnt]
Note: the setroute keyword also takes the default route from the DHCP server. Entering the command again renews the address.
PPPoE: Firewall(config)# vpdn username JohnDoe password JDsecret
Firewall(config)# vpdn group ISP1 localname JohnDoe
Firewall(config)# vpdn group ISP1 ppp authentication chap
Firewall(config)# vpdn group ISP1 request dialout pppoe
Firewall(config)# ip address outside pppoe setroute
Verifying interfaces
Firewall# show ip
IPv6 address configuration (new in 7.x)
Omitted for now
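ARP configuration
Configure a static ARP entry: Firewall(config)# arp if_name ip_address mac_address [alias]
Configure the timeout: Firewall(config)# arp timeout seconds (default 4 hours)
Note: clear arp normally flushes the whole ARP cache and cannot target a single entry. A workaround is to configure a static entry that maps the problem IP address to a bogus MAC address and then remove it with no; a fresh ARP entry is then created.
MTU and fragmentation
Configure the MTU: Firewall(config)# mtu if_name bytes; verify with show mtu (6.3) or show running-config mtu (7.x)
Fragment commands: limit the number of fragments awaiting reassembly: Firewall(config)# fragment size database-limit [if_name]
Limit the number of fragments per packet: Firewall(config)# fragment chain chain-limit [if_name]
Limit the time allowed for the fragments of a packet to arrive: Firewall(config)# fragment timeout seconds [if_name]
Configuring interface priority queues (new in 7.x)
Omitted for now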
2.2 Configuring routing
Enable RPF to prevent address spoofing: Firewall(config)# ip verify reverse-path interface if_name
Configure a static route: Firewall(config)# route if_name ip_address netmask gateway_ip [metric]
Configuring RIP
Listen passively for RIP updates (v1, v2): Firewall(config)# rip if_name passive [version 1] (Firewall(config)# rip if_name passive version 2 [authentication [text | md5 key (key_id)]])
Advertise the interface as the default route: Firewall(config)# rip if_name default version [1 | 2 [authentication [text | md5 key key_id]]]
Configuring OSPF
Define the OSPF process: Firewall(config)# router ospf pid
Assign networks to an OSPF area: Firewall(config-router)# network ip_address netmask area area_id
Optional, set the router ID: Firewall(config-router)# router-id ip_address
Log OSPF neighbor state changes: Firewall(config-router)# log-adj-changes [detail]
Enable authentication of OSPF updates: Firewall(config-router)# area area_id authentication [message-digest]
Advertise a default route: Firewall(config-router)# default-information originate [always] [metric value] [metric-type {1 | 2}] [route-map name]
Tune OSPF parameters: Firewall(config-router)# timers {spf spf_delay spf_holdtime | lsa-group-pacing seconds}
2.3 DHCP
Configuring the firewall as a DHCP server:
Configure the address pool: Firewall(config)# dhcpd address ip1[-ip2] if_name (at most 256 clients)
Configure DHCP parameters: Firewall(config)# dhcpd dns dns1 [dns2]
Firewall(config)# dhcpd wins wins1 [wins2]
Firewall(config)# dhcpd domain domain_name
Firewall(config)# dhcpd lease lease_length
Firewall(config)# dhcpd ping_timeout timeout
Enable the DHCP service: Firewall(config)# dhcpd enable if_name
Verify: show dhcpd, show dhcpd bindings, show dhcpd statistics
Configuring DHCP relay:
Define the real DHCP server: Firewall(config)# dhcprelay server dhcp_server_ip server_ifc (at most 4)
Relay parameters: Firewall(config)# dhcprelay timeout seconds
Firewall(config)# dhcprelay setroute client_ifc
Enable relay: Firewall(config)# dhcprelay enable client_ifc
Verify: show dhcprelay statistics
2.4 Multicast support
Omitted for now
--------------------------------------------------------------------------------
■ Cisco ASA and PIX Firewall Configuration Handbook, Chapter 3
3. Firewall management
3.1 Building virtual firewalls with security contexts (7.x feature)
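Feature overview: starting with PIX 7.0 and FWSM 2.2(1), one physical firewall can be configured as several virtual firewalls, each called a context. A firewall therefore has two operating modes: single-context and multiple-context. In multiple-context mode the firewall is divided into three functional parts: the system execution space (it has no context functionality of its own but is the foundation for everything else), the administrative context (used to manage the physical firewall), and user contexts (the virtual firewalls, in which all firewall configuration commands apply).
Configuration: first use show activation-key to check that a multiple-context license is present, then switch between the two modes with mode multiple and mode single; show mode reports the current mode.
To move between contexts use Firewall# changeto {system | context name}. All contexts must be defined in the system execution space, so enter it first with changeto system and then define the context: Firewall(config)# context name
Next, map physical interfaces into the context. Only then do the physical interfaces appear in that context so that their properties can be configured: Firewall(config-ctx)# allocate-interface physical-interface [map-name]
Finally, specify where the context's startup-config is stored: Firewall(config-ctx)# config-url url
Verify with show context.
Note: when the firewall runs in multiple-context mode, the admin context is created automatically (verify with show context).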
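The sequence above can be sketched as follows. The context name ctx-a, the interface name GigabitEthernet0/1, and the file name ctx-a.cfg are hypothetical; the flash:/ prefix follows the 7.x convention described in section 3.2. Prompts are simplified and lines starting with ! are annotations.

```
! Confirm the license, then switch modes
Firewall# show activation-key
Firewall(config)# mode multiple
! Contexts are defined from the system execution space
Firewall# changeto system
Firewall(config)# context ctx-a
Firewall(config-ctx)# allocate-interface GigabitEthernet0/1
Firewall(config-ctx)# config-url flash:/ctx-a.cfg
Firewall# show context
! Move into the new context to configure it
Firewall# changeto context ctx-a
```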
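Because all contexts share the device's resources, the resources allotted to each context should be limited.
First define a class: Firewall(config)# class name
Then set its limits: Firewall(config-class)# limit-resource all number% and Firewall(config-class)# limit-resource [rate] resource_name number[%]
Finally, in the configuration of the relevant context: Firewall(config-ctx)# member class
Verify with commands such as show class, show resource allocation, and show resource usage.
Note: the default limits are 5 sessions each for telnet, ssh, and IPsec, and 65535 MAC address entries.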
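3.2 Managing the Flash file system
6.x file system
Only six kinds of file can be stored in Flash. Files have numeric identifiers rather than names, and there is no directory structure.
0: OS image; 1: startup configuration; 2: VPN keys and certificates; 3: PDM image; 4: crash information; 5: file size of file 0
show flashfs displays the Flash files.
7.x and FWSM file system
The 7.x and FWSM file system is closer to the IOS file system: it has a hierarchical directory structure and must be formatted before use. 7.x uses flash:/ for the Flash file system; FWSM uses flash:/ (system image) and disk:/ (configuration files).
Because this file system uses Unix-like commands, the following common commands operate on it:
dir pwd cd more delete copy rename mkdir rmdir format erase fsck (checks file system integrity)
6.x can hold only one system image in Flash. 7.x removes this limit: select the image to boot with Firewall(config)# boot system flash:filename and verify with show bootvar.
OS upgrade: see the appendix.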
3.3 Managing configuration files
From 7.0 onward, multiple startup configuration files can be used: Firewall(config)# boot config url
Display the startup configuration: Firewall# show startup-config or Firewall# show configuration (6.x: show configure)
Save the running configuration: write memory, copy running-config startup-config, or write net [[server-ip-address]:[filename]] (7.x also supports copy to tftp)
Force the standby unit to synchronize the running configuration: write standby
Delete the startup configuration: write erase
Merge the startup configuration into the running configuration: configure memory
Import a configuration from the Web: configure http[s]://[user:password@]location[:port]/http-pathname (7.x also supports copy from these sources)
Merge a configuration from an Auto Update Server:
Firewall(config)# auto-update device-id {hardware-serial | hostname | ipaddress [if_name] | mac-address [if_name] | string text}
Firewall(config)# auto-update server http[s]://[username:password@]AUSserver-IP-address[:port]/autoupdate/AutoUpdateServlet [verify-certificate]
3.4 Managing administrative sessions
Set the console login timeout (default 0, never times out): Firewall(config)# console timeout minutes
Telnet from the outside interface is prohibited. Enable Telnet: Firewall(config)# telnet ip_address netmask if_name
Set the Telnet timeout: Firewall(config)# telnet timeout minutes
Enabling SSH
First generate an RSA key pair: Firewall(config)# domain-name name, then Firewall(config)# ca generate rsa key [modulus] (7.x: crypto key generate rsa general-keys [modulus modulus]), then Firewall(config)# ca save all (7.x saves automatically)
Verify with show ca mypubkey rsa (7.x: show crypto key mypubkey rsa). ca zeroize rsa invalidates the existing key pair (7.x: crypto key zeroize rsa default).
Finally, permit SSH sessions: Firewall(config)# ssh ip_address netmask if_name
The ssh version command selects the SSH version, and ssh timeout sets the timeout.
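Using the 7.x forms of the commands above, SSH access might be enabled as in this sketch. The domain name, modulus size, management subnet, and timeout are assumed values, and inside is an assumed interface name.

```
Firewall(config)# domain-name example.com
! 7.x form; 6.x uses ca generate rsa key followed by ca save all
Firewall(config)# crypto key generate rsa general-keys modulus 1024
! Allow SSH only from an assumed management subnet on the inside interface
Firewall(config)# ssh 192.168.1.0 255.255.255.0 inside
Firewall(config)# ssh version 2
! Idle timeout in minutes
Firewall(config)# ssh timeout 10
Firewall# show crypto key mypubkey rsa
```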
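PDM/ASDM configuration
The PDM image location is fixed, so it does not need to be specified. For ASDM, specify the image location with Firewall(config)# asdm image device:/path; if the image is not present, install it with the copy command. Then configure access permission with Firewall(config)# http ip_address subnet_mask if_name, enable the HTTP process with Firewall(config)# http server enable, and connect using https://ip-address/admin.
Banner configuration: Firewall(config)# banner {exec | login | motd} text. A banner cannot be edited in place; it can only be removed with the no form, or all banners can be cleared with clear banner (7.0: clear configure banner).
Monitoring administrative sessions: who lists Telnet sessions and kill telnet-id clears one; show ssh sessions lists SSH sessions and ssh disconnect session-id clears one; show pdm sessions lists PDM sessions and pdm disconnect session-id clears one.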
3.5 System reload and crashes
The reload command is normally used to restart the system. From 7.0 onward a reload can be scheduled for a specific time:
Firewall# reload at hh:mm [month day | day month] [max-hold-time {minutes | hhh:mm}] [noconfirm] [quick] [save-config] [reason text]
or after a given interval:
Firewall# reload in {minutes | hh:mm} [max-hold-time {minutes | hhh:mm}] [noconfirm] [quick] [save-config] [reason text]
Enable crash information generation: Firewall(config)# crashinfo save enable (7.0: no crashinfo save disable). View the crash information with show crashinfo and delete it with clear crashinfo (FWSM uses crashdump).
3.6 SNMP support
System SNMP information: Firewall(config)# snmp-server location string (likewise snmp-server contact string)
SNMP access permission: Firewall(config)# snmp-server host if_name ip_addr [poll | trap]
Firewall(config)# snmp-server community key
——————————————————————————–
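■ Cisco ASA and PIX Firewall Configuration Handbook, Chapter 4
4. User management
4.1 General user management
Note: by default, authenticating a user requires only a password. The default username for such an ordinary user is enable_1; over SSH the default username is pix, which is then authenticated with the password.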

Set the unprivileged-mode password: Firewall(config)# {password | passwd} password [encrypted] (to restore the default password cisco, use clear {password | passwd})
Set the privileged-mode password: Firewall(config)# enable password [pw] [level priv_level] [encrypted]
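4.2 Managing users with the local database
Define a user: Firewall(config)# username username [{nopassword | password password} [encrypted]] privilege level
Enable local authentication: Firewall(config)# aaa authentication {serial | telnet | ssh | http} console LOCAL
Note: by default the privileged-mode password is the one defined with enable password. After authenticating, every user enters privileged mode with enable regardless of their initial privilege level, and all users share the same password. Local enable authentication (aaa authentication enable console LOCAL) can be used instead: each user then enters enable mode with the password from their own username entry, so enable passwords are per-user, which improves security.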
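A minimal sketch of local authentication as described above. The usernames, passwords, and privilege levels are invented for illustration only.

```
! Two hypothetical local users with different privilege levels
Firewall(config)# username admin password ExamplePass123 privilege 15
Firewall(config)# username operator password ExamplePass456 privilege 5
! Authenticate SSH and Telnet logins against the local database
Firewall(config)# aaa authentication ssh console LOCAL
Firewall(config)# aaa authentication telnet console LOCAL
! Have enable prompt for each user's own password instead of the shared one
Firewall(config)# aaa authentication enable console LOCAL
```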
Local authorization: Firewall(config)# aaa authorization command LOCAL
Set the privilege level of a command: Firewall(config)# privilege {show | clear | configure} level level [mode {enable | configure}] command command
Use show privilege to view the current privilege levels of commands (7.x: show run all privilege).
4.3 Managing users with an AAA server
Define the AAA server group and protocol: Firewall(config)# aaa-server server_tag protocol {tacacs+ | radius} (7.x adds support for the kerberos, ldap, nt, and sdi protocols)
Add a server to the group: Firewall(config)# aaa-server server_tag [(if_name)] host server_ip [key] [timeout seconds]
Optional commands
Set the server failure threshold:
FWSM: Firewall(config)# aaa-server server_tag max-attempts number
PIX 6.x: Firewall(config)# aaa-server server_tag max-failed-attempts number
PIX 7.x: Firewall(config-aaa-server-group)# max-failed-attempts number
Set the accounting mode (7.x feature): Firewall(config-aaa-server-group)# accounting-mode {single | simultaneous}
Per-protocol parameter configuration is omitted for now.
4.4 Configuring AAA to manage users
Enable authentication: Firewall(config)# aaa authentication {serial | telnet | ssh | http} console server_tag [LOCAL]
Enable authorization: Firewall(config)# aaa authorization command server_tag [LOCAL]
Enable accounting: Firewall(config)# aaa accounting command [privilege level] server_tag
Note: AAA server configuration is omitted.
4.5 Configuring AAA for user cut-through proxy
4.6 Password recovery
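Sections 4.3 and 4.4 might fit together as in the sketch below, which follows the command syntax exactly as given in this handbook. The group tag AUTHSRV, the server address, the shared key, and the timeout are assumptions. Prompts are simplified: on 7.x the protocol line enters a server-group submode.

```
! Hypothetical TACACS+ server group with one server behind the inside interface
Firewall(config)# aaa-server AUTHSRV protocol tacacs+
Firewall(config)# aaa-server AUTHSRV (inside) host 10.1.1.20 ExampleKey timeout 10
! Authenticate SSH logins against the group, falling back to the local database
Firewall(config)# aaa authentication ssh console AUTHSRV LOCAL
Firewall(config)# aaa authorization command AUTHSRV LOCAL
Firewall(config)# aaa accounting command AUTHSRV
```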
——————————————————————————–
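■ Cisco ASA and PIX Firewall Configuration Handbook, Chapter 5
5. Firewall access control
5.1 Transparent firewall mode
Feature overview: starting with PIX 7.0 and FWSM 2.2, the firewall supports a transparent mode in which interfaces need no address configuration and the firewall works at layer 2. Only two interfaces, inside and outside, are supported. A management interface can also be configured, but it cannot carry user traffic. In multiple-context mode, physical ports cannot be shared between contexts. Because both sides connect to the same address segment, NAT is not supported. Although the interfaces have no IP addresses, ACLs can still be configured to inspect traffic.
Enter transparent mode: Firewall(config)# firewall transparent (show firewall verifies the current mode; because routed mode and transparent mode work differently, switching between them clears the current configuration)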
Configure the interfaces: Firewall(config)# interface hardware-id
Firewall(config-if)# speed {auto | 10 | 100 | nonegotiate}
Firewall(config-if)# duplex {auto | full | half}
Firewall(config-if)# [no] shutdown
Firewall(config-if)# nameif if_name
Firewall(config-if)# security-level level
Note: no IP address information is configured, but the other attributes still must be. The interfaces' security levels normally have to differ; the same-security-traffic permit inter-interface command lifts this restriction.
Configure the management address: Firewall(config)# ip address ip_address subnet_mask
Firewall(config)# route if_name foreign_network foreign_mask gateway [metric]
MAC address table configuration:
Firewall# show mac-address-table displays the MAC address table
Firewall(config)# mac-address-table aging-time minutes sets the aging time of MAC address table entries
Firewall(config)# mac-address-table static if_name mac_address adds a static MAC entry
Firewall(config)# mac-learn if_name disable disables address learning on a given interface (verify with show mac-learn)
ARP inspection:
Firewall(config)# arp if_name ip_address mac_address adds a static ARP entry
Firewall(config)# arp-inspection if_name enable [flood | no-flood] enables ARP inspection on an interface
Configure a forwarding policy for non-IP protocols: Firewall(config)# access-list acl_id ethertype {permit | deny} {any | bpdu | ipx | mpls-unicast | mpls-multicast | ethertype}
Firewall(config)# access-group acl_id {in | out} interface if_name
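Putting the transparent-mode steps together, a configuration could look roughly like this. The hardware IDs Ethernet0 and Ethernet1, the management address, and the ACL name ETHER are assumptions. Lines starting with ! are annotations.

```
Firewall(config)# firewall transparent
! Outside interface: no IP address, lowest security level
Firewall(config)# interface Ethernet0
Firewall(config-if)# nameif outside
Firewall(config-if)# security-level 0
Firewall(config-if)# no shutdown
Firewall(config-if)# exit
! Inside interface: highest security level
Firewall(config)# interface Ethernet1
Firewall(config-if)# nameif inside
Firewall(config-if)# security-level 100
Firewall(config-if)# no shutdown
Firewall(config-if)# exit
! Single management address for the whole firewall
Firewall(config)# ip address 192.168.1.2 255.255.255.0
! Let spanning-tree BPDUs pass in both directions
Firewall(config)# access-list ETHER ethertype permit bpdu
Firewall(config)# access-group ETHER in interface inside
Firewall(config)# access-group ETHER in interface outside
```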
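5.2 Routed firewall mode and address translation
Feature overview: access from a higher security level to a lower one is called outbound access and requires address translation and outbound access control. By default, PIX permits such access without an ACL, whereas FWSM requires an ACL to permit it. Access from a lower security level to a higher one is called inbound access; it also requires address translation and inbound access control, and an ACL is mandatory. Address translation can also be configured for access between interfaces of the same security level.
The following NAT types are supported: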

Translation Type | Application | Basic Command | Direction in Which Connections Can Be Initiated
Static NAT | Real source addresses (and ports) are translated to mapped addresses (and ports) | static | Inbound or outbound
Policy NAT | Conditionally translates real source addresses (and ports) to mapped addresses | static access-list | Inbound or outbound
Identity NAT | No translation of real source addresses | nat 0 | Outbound only
NAT exemption | No translation of real source addresses matched by the access list | nat 0 access-list | Inbound or outbound
Dynamic NAT | Translates real source addresses to a pool of mapped addresses | nat id, global id address-range | Outbound only
PAT | Translates real source addresses to a single mapped address with dynamic port numbers | nat id, global id address | Outbound only
Configuration
Connection limits: PIX 6.x … [norandomseq] [max_conns [emb_limit]]
PIX 7.x … [norandomseq] [[tcp] max_conns [emb_limit]] [udp udp_max_conns]
Connection timeouts: Firewall(config)# timeout [conn hh:mm:ss] [udp hh:mm:ss]
Static NAT
Address-based static translation: Firewall(config)# static (real_ifc,mapped_ifc) {mapped_ip | interface} {real_ip [netmask mask]} [dns] [norandomseq] [max_conns [emb_limit]]
Port-based static translation: Firewall(config)# static (real_ifc,mapped_ifc) {tcp | udp} {mapped_ip | interface} mapped_port {real_ip real_port [netmask mask]} [dns] [norandomseq] [max_conns [emb_limit]]
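As an illustration of the two static forms above, assuming interfaces named inside and outside and made-up addresses. Note the argument order: the mapped address comes before the real one. As section 5.2 points out, inbound connections additionally need an ACL, which is not shown here.

```
! Address-based: inside host 192.168.1.10 appears as 203.0.113.10 on the outside
Firewall(config)# static (inside,outside) 203.0.113.10 192.168.1.10 netmask 255.255.255.255
! Port-based: TCP port 80 on the outside interface address maps to port 8080 on 192.168.1.20
Firewall(config)# static (inside,outside) tcp interface 80 192.168.1.20 8080 netmask 255.255.255.255
```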
Policy NAT
Define the translation policy: Firewall(config)# access-list acl_name permit ip real_ip real_mask foreign_ip foreign_mask
Static form: Firewall(config)# static (real_ifc,mapped_ifc) mapped_ip access-list acl_name [dns] [norandomseq] [max_conns [emb_limit]]
Dynamic (nat/global) form: Firewall(config)# global (mapped_ifc) nat_id {global_ip [-global_ip] [netmask global_mask]} | interface
Firewall(config)# nat (real_ifc) nat_id access-list acl_name [dns] [outside] [norandomseq] [max_conns [emb_limit]]
Identity NAT: Firewall(config)# nat (real_ifc) 0 real_ip real_mask [dns] [norandomseq] [max_conns [emb_limit]]
Note: the difference between nat 0 and a static that maps an address to itself is that nat 0 works only for outbound access, while static works in both directions. Configuring both kinds of command for the same address is not recommended.
NAT Exemption
Firewall(config)# access-list acl_name permit ip local_ip local_mask foreign_ip foreign_mask
Firewall(config)# nat (real_ifc) 0 access-list acl_name [dns] [outside] [max_conns [emb_limit] [norandomseq]]
Note: this type of NAT policy can match only on source and destination addresses, not on protocol type or port.
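For example, to leave traffic between two networks untranslated (a common need when the far side is reached over a VPN or a routed internal link), NAT exemption might be configured as below. The ACL name NONAT and both networks are assumptions.

```
! Match by source and destination address only
Firewall(config)# access-list NONAT permit ip 192.168.1.0 255.255.255.0 10.20.0.0 255.255.0.0
! Exempt the matched traffic from translation
Firewall(config)# nat (inside) 0 access-list NONAT
```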
Dynamic address translation
Define the mapped addresses for NAT: Firewall(config)# global (mapped_ifc) nat_id global_ip[-global_ip] [netmask global_mask]
Define the mapped address for PAT: Firewall(config)# global (mapped_ifc) nat_id {global_ip | interface}
Define the translation policy: Firewall(config)# nat (real_ifc) nat_id real_ip [mask [dns] [outside] [[norandomseq] [max_conns [emb_limit]]]]
Note: an ACL can also be used here to build a similar policy NAT.
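A sketch of both dynamic variants, with assumed networks and NAT IDs. A nat statement and a global statement are tied together by sharing the same nat_id.

```
! Dynamic NAT: hosts in 192.168.1.0/24 draw from a pool of mapped addresses
Firewall(config)# nat (inside) 1 192.168.1.0 255.255.255.0
Firewall(config)# global (outside) 1 203.0.113.20-203.0.113.30 netmask 255.255.255.0
! PAT: hosts in 192.168.2.0/24 share the outside interface address
Firewall(config)# nat (inside) 2 192.168.2.0 255.255.255.0
Firewall(config)# global (outside) 2 interface
```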
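5.3 Access control with ACLs
Feature overview: firewall ACLs are configured differently from IOS ACLs. The mask field is an ordinary subnet mask; there is no need to use an inverse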