Monday, February 25, 2013

Moving from single HD to RAID1


Well, after thinking everything was good and working, during the week I started to see the same error messages at boot again. That is wrong.

So I decided to get rid of that HD once and for all.

The original plan was to set up an asymmetric RAID1 array with a new HD and the old one, but that looks like a bad idea right now.

The hardware

First of all, I had these two drives:
Original disk 1.5TB 32MB cache, 7200 512/512 (logical/physical)
New disk     2.0TB 64MB cache, 7200 512/4096 (logical/physical)
So I went to the store and got another HD of the same model as the new disk, so I could get an efficient, easy-to-set-up-and-manage RAID1 array without any "special cases".

Procedure

The Archlinux Wiki has a really good article, Convert a single drive system to RAID, and as always, it is a good idea to follow it. Most of what I did was following that tutorial almost verbatim, so I will not try to reproduce it here. I will just mention some of the variations I used to set up my box.

The original disk had its partitions as follows:
/dev/sda1 NTFS
/dev/sda2 /boot
/dev/sda3 /
/dev/sda4 /home
/dev/sda5 swap

To Windows or not to Windows?

As you can see, I had an NTFS partition; there I had installed Windows with the sole purpose of being able to play StarCraft II with all the power of my graphics card. I only boot that partition to play and to fight with the annoying Windows updates that keep failing, keeping me away from playing when I want.

After searching a little on the web, I found that it is reported to actually run very well under Wine. I found the Wine page and also an article from the ArchWiki.

I have been told (and it makes a lot of sense because it is a broken system) that Windows doesn't like to be moved from one HD to another: it will detect the different device and bitch all the way, trying to force me to do a fresh install, which would break all my partitions and provide me with a full dose of pain.

All said, I decided not to install a Windows partition and to use that disk space for something more useful than storing Windows (like just sitting there empty).

Prepare the disk

First of all, I needed to decide how I wanted to partition my disk. I decided to keep the partition scheme for my base system as it was, with /boot, / and /home split, and to add the 500GB extra from this new drive to the /home partition.

I noticed that I didn't have a partition to install and play with other distributions or OSes (I have an eye on ArchBSD and FreeBSD as a way to get away from Lennart's rule). So I will change the NTFS partition to another ext* one.

That said, it is time to set up the partitions...
But it looks like there is an alternative to the MBR partition scheme: GPT. WTF is it, and do I want it?

TL;DR: GPT does not have the 4-primary-partition limit that MBR has. And after asking the ArchWiki about Choosing between GPT and MBR, it turns out it doesn't matter, since I will not boot Windows nor use Grub Legacy as boot loader, plus I want more than 4 partitions without being bitched at by some legacy limitations.

So I decided to use GPT. All in all, it was a pleasant experience; the biggest difference was using the GPT tools, which are the same as the good old ones but with a nice 'g' in there: gdisk, sgdisk, cgdisk. They are all part of the gptfdisk package.
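
Just to illustrate, creating a layout like mine with sgdisk could look something like this; the device name, sizes and partition order here are hypothetical, not my exact commands:

sgdisk -n 1:0:+2M -t 1:ef02 -c 1:bios_boot /dev/sdb   # BIOS boot partition, needed by Grub2 on BIOS/GPT
sgdisk -n 2:0:+200M -t 2:fd00 -c 2:boot /dev/sdb
sgdisk -n 3:0:+20G -t 3:fd00 -c 3:root /dev/sdb
sgdisk -n 4:0:0 -t 4:fd00 -c 4:home /dev/sdb

The fd00 type code marks a partition as Linux RAID, so the tooling knows what lives there.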

Setup the RAID1

After having the disk setup done, I needed to follow the tutorial and wait for A LONG time for rsync to copy my partitions.
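
The heart of that tutorial, roughly, is to build a degraded array on the new disk, copy the data over, and only later add the old disk as the second member. A minimal sketch with hypothetical device names, not my literal commands:

mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb3   # 'missing' creates the array degraded
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/root
rsync -avxHAX / /mnt/root/   # this is the LONG part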

Once again, I needed to execute the testdisk && fsck trick from the last post to get one of my partitions, the one that keeps breaking, working again. And after that, all worked just flawlessly.

Setting up Grub2

I guess it is worth mentioning that Grub2 was a little tricky to install, not because of the GPT particularities, but because Grub2 is too different from Grub (legacy), which was really straightforward to configure by editing the menu.lst file.

All you have to do in /etc/default/grub is:
  • Tell Grub2 to load the RAID support by adding the "mdraid" module.
GRUB_PRELOAD_MODULES="mdraid"
  • Tell Grub2 to load the GPT support by adding the "part_gpt" module.
GRUB_PRELOAD_MODULES="part_gpt mdraid"

Success

Once these little changes are done, it is all about following the tutorial and waiting a lot of time for 2TB to sync. =)
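
For the record, the last step of the tutorial boils down to adding the old disk's partitions to the arrays and letting them rebuild; a sketch with hypothetical names:

mdadm /dev/md0 --add /dev/sda3
watch cat /proc/mdstat   # shows the (slow) resync progress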

Today I have my system running on a RAID1 array, making my data a little more "secure" against HD failures.

Now I need to figure out what I can do with a 1.5TB HD that has some broken blocks and may fail again soon for all I know. Any ideas?

Sunday, February 10, 2013

Bad blocks, whatcha gonna do when they come for you

This is a horror story about a hard drive failing and keeping my data away from me.

This post got a lot bigger than what I expected. TL;DR: broken HD; testdisk + fsck will save your data from broken blocks that corrupt your superblock.

Imagine you went to the cinema, and when you get back you want to do a nice pacman -Syu and see if the new KDE 4.10 has landed in the repos. You naturally go and power on your desktop, and what happens next is this, at boot time, just after udev starts trying to trigger events:

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x24
ata2.00: failed command: READ DMA EXT
....
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }

OK, something is wrong with the hard drive. It can't be that bad, since I had been using the computer just a few hours ago and turned it off normally.

Start debugging

Let's try to boot the good old Archlinux 2010 install CD and see what happens. It turns out that booting from the livecd I got just the same error. Mmm, that is wrong. AFAIK, the livecd should not touch the hard drive until I want to mount it.

Since I could not boot from the livecd, I was wondering if I could see my data partition from my Windows system (I have a dual-boot system to be able to play Starcraft2; Blizzard, please free me and give us a Linux native game!). I knew it was a long shot: if Linux can't boot, surely Windows will never boot either. Still, I was out of options, so I tried it. Surprisingly, it worked. Windows booted just normally and I was able to see my data partition with the ext3 driver. That is really odd. After browsing my files a little, I got a crash in the ext3 driver and it got closed. No problem: now I know I can get my files somehow and, more importantly, they are alive!

I started thinking that it must be a software error, something wrong in the kernel. So I went back to the livecd approach; come on, I should be able to boot my livecd. I went into a loop of: boot, see if something is wrong in the BIOS, sometimes change the boot parameters following recommendations I read in the forums, try to boot the livecd, see the errors, try again. irqpoll, libata and acpi parameters, noapic, etc. Anything that would avoid reading the HD and let me boot the livecd.

Fortunately, in one of the iterations I was lazy and waited too long (like 5-10 mins) before restarting the machine, and PUM! the livecd booted. I still got the kernel errors in the buffer, but I could also see the good old rc script running and initializing the system. Nice, now I know I can boot my computer.

Now it is time for real debugging

So I went and downloaded a newer version of the Archlinux ISO (2013-02-01) so I could have the latest version of every tool and a nicer resolution, since the old ISO predates KMS.

Let's try to figure out what all this output means.

My first approach was to search for the exception code. To my surprise, the good "exception Emask 0x0 SAct 0x0 SErr 0x0 action" is a very generic error message. I found it with a lot of variations all over the internet, but none of them helped me fix my issue.

Then I found this great page from the libata guys, Libata error messages, which explains exactly what all the bits in the error message mean. So I realized that DRDY was a good thing (the drive was ready), but ERR means (yeah, you can guess) there is an error set in the registers. The error was UNC, which means "Uncorrectable error - often due to bad sectors on the disk". We are fried now: my HD is broken and I lost all my information. At least, now I know the problem.

In a forum, I saw that there is a nice tool that tells you if there are broken blocks on your HD, so I tried it: badblocks /dev/sda. And after ~4:30 hrs I got the answer: 16 bad blocks.
"Bad blocks bad blocks, whatcha gona do, whatcha gona do when they come from you"
Then I tried to run smartctl tests to see if it could fix the issue or give me more information, and it would take just... like 5-6 hrs to complete the test. What a pain. Not having anything else in mind, I ran the test with smartctl -t long /dev/sda and went to watch a series for a while. I set up a nice watch command to show me the output of smartctl -l selftest and see the progress.
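
In case it helps someone, the watch invocation was along these lines; the interval is arbitrary:

watch -n 60 smartctl -l selftest /dev/sda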

From the output of smartctl I figured out that it was showing me which block was the last broken block found, and after comparing it with my fdisk -l output I noticed that it was the /dev/sda3 offset + 2. Holy! This must mean something.

Let me recap a little here; I have my disk partitioned this way:
/dev/sda1  NTFS
/dev/sda2  /boot
/dev/sda3 /
/dev/sda4 /home
I was able to boot Windows from the NTFS partition, nice.
I was able to mount /dev/sda2 and see the content, nice.
I was not able to mount /dev/sda3 nor /dev/sda4, and the broken block was just at the beginning of sda3.
Stuff started to show a trend.

I tried debugfs to read the block and let the HD firmware blacklist it, but it kept failing with a weird error:
debugfs: open /dev/sda3
/dev/sda3: Bad magic number in super-block while opening filesystem
What does that mean? Well, I had no idea. The forums said that I might want to try fsck. Let's do it, it can't be that bad. And the good old fsck failed with another weird error:
fsck.ext3: Attempt to read block from filesystem resulted in short read while trying to open /dev/sda3
Could this be a zero-length partition?

This had to be wrong: I had just run fdisk -l on it, and I knew it was a complete partition.

A forum post from a hopeless guy mentioned a tool named "testdisk", so I ran to man to look at what that tool is. The description says "Scan and repair disk partitions"; that sounds useful. After the short man page I decided to try it out: testdisk /dev/sda3 and Magic! It was able to tell me what was going wrong with my partition: it figured out where the copies of the broken superblock live, and it even told me the exact command I needed to fix my problem:
fsck.ext3 -b <blockaddress> -B <blocksize> /dev/sda3

I ran that nice and beautiful command and suddenly I got my data back!
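
By the way, if testdisk had not been around, there is another well-known way to locate the backup superblocks of an ext filesystem: a mke2fs dry run. The -n flag makes it only print what it would do, including the superblock backup locations, without writing anything:

mke2fs -n /dev/sda3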

Conclusion

What have I learnt here? First of all:

  • YOU MUST HAVE BACKUPS.
  • Testdisk is your friend.
  • fsck is your friend.
  • Archlinux ISO is your friend.
  • *nix is your friend.
  • Hard drives tend to break.

Also, it is important to notice the wide variety of tools we have to help us get our data back. A small physical breakage is not always the end of the world; you can recover from it if you have the patience to read a ton of forums and man pages and dedicate some time to the adventure.

I was happy to learn that ext3 has this redundant structure (copies of the superblock all over the fs) to help us recover from breakage. I love it. I don't know how other filesystems do the trick, but I am really happy I am using ext3.

Finally, I would like to thank the Archlinux team for giving me a really powerful and nice livecd to help me through this painful trip.

Now that I have my system back, I can listen to my music and use my files. It is time to set up a redundancy plan to avoid panicking again due to a bad hard drive. But that will be the project for next week.

Tuesday, March 27, 2012

Libpurple in GSoC 2012

Libpurple was accepted in the Google Summer of Code this year, 2012.

I urge every student reading this to apply to any of the accepted projects and, if you like, to apply to Libpurple.

We have a set of proposed ideas, but you are encouraged to bring your own, since they will be fresher and will not compete with other people's over the same project.

You can find libpurple's application page at Pidgin, Finch and libpurple.

Wednesday, May 11, 2011

Simulating mixed language HDL using VCS

I needed to port some ModelSim do files to this new simulator, and I found out that the available documentation is not as friendly as I would like. I finally got the simulation working, and I want to archive the process somewhere it can help me or someone else in the future.

This little tutorial is supposed to be dynamically updated whenever I feel that more info is needed or I find errors in it.

VCS is a simulator from Synopsys which is known to be far superior to Xilinx ISim. It supports multiple languages, such as the most popular ones: Verilog, VHDL and SystemVerilog.

General workflow
The general workflow when simulating with VCS consists of the following steps.
  • Compile/Analyze
  • Elaborate/Build
  • Simulate
First, you need to compile each and every HDL file you have in your design, including the testbench. This is done with different commands, such as:
  • vhdlan: The compiler for VHDL files.
  • vlogan: The compiler for Verilog and SystemVerilog files.
Both commands accept the flag -f filelist, where "filelist" is a file containing the list of files to be compiled. This helps a lot to simplify and structure the compilation scripts.
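
The filelist itself is just a plain text file with one entry per line; a minimal, hypothetical example:

../rtl/alu.v
../rtl/regfile.v
../rtl/top.v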

VHDL Compilation/Analysis

VHDL uses libraries to organize code; getting vhdlan to compile them is not straightforward, since VCS needs to map them to some directory and then link them.

To achieve this, you must create a directory with the name of each library in your pwd, to be able to map the libraries to physical directories. To tell VCS how to map each library to its directory, a special file is needed: .synopsys_vss.setup. This file can be in your VCS installation path, in your $HOME or in your pwd; vhdlan will look for the file in this particular order.

The syntax of this file is fairly simple: you first need to map the WORK library to a name, which then must be mapped to a physical directory; after that, each library must be mapped to a physical directory on its own line.

In the following example there are two libraries: MY_LIB, with some modules of my own, and UTIL_LIB, which has utility modules designed over time.
WORK > DEFAULT
DEFAULT : ./work
MY_LIB : ./MY_LIB
UTIL_LIB : ./UTIL_LIB
This is a simple command line used to compile VHDL files with libraries:
vhdlan -work <library_dir> -f <filename_of_file_list>

Verilog Compilation/Analysis

Verilog doesn't use libraries, so there is no need to do tricks with them. Still, it's useful to know some tricks about this compiler.

vlogan has some useful flags that help to structure the code and keep the simulation environment isolated from the development one.

  • +incdir+<path>: Specify the directories where vlogan will look for files referenced by `include directives.
  • +define+: Define a text macro at compile time.
  • +v2k: Enables the use of the Verilog 2001 standard.
  • -svlog or -sverilog: Enables the analysis of SystemVerilog code.

These are the simple command lines used to compile Verilog files using the 2001 standard, plus a SystemVerilog test bench:
vlogan +v2k +incdir+<include_dir> -f <filename_of_file_list>
vlogan +v2k -sverilog +incdir+<include_dir> -f <filename_of_file_list>
vlogan writes its output to a directory named AN.DB, which can be deleted in a cleanup process to keep the workspace clean.

Elaboration/Build

Once every file needed in a design is compiled, it is time to elaborate the executable binary. The command to elaborate is vcs, which takes as a parameter the top module to be simulated, usually the top module of the testbench.

The command to elaborate is:
vcs -debug_all <top_module> glbl
where the flag -debug_all tells the tool to enable the simulation GUI and the necessary debug information to add breakpoints and do line stepping, <top_module> is the top module of your testbench, and the glbl argument is needed to use Xilinx components.

Simulation

The elaboration command generates an executable file named simv, which must be executed to start the simulation. The default behavior of this executable is to run and output messages from the test bench to stdout. Normally, what you want is a GUI where you can see the waves and analyze the signal values at each point in time; this is done with the -gui parameter.

The command to execute the simulation with a GUI is:
./simv -gui
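
Putting it all together, a hypothetical end-to-end session (file and module names are placeholders) looks like this:

vhdlan -work MY_LIB -f vhdl_files.f
vlogan +v2k -f verilog_files.f
vlogan -sverilog -f tb_files.f
vcs -debug_all tb_top glbl
./simv -gui
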
Conclusion

This is the basic workflow needed to simulate a design in VCS. Each of the tools has a lot more parameters that can be used to get specialized behavior when needed. All of them are covered in the tools' documentation, through the manuals or the -h parameter.

[1] VCS and coverage by Aviral Mittal
Update: This is the original link to the article I found useful
Update: Fix some escaped out <info> comments.

Monday, March 07, 2011

Split large LaTeX files

When your paper/report is getting too large, it becomes a little complicated/frustrating to maintain it in one big file.

It's possible to split a big .tex file and set up a hierarchical file tree with a small portion of the text in each file. This is achieved using the \include family of commands.

There are three main LaTeX commands that manage multiple input files.

  • \includeonly specifies a list of files that will be included by the \include command. If this command exists and a file given to \include is not listed here, that file will not be included.
  • \include, as its name says, includes a file in a new page. Used with \includeonly, it can include files selectively. Note: this command can't be nested.
    • It's equivalent to \clearpage \input{file} \clearpage
  • \input is the simplest include scheme; it is equivalent to a plain C #include.
So your big file 
\section{foo}
% lot of text, figures and equations
\section{bar}
% lot of text and subsections
can be simplified as
\include{foo}
\include{bar} 
where there are foo.tex and bar.tex sub-files containing the section text.

If you want another layer of simplification, it's possible to just use \input in the sub-files.
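
For instance, while working only on the foo section, a hypothetical main file could look like this; only foo.tex gets rebuilt, while page numbers and cross-references for the whole document stay consistent:

\documentclass{article}
% Only the files listed here will actually be included.
\includeonly{foo}
\begin{document}
\include{foo}
\include{bar}
\end{document}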

[1] http://www.kfunigraz.ac.at/~binder/texhelp/ltx-165.html

Monday, November 01, 2010

MSNP16 and SLP-rewrite merged

I have just pushed the revision that merges my MSNP16 and SLP branches into the main development branch of pidgin. I'm very happy to have these branches merged, since they represent almost all the code I have been writing over the last year.

Yes, I started coding MSNP16 support almost a year ago, and it took a lot of effort, reverse engineering, debugging Wireshark dumps and a lot of pidgin debug logs to get it working. That is a lot of time!

It is true that the MSNP16 code was almost complete when I started my SoC work, but I thought it would be better to start the SLP rewrite on top of the MSNP16 branch, to be able to easily test both pieces of code at the same time and try to get them into better shape before merging into i.p.p.

I know I announced this merge like two weeks ago but, you know, I wanted this merge to be followed by a reasonable "beta" testing period before being released, and at that time it turned out that we had a security issue and needed to release 2.7.4. Once it was out, there were some ICQ issues that needed a quick release to fix those bugs, so we got a 2.7.5. Now I was able to merge and get a normal release cycle with beta testers to find bugs in this new and nice code.

I hope this code will fix more issues than it brings up, especially the ones related to data transfer. Since most of the code in this area has changed due to DirectConn and the SLP rewrite, I guess it would be a good idea to review and close most of the related tickets, since their tracebacks and debug output would be really useless now. Yay for smashing tickets!

I hope you all like 2.7.6 when it gets released!

Monday, October 25, 2010

Use ImageMagick to convert a set of images

While reporting my experiment output from LTspice, I need to save the plots I get as wmf (because it's the only image format supported by this software) and then change the format to png, so they can be easily loaded in my LaTeX file.

To achieve this goal I have used ImageMagick. ImageMagick is a really powerful set of command-line tools to manipulate images, which allows me to just type an easy command to get my png ready to be used in my tex file.

I have been using the convert command:
convert foo.wmf foo.png

Today I needed to convert a bunch of images, so using convert would be a little painful. A quick google search showed me the answer to my problem: mogrify.

With the mogrify command, you can change the format of every image you pass it on the shell. So, to convert all wmf images in a directory, now I just need to execute:

mogrify -format png *.wmf
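
One caveat worth knowing: unlike convert, mogrify modifies files in place for most operations (-format is one of the friendly cases, since it writes new files with the new extension). If you want the output in a separate directory, there is also the -path option:

mogrify -format png -path converted *.wmf   # the converted/ directory must already exist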

And that's it. I hope some of you find it useful.