Archive for November, 2008

Embedding NULL in a C string

November 18th, 2008

I just solved this interesting little problem for myself, and it took me a little while (not that long, but long enough) to get it straight in my head. The problem at hand is this: suppose you want to embed a NULL character (for whatever reason) into a C string. Strictly speaking that character is called NUL, and it is not the same thing as the NULL pointer, but I’ll stick with the looser name here. That’s a bit of a quandary, since C uses NULL (a.k.a. ASCII 0, or \0) for string termination. So if you try something like:

printf("This string is\0awesome!\n");

You’ll be left in the cold wondering what this string is, because you’ll never see anything after the “is.” That’s right, you’ll never get to know that this string is, in fact, awesome. That’s rather annoying if, for example, you want to use NULL as a delimiter for something. One might want to do this since NULL characters are disallowed in things like file names (since they terminate them). So the question is “can we fix it?” And much to my own glee, the answer is “yes we can!”

My first attempt was to encode a %s conversion specifier in there and make its value “\0”, like so:

printf("This string is%sawesome!\n", "\0");

It’ll print out the full string with that invisible NULL, right? Unfortunately, this is incorrect. It will print the full string, but not the NULL. Why? Because %s treats its argument as a C string and copies characters only up to (and not including) the first NULL. Since “\0” begins with a NULL, what gets inserted is a string of length 0. So it looks like nothing has been put into the string, because nothing has. But nothing != NULL. Back to the drawing board.

My next attempt was to encode a %c conversion specifier into the string and make its value ‘\0’, like so:

printf("This string is%cawesome!\n", '\0');

At first, it seemed to work, but I was dismayed. All I saw was the output of the original try (“This string is”). I guess that’s what I wanted, right? But it appears it didn’t do anything with the rest of the string. What to do? Well, I turned to printf’s close cousin, sprintf, to put the value into a string for later manipulation and printing. So I malloc’d some memory for a char *mystr, like so:

char *mystr = malloc(64); // arbitrary amount of memory

Then I shoved my string into mystr:

sprintf(mystr, "This string is%cawesome!\n", '\0');

Then I put in a line to print mystr:

printf("%s\n", mystr);

And… failure? Damn. I must be doing something wrong… Then it hit me: printf()’s %s will always stop at the first NULL character, but that doesn’t mean my string isn’t all there. The key here is that I allocated enough memory for that string, so I’ll be damned if my whole string isn’t there. To test my theory, I modified my line like so:

printf("%s", mystr+15);

For those of you who aren’t all that up on your C, since mystr is simply a pointer to the location in memory where my characters begin, mystr+15 is that location plus 15 bytes: 14 for the characters of “This string is” and one for the NULL byte itself, which lands on the first character after the inserted NULL. As I expected (and hoped), “awesome!” (with a newline) was printed! Success! Now, you might think back to my first attempt and wonder if that would work with sprintf, but it does not. It’s probably for the best anyway, since %c makes more of a point that you’re doing something despicable like putting NULL characters into your strings, and allows for easier changing if you decide to use something else in the future.

So a final question comes up: if I’m shoving the character that’s used for string termination in my strings, how do I know when it’s really done? Well that’s a matter for a particular implementation. In mine, the newline character will be my buddy. For another implementation, one might decide to use calloc() instead and just look for more than one NULL character in a row. Either way it’s rather messy, but for my purposes I think it’ll work just fine.


equery and q

November 13th, 2008

As both an experienced user of Gentoo and a lover of the command line, I often find myself querying portage for various things. Sometimes, if it’s in a particular package, I go looking at the ebuild itself. Other times, I run a find command in /usr/portage. However, there are tools to take care of all these silly things in a much more elegant fashion. Namely, there’s equery and q. Since I’ve only now realized that they not only exist but are useful, I figure there must be someone else who could benefit from this knowledge as well. In this post, I’ll briefly describe the uses and benefits of each, as well as why they should co-exist on your system, at least for the time being.

Commonalities and Usage

First things first, equery and q are very similar both in offered functionality and in usage from the command line. Basic usage goes like so:

$ equery [command] [package]
or
$ q [command] [package]

Simple, right? There are also options to modify usage, but this is the most basic and common usage pattern.

In terms of functionality, equery and q have the following in common:

Function                            equery command   q command
List packages owning file           equery b         q file
Verify integrity of package         equery k         q check
List dependencies of package        equery d         q depends
List files owned by package         equery f         q list
List all packages with USE flag     equery h         q use
List all packages matching search   equery l         q search
Show size of files in package       equery s         q size

The usage of each of the above commands is pretty straightforward, so I won’t bore you with details. Running just “equery” or “q” from the command line will show all basic usage, and running one of the above commands without arguments (or with a -h local argument, for equery) will show similar usage info for the specific command.

equery

equery is a part of the gentoolkit package, is written in Python, and is rather well-endowed (in terms of features, of course). In terms of features that q lacks, equery boasts:

  • depgraph: display a dependency tree for a given package
  • uses: display USE flags for a given package
  • which: print full path to ebuild for a given package

Personally, of these three, I’ve only ever used uses, since the few times I’ve attempted to use depgraph, the results have been too big to really get a handle on. Either way, the uses command makes it a lot easier to find out what USE flags are available for a particular package as well as their current states. Of course, you could just do a (not so quick):

$ emerge -pv [package]

However, that won’t give the information on what the USE flags actually are, just what their status is on the package.

equery in general gives more detail and nicer output than its shorter-named counterpart. Since I always like to see screen shots (yes, even from command-line programs), I’ll take the liberty of doing just that to illustrate my point:

[Screenshot: Output of equery list zsnes command.]

[Screenshot: Output of qsearch zsnes command.]

One thing to note about this particular command is that while qsearch automatically searches both installed packages and those in the portage tree, equery requires the -p option, as shown above, to do the same thing. On the flip side, qsearch has no capability (at this time) of searching overlays, but equery can be made to do so with the -o option. Tradeoffs, tradeoffs!

As a final note on equery, there are in fact two ways to call each of the sub-programs (like list, depgraph, etc.). There’s a short and long option for each of them, which is rather convenient. “equery l” is much nicer than “equery list” and “equery g” is way better than “equery depgraph.”

q

On to q! After reading the above section, you might wonder why anyone would want to use q when they’ve got equery in their toolbox. Is it for those few features that q has that are so dazzling? Is it because the name is shorter? No, no, let me just show you, you’ll understand:

[Screenshot: Timed output of equery list zsnes.]

[Screenshot: Timed output of qsearch zsnes.]

For those of you who can’t see the images or are just in plain shock, let me spell it out: q is fast. In that particular query, about 34 times faster. 34 times faster! That makes a big difference, whether you’re sitting in front of the keyboard twiddling your thumbs or putting it in a shell script. As a matter of fact, on running just the q or equery commands alone (to show the helpful usage messages), the speed difference is over 500 times! That being said, if you don’t need the fancy formatting and extra frills of equery for a given task, just use q. It’s faster. According to its Gentoo page, that’s its purpose anyway:

portage-utils is a collection of very fast utilities written in C, which are meant to offer a faster but more limited alternative to their gentoolkit counterparts. Please note that portage-utils is not meant to replace gentoolkit. Its utilities are much more efficient than the equivalent ones from gentoolkit and might be better suited to be used in scripts that need to call Portage repeatedly, but portage-utils does not offer the same functionalities.

Well hot-damn, I could have told you that from the beginning, no? It’s times like this that I sing the praises of C and mock all those Python people. Then I try to write a difficult program and cry myself to sleep.

Anyway, language wars and tearful nights aside, there are also a couple of other distinguishing things about q. First, its simple format can make parsing a bit simpler. Then again, it could make it harder, so let’s not go there. As a matter of fact, equery makes a point of changing its output format when that output is piped or redirected. If you don’t like the modified style of output, you can pass the -N (--no-pipe) flag to turn that behavior off.

Second, q does bring a few unique functions to the table. Namely:

  • atom: split up an atom string (like games-emulation/zsnes-1.51-r2 -> games-emulation zsnes 1.51 r2)
  • cache: search the metadata cache
  • grep: grep in ebuilds
  • lop: emerge log analyzer
  • merge, pkg, tbz2, xpak: all pertain to actually handling various types of packages, which I have no experience with, so I don’t know their usage.

The one that I find particularly interesting is “lop.” In an example from that Gentoo page on portage-utils, try something like:

$ qlop -tH openoffice

It’ll tell you “the merge time” for that package. Now, I’m not sure if that means the last merge or some sort of aggregate. My output tells me:

openoffice: 9 hours, 19 minutes, 11 seconds for 12 merges

I’m guessing that means that it took 9:19:11 to merge 12 packages, in terms of the package in question and its dependencies, but I’m not totally sure on that one. Either way, this is a damn nifty feature. I make jokes all the time about how long some packages take to emerge, and now I can have actual times to back me up! Oh the joys of Gentoo…

As another very useful note, the qsearch command is also substantially faster than “emerge --search.” It’s not nearly as impressive as it is against equery, but it holds its own. An advantage that qsearch has over its equery counterpart, however, is that it can display package descriptions, and does so by default. I actually for a long time forgot how to do that on my system, and always ran to gentoo-portage.com.

Last but not least, just like equery, q commands can be shortened as well. Unfortunately, that just means changing something like “q search” to “qsearch.” Not a big improvement, but with a one-letter command, how much can you really ask for?

Why both?

In brief, the snippet above from the Gentoo article on portage-utils gives the answer quite nicely. q may be lacking, but it can be an order of magnitude faster than equery. For those times when you don’t need all that fancy-shmancy formatting and just want to get quick and dirty results, q is your tool.
