blog dds

2016.03.18

Verifying the Substitution Cipher Folklore

A substitution cipher replaces each letter with another. Cryptography folklore has it that simple substitution ciphers are trivial to break by looking at the letter frequencies of the encrypted text. I tested the folklore, and the results were not quite what I was expecting.
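
The folklore attack can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experiment described in the full post: it encrypts with a random key, then guesses the key by pairing ciphertext letters, ranked by frequency, with an assumed English frequency order. `ENGLISH_ORDER` is a common approximation rather than a definitive table, and all function names are hypothetical.

```python
import random
import string
from collections import Counter

# Approximate English letter order, most to least frequent (an assumption;
# published tables differ, especially in the tail).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def make_key(seed: int) -> dict[str, str]:
    """Return a random plaintext->ciphertext letter mapping."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encrypt(text: str, key: dict[str, str]) -> str:
    """Substitute each letter (after lowercasing); keep other characters."""
    return "".join(key.get(c, c) for c in text.lower())

def frequency_attack(ciphertext: str) -> dict[str, str]:
    """Guess a ciphertext->plaintext mapping by ranking letter counts."""
    counts = Counter(c for c in ciphertext if c in string.ascii_lowercase)
    ranked = [c for c, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

def decrypt(ciphertext: str, mapping: dict[str, str]) -> str:
    """Apply a ciphertext->plaintext mapping."""
    return "".join(mapping.get(c, c) for c in ciphertext)

def letter_accuracy(plain: str, recovered: str) -> float:
    """Fraction of alphabetic positions recovered correctly."""
    pairs = [(p, r) for p, r in zip(plain.lower(), recovered) if p.isalpha()]
    return sum(p == r for p, r in pairs) / len(pairs)

if __name__ == "__main__":
    plain = "the quick brown fox jumps over the lazy dog " * 3
    key = make_key(seed=1)
    cipher = encrypt(plain, key)
    guess = frequency_attack(cipher)
    print(f"{letter_accuracy(plain, decrypt(cipher, guess)):.0%} of letters correct")
```

On short or atypical texts the frequency ranks can differ considerably from `ENGLISH_ORDER`, so a rank-only guess like this may recover only a fraction of the letters.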

Continue reading "Verifying the Substitution Cipher Folklore"

2013.12.11

The Birth of Standard Error

Earlier today Stephen Johnson, on a mailing list run by The Unix Heritage Society, described the birth of the standard error concept: the idea that a program's error output is sent on a channel different from that of its normal output. Over the past forty years, all major operating systems and language libraries have embraced this concept.

Continue reading "The Birth of Standard Error"

2012.09.22

How to Calculate an Operation's Memory Consumption

How can you determine how much memory is consumed by a specific operation of a Unix program? Valgrind's Massif tool could help you in this regard, but it can be difficult to isolate a specific operation from Massif's output. Here is another, simpler way.
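
The post's method targets arbitrary Unix programs. For code written in Python, a comparable idea (a swapped-in technique, not the one from the post) is to bracket the operation with the standard `tracemalloc` module and compare the traced memory before and after it. The helper name `measure_operation` is hypothetical, and `tracemalloc` only sees allocations made through Python's allocators, not memory that, say, a C extension obtains directly from the operating system.

```python
import tracemalloc

def measure_operation(operation, *args):
    """Run operation(*args) under tracemalloc.

    Returns (net, peak, result): net is the traced memory still held after
    the call (for example, by the returned object); peak is the largest
    increase observed while the call ran. Both are in bytes.
    Assumes tracemalloc is not already running, because it is stopped here.
    """
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = operation(*args)
        after, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before, peak - before, result

if __name__ == "__main__":
    net, peak, _ = measure_operation(lambda n: [str(i) for i in range(n)], 10_000)
    print(f"net {net} bytes, peak {peak} bytes")
```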

Continue reading "How to Calculate an Operation's Memory Consumption"

2011.12.28

Pretend Invitations

Choosing between people you want to invite to a function and people you have to invite is sometimes difficult. Say Alice wants to invite Tom, Dick, and Harry to a party, but she'd actually prefer if Dick didn't show up. Here's how Alice can send invitations by email from an email-capable Unix system to achieve the desired result, while covering her scheming with plausible deniability.
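
One plausible mechanism, and it is only an assumption about what the full post does, is the gap between a message's `To:` header and the SMTP envelope: mail is delivered to the envelope recipients, while the header merely tells readers who was addressed. The Python sketch below builds such a message with the standard `email` package; the addresses and the helper name are hypothetical, and nothing is sent.

```python
from email.message import EmailMessage

def pretend_invitation(sender, addressed, delivered_to, subject, body):
    """Return (message, envelope_recipients).

    The To: header names everyone in `addressed`; the envelope, which
    decides who actually receives the mail, lists only `delivered_to`.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(addressed)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg, list(delivered_to)

if __name__ == "__main__":
    msg, envelope = pretend_invitation(
        "alice@example.com",
        ["tom@example.com", "dick@example.com", "harry@example.com"],
        ["tom@example.com", "harry@example.com"],
        "Party on Saturday",
        "You are all invited!",
    )
    print(msg["To"])
    print(envelope)
    # Sending would pass the envelope explicitly, for example:
    #   with smtplib.SMTP("localhost") as smtp:
    #       smtp.send_message(msg, to_addrs=envelope)
```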

Continue reading "Pretend Invitations"

2011.12.14

Apps are the New Users

Some facilities provided by mature multi-user operating systems appear arcane today. Administrators of computers running Mac OS X or Linux can see users logged in from remote terminals, they can specify limits on the disk space one can use, and they can run accounting statistics to see how much CPU time or disk I/O a user has consumed over a month. These operating systems also offer facilities to group users together, to specify various protection levels for each user's files, and to prescribe which commands a user can run.

Continue reading "Apps are the New Users"

2011.05.21

Code Verification Scripts

Which of my classes contain instance variables? Which classes call the method userGet, but don't call the method userRegister? These and similar questions often come up when you want to verify that your code is free from specific kinds of errors. For example, instance variables can be a problem in servlet classes, because a single servlet instance typically serves many concurrent requests. Or you may have found a bug related to the userGet and userRegister methods, and you want to look for other places where it occurs. Your IDE is unlikely to answer such questions, and this is where a few lines in the Unix shell can save you hours of frustration.
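
As a rough sketch of the second question, here is a Python version that flags source files calling one method but never calling another. It is a textual approximation (comments and string literals are not excluded), it treats one file as one class, which matches the usual Java convention, and the function names are hypothetical. In the shell, `grep -l` and `grep -L` (list files with and without a match) can express the same check.

```python
import re
from pathlib import Path

def _call_pattern(method):
    """Regex matching a call such as `userGet(` but not `userGetter(`."""
    return re.compile(rf"\b{re.escape(method)}\s*\(")

def calls_without(text, called, missing):
    """True if `text` calls `called` but contains no call to `missing`."""
    return bool(_call_pattern(called).search(text)) and not _call_pattern(missing).search(text)

def files_calling_without(root, called, missing, glob="*.java"):
    """Source files under `root` that call `called` but never `missing`."""
    return [
        path
        for path in sorted(Path(root).rglob(glob))
        if calls_without(path.read_text(errors="replace"), called, missing)
    ]
```

For example, `files_calling_without("src", "userGet", "userRegister")` would list the candidate classes under a (hypothetical) `src` directory.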

Continue reading "Code Verification Scripts"

2010.08.04

Batch Files as Shell Scripts Revisited

Four years ago I wrote about a method that could be used to have the Unix Bourne shell interpret Windows batch files. I use this trick a lot, because programming with the Windows/DOS batch file facilities is decidedly painful, whereas the Bourne shell remains a classy programming environment. There are still many cases where the style of Unix shell programming outshines and outperforms even modern scripting languages.

Continue reading "Batch Files as Shell Scripts Revisited"

2010.01.12

Useful Polyglot Code

Four years ago I blogged about an incantation that would allow the Windows command interpreter (cmd) to execute Unix shell scripts written inside plain batch files. Time for an update.

Continue reading "Useful Polyglot Code"

2009.10.15

Tags for Bibliography References

I love writing my papers in LaTeX. Its declarative style allows me to concentrate on the content, rather than the form. I even format the text according to the content, keeping each phrase or logical unit on a separate line. Many publishers supply style files that format the article according to the journal's specifications. Even better, over the years I've created an extensive collection of bibliographies. I can therefore use BibTeX to cite works with a simple command, without having to re-enter their details. This also allows me to use style files to format references according to the publisher's specification. Yet, there is still the problem of navigating from a citation to the work's details. Here is how I solve it.
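
The full post describes the author's own solution. As one possible approach (an assumption, not necessarily the post's), a few lines of Python can emit a vi-style tags file, one `key<TAB>file<TAB>line` entry per BibTeX record, so that an editor that reads ctags-format files can jump from a citation key to the entry's details. The regular expression handles only the common `@type{key,` layout, and the function and script names are hypothetical.

```python
import re
import sys

# Matches the start of an entry such as "@article{knuth84," (common layout only).
ENTRY = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")
# BibTeX commands that look like entries but carry no citation key.
NOT_ENTRIES = {"string", "preamble", "comment"}

def bib_tags(filename, text):
    """Return sorted tag lines 'key<TAB>filename<TAB>line' for each entry."""
    tags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = ENTRY.match(line)
        if m and m.group(1).lower() not in NOT_ENTRIES:
            tags.append(f"{m.group(2)}\t{filename}\t{lineno}")
    return sorted(tags)

if __name__ == "__main__":
    # Usage: python bibtags.py refs.bib other.bib > tags
    lines = []
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as f:
            lines.extend(bib_tags(name, f.read()))
    print("\n".join(sorted(lines)))
```

The output is sorted by key because editors such as vi can then binary-search the tags file.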

Continue reading "Tags for Bibliography References"

2009.09.16

Applied Code Reading: Debugging FreeBSD Regex

When the code we're trying to read is inscrutable, inserting print statements and running various test cases are two invaluable tools. Earlier today I fixed a tricky problem in the FreeBSD regular expression library. The code, originally written by Henry Spencer in the early 1990s, is by far the most complex I've ever encountered. It implements sophisticated algorithms with minimal commenting. Also, to avoid code repetition and increase efficiency, the 1200-line main part of the regular expression execution engine is included in the compiled C code three times, each time with different macro definitions that adjust the code's behavior: the first time the code targets small expressions and operates with bit masks on long integers, the second time it handles larger expressions by storing its data in arrays, and the third time it is also adjusted to handle multibyte characters. Here is how I used test data and print statements to locate and fix the problem.

Continue reading "Applied Code Reading: Debugging FreeBSD Regex"

2009.08.05

How to Create a Self-Referential Tweet

Yesterday Mark Reid posted a challenge on Twitter: create a self-referential tweet (one that links to itself). He later clarified that the tweet's text should contain its own identifier (the number that follows "/status/" in the tweet's URL). I decided to take up the challenge ("in order to learn a bit about the Twitter API" was my excuse), and a few hours later I won the game by posting the first self-referential tweet. Here is how I did it.

Continue reading "How to Create a Self-Referential Tweet"

2009.05.07

Fixing the Orientation of JPEG Photographs

I used to fix the orientation of my photographs through an application that would transpose the compressed JPEG blocks. This had the advantage of avoiding the image degradation of a decompression and a subsequent compression.

Continue reading "Fixing the Orientation of JPEG Photographs"

2009.03.04

Parallelizing Jobs with xargs

With multi-core processors sitting idle most of the time and workloads always increasing, it's important to have easy ways to make the CPUs earn their keep. My colleague Georgios Gousios told me today how the Unix xargs command can help in this regard.
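
xargs can run several commands concurrently through its `-P` option. For work driven from Python, a similar effect can be sketched with a thread pool that launches subprocesses, at most `max_procs` at a time; threads suffice here because each one mostly waits for its child process. The helper name is hypothetical and the commands are placeholders.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands, max_procs=4):
    """Run each argv list in `commands`, at most `max_procs` at a time.

    Roughly what `xargs -P max_procs` does for a list of invocations.
    Returns the exit codes in the same order as `commands`.
    """
    def run(argv):
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(run, commands))

if __name__ == "__main__":
    # Placeholder jobs: five short Python processes that exit with codes 0-4.
    jobs = [[sys.executable, "-c", f"import sys; sys.exit({i})"] for i in range(5)]
    print(run_parallel(jobs, max_procs=2))
```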

Continue reading "Parallelizing Jobs with xargs"

2009.01.25

A Well-Tempered Pipeline

I am studying the use of open source software in industry. One way to obtain empirical data is to look at the operating systems and browsers used by the Fortune 1000 companies by examining web server logs. I obtained a list of the Fortune 1000 domains and wrote a pipeline to summarize the results by going through this site's access logs.

Continue reading "A Well-Tempered Pipeline"

2008.10.27

Monitor Process Progress on Unix

I often run file-processing commands that take many hours to finish, and I therefore need a way to monitor their progress. The Perkin-Elmer/Concurrent OS32 system I worked on for a couple of years back in 1993 (don't ask) had a facility that displayed, for any executing command, the percentage of work that was completed. When I first saw this facility working on the programs I maintained, I couldn't believe my eyes, because I was sure that those rusty Cobol programs didn't contain any functionality to monitor their progress.
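
On Linux, a rough progress indicator for a command that reads a file sequentially can be derived from `/proc`: `/proc/PID/fdinfo/FD` reports the descriptor's current offset on its `pos:` line, and dividing that by the file's size gives the fraction read so far. This Linux-only sketch is an assumption about one way to do it, not necessarily the method in the full post; `read_progress` is a hypothetical name, the PID and descriptor number must come from the monitored process (for example, via `ls -l /proc/PID/fd`), and inspecting another user's process generally requires matching permissions.

```python
import os
import sys

def read_progress(pid, fd):
    """Return (path, fraction) for descriptor `fd` of process `pid`.

    Linux-only: the offset comes from the `pos:` line of
    /proc/PID/fdinfo/FD and the size from the file the descriptor refers to.
    """
    link = f"/proc/{pid}/fd/{fd}"
    path = os.readlink(link)
    size = os.stat(link).st_size  # stat follows the link to the open file
    pos = None
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("pos:"):
                pos = int(line.split()[1])
                break
    if pos is None:
        raise ValueError(f"no pos: field for fd {fd} of pid {pid}")
    return path, (pos / size if size else 1.0)

if __name__ == "__main__":
    # Usage: python progress.py PID FD
    if len(sys.argv) == 3:
        name, fraction = read_progress(int(sys.argv[1]), int(sys.argv[2]))
        print(f"{name}: {fraction:.1%}")
```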

Continue reading "Monitor Process Progress on Unix"

2008.09.11

Unzipping Files in Order

Over the past couple of years I've enjoyed listening to the audio edition of the Economist newspaper. The material is superb (although I occasionally get the feeling of listening to the Voice of America), the articles are read in a clear voice, the data's encoding is plain MP3, unencumbered by digital rights (restrictions) management silliness, and the audio format is convenient to listen to on the metro or while jogging. Unfortunately, the articles in the audio edition's zip file are haphazardly ordered, which, until today, marred my listening enjoyment.
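
One reason extraction order can matter, and this is an assumption since the entry does not say what the player does, is that some portable players play files in the order they were written to a FAT directory rather than by name. Extracting the archive's members one at a time in sorted order makes the creation order match the names. A Python sketch using the standard `zipfile` module follows; the function names and sample file names are hypothetical.

```python
import io
import zipfile

def ordered_members(zf, key=None):
    """Archive members sorted by `key` (by file name when key is None)."""
    return sorted(zf.infolist(), key=key or (lambda info: info.filename))

def extract_in_order(zip_path, dest, key=None):
    """Extract members one at a time, in sorted order, into `dest`."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in ordered_members(zf, key):
            zf.extract(info, dest)

if __name__ == "__main__":
    # Build a small in-memory archive whose entries are stored out of order.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ["03_science.mp3", "01_leaders.mp3", "02_letters.mp3"]:
            zf.writestr(name, b"")
    with zipfile.ZipFile(buf) as zf:
        print([info.filename for info in ordered_members(zf)])
```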

Continue reading "Unzipping Files in Order"

2008.08.05

A Child's Crontab

When the time to go to sleep is approaching, all children seem to be configured with the same crontab.

Continue reading "A Child's Crontab"

2008.04.20

Assigning Responsibility

Over the past few days I worked through a large body of code, correcting various accumulated errors and style digressions. When I finished, I wanted to see who wrote the original lines. (It turned out I was not entirely innocent.)
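
With Git, this question is usually answered with `git blame`. Its `--line-porcelain` output repeats an `author` header line for every source line, so tallying lines per author takes only a few lines of Python. The sketch assumes a Git repository (the post itself may use a different version control system), and the helper names are hypothetical. Passing a revision from before the clean-up as `rev` attributes the original lines rather than the corrections.

```python
import subprocess
import sys
from collections import Counter

def authors_from_porcelain(porcelain):
    """Count source lines per author in `git blame --line-porcelain` output.

    Each blamed line is preceded by headers such as `author NAME`; the
    source text itself starts with a tab, so it is never mistaken for a header.
    """
    counts = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts

def blame_counts(path, rev=None):
    """Run git blame on `path` (optionally at revision `rev`) and tally authors."""
    argv = ["git", "blame", "--line-porcelain"]
    if rev:
        argv.append(rev)
    argv += ["--", path]
    out = subprocess.run(argv, capture_output=True, text=True, check=True).stdout
    return authors_from_porcelain(out)

if __name__ == "__main__":
    # Usage: python blamecount.py FILE [REV]
    if len(sys.argv) > 1:
        rev = sys.argv[2] if len(sys.argv) > 2 else None
        for author, lines in blame_counts(sys.argv[1], rev).most_common():
            print(f"{lines:6d} {author}")
```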

Continue reading "Assigning Responsibility"

2007.08.28

The Treacherous Power of Extended Regular Expressions

I wanted to filter out lines containing the word "line" or a double quote from a 1GB file. This can be easily specified as an extended regular expression, but it turns out that I got more than I bargained for.
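
The filter described here corresponds to `grep -Ev 'line|"' file`. Whatever pitfall the full post uncovered, a cheap safeguard when rewriting such a filter is to cross-check it against a plain substring version on sample data; the Python sketch below does that, with hypothetical function names and sample lines. Note that both versions match "line" as a substring, so words such as "outline" are filtered out too.

```python
import re

EXCLUDE = re.compile(r'line|"')  # "line" anywhere (even inside "outline") or a double quote

def keep_regex(lines):
    """Lines that match neither alternative, like grep -Ev 'line|"'."""
    return [text for text in lines if not EXCLUDE.search(text)]

def keep_plain(lines):
    """The same filter written as two fixed-string tests."""
    return [text for text in lines if "line" not in text and '"' not in text]

if __name__ == "__main__":
    sample = ["first line", 'say "hi"', "plain text", "outline", "nothing here"]
    assert keep_regex(sample) == keep_plain(sample)
    print(keep_regex(sample))
```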

Continue reading "The Treacherous Power of Extended Regular Expressions"

2007.04.16

Breaking into a Virtual Machine

Say you're running your business on a rented virtual private server. How secure is your setup? I wouldn't expect it to be more secure than the system your server runs on, and a simple experiment confirmed it.

Continue reading "Breaking into a Virtual Machine"

2007.03.15

Make vs Ant: Observability

I've long felt uncomfortable with ant as a build management tool. I thought that my uneasiness stemmed from the verbose XML used for describing tasks and the lack of default dependency resolution. Today, an email from a UMLGraph user struggling with a complex ant task made me realize another problem: lack of observability.

Continue reading "Make vs Ant: Observability"

2006.12.15

Cracking Software Reuse

[Newton] said, "If I have seen further than others, it is because I've stood on the shoulders of giants." These days we stand on each other's feet!

— Richard Hamming

Sometimes we encounter ideas that inspire us for life. For me, this was a Unix command pipeline I came across in the '80s:

Continue reading "Cracking Software Reuse"

2006.06.16

Batch Files as Shell Scripts

Although the Unix Bourne shell offers a superb environment for combining existing commands into sophisticated programs, using a Unix shell as an interactive command environment under Windows can be painful.

Continue reading "Batch Files as Shell Scripts"

2006.04.03

Efficiency Will Always Matter

Many claim that today's fast CPUs and large memory capacities make time-proven technologies that efficiently harness a computer's power irrelevant. I beg to differ, and my experience in the last three days demonstrated that technologies that originated in the 70s still have their place today.

Continue reading "Efficiency Will Always Matter"

2005.12.05

A Clash of Two Cultures

I dug up the following gem from the Usenix HotOS X Conference panel titled "Do we work within existing frameworks or start from scratch?", summarized by Prashanth Bungale.

Continue reading "A Clash of Two Cultures"

2005.11.01

Working with Unix Tools

A successful [software] tool is one that was used to do something undreamed of by its author.

— Stephen C. Johnson

Line-oriented textual data streams are the lowest useful common denominator for a lot of data that passes through our hands. Such streams can be used to represent program source code, web server log data, version control history, file lists, symbol tables, archive contents, error messages, profiling data, and so on. For many routine, everyday tasks, we might be tempted to process the data using a Swiss Army knife scripting language, like Perl, Python, or Ruby. However, to do that we often need to write a small, self-contained program and save it into a file. By that point we've lost interest in the task, and end up doing the work manually, if at all. Often, a more effective approach is to combine programs of the Unix toolchest into a short and sweet pipeline that we can run from our shell's command prompt. With modern shell command-line editing facilities we can build our command bit by bit, until it molds into exactly the form that suits us. Nowadays, the original Unix tools are available on many different systems, like GNU/Linux, Mac OS X, and Microsoft Windows, so there's no reason why you shouldn't add this approach to your arsenal.

Continue reading "Working with Unix Tools"

2005.07.01

Tool Writing: A Forgotten Art?

Merely adding features does not make it easier for users to do things—it just makes the manual thicker. The right solution in the right place is always more effective than haphazard hacking.

— Brian W. Kernighan and Rob Pike

In 1994 Chidamber and Kemerer defined a set of six simple metrics for object-oriented programs. Although the number of object-oriented metrics swelled to above 300 in the years that followed, I had a case where I preferred to use the original classic metric set for clarity, consistency, and simplicity. Surprisingly, none of the six open-source tools I found and tried to use fit the bill. Most tools calculated only a subset of the six metrics, some required tweaking to make them compile, others had very specific dependencies on other projects (for example Eclipse), while others were horrendously inefficient. Although none of the tools I surveyed managed to correctly calculate the six classic Chidamber and Kemerer metrics in a straightforward way, most of them included numerous bells and whistles, such as graphical interfaces, XML output, and bindings to tools like ant and Eclipse.
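
Some of the six metrics are simple to compute once a program's class structure is known. The sketch below assumes single inheritance, a hypothetical acyclic `parent` map from each class to its superclass (`None` for a root), and counts depth from zero at the root; tools differ on that convention (for instance, whether Java's `Object` counts as a level). It covers only DIT (depth of inheritance tree), NOC (number of children), and WMC with every method weighted one, and the class names are made up.

```python
def dit(cls, parent):
    """Depth of inheritance tree: number of ancestors above `cls`."""
    depth = 0
    while parent.get(cls) is not None:
        cls = parent[cls]
        depth += 1
    return depth

def noc(cls, parent):
    """Number of children: classes whose immediate superclass is `cls`."""
    return sum(1 for p in parent.values() if p == cls)

def wmc(methods):
    """Weighted methods per class, with every method weighted 1."""
    return len(methods)

if __name__ == "__main__":
    parent = {"Shape": None, "Circle": "Shape", "Square": "Shape", "UnitCircle": "Circle"}
    methods = {"Shape": ["area", "perimeter"], "Circle": ["area", "perimeter", "radius"]}
    for cls in parent:
        print(cls, dit(cls, parent), noc(cls, parent), wmc(methods.get(cls, [])))
```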

Continue reading "Tool Writing: A Forgotten Art?"

2005.04.13

A Pipe Namespace in the Portal Filesystem

The portal filesystem allows a daemon running as a userland program to pass descriptors to processes that open files belonging to its namespace. It has been part of the *BSD operating systems since 4.4BSD. I recently added a pipe namespace to its FreeBSD implementation. This allows us to perform scatter-gather operations without using temporary files, create non-linear pipelines, and implement file views using symbolic links.

Continue reading "A Pipe Namespace in the Portal Filesystem"

2005.02.18

XML Versus Text Files

The JDepend package dependency analyzer can output its results either as XML or as plain text. Instead of using the XML output, I found myself processing the text output using awk. Am I becoming tied to old-world thinking, or are text files easier to process?

Continue reading "XML Versus Text Files"

2004.09.11

System administration stories: The Revolt

Can a small embedded system the size of a paperback lead a group of machines into revolt? Apparently yes.

Continue reading "System administration stories: The Revolt"

2003.10.25

A Unix-based Logic Analyzer

A circuit I was designing was behaving in unexpected ways: the output of a wireless serial receiver based on Infineon's TDA5200 was refusing to drive an LS TTL load. To debug the problem I needed an oscilloscope or a logic analyzer, but I had neither. I searched the web and located software to turn the PC's parallel port into a logic analyzer. I downloaded the 900K program, but that was not the end. Unfortunately, the design of Windows 2000 does not allow direct access to the I/O ports, so I also downloaded a parallel port device driver and a program to give the appropriate privileges to other programs. Finally, I also downloaded from a third site the Borland runtime libraries required by the logic analyzer. Needless to say, the combination refused to work.

Continue reading "A Unix-based Logic Analyzer"


Last update: Thursday, September 22, 2016 9:56 am
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.