Thursday, August 24, 2006

Odd "Array" output in Joomla/PHP module solved

I had an odd problem in which a Joomla! module I was writing was outputting "Array" after its output.

I didn't find any code in the module itself that could output this stray text, since it appeared after the module's last echo statement. So, I was puzzled!

Then I realized that it might have been because I had overwritten a global variable. I had liberally used variables like $id, $content, $categories, etc. So I replaced all my variable names with ones that couldn't possibly belong to the reserved variable list ... such as $category__id, etc.

And that solved the problem! :)

Friday, August 04, 2006

C libraries to capture layer two packets

For the 7DS system, we need some libraries for peeking at the link layer packets. These are the ones we found:


Monday, July 31, 2006

Automake/autoconf for 7DS

I finally started working with the GNU Automake/Autoconf system for 7DS today. It's really neat! Far easier to work with than the faulty manual Makefiles I've been creating so far. I hope to have a fully GNU-style 7DS system (at least for my part) by the end of this week.

I found that the deadline for submitting my paper to INFOCOM has passed, so I will probably be submitting a paper to ICC07.

Wednesday, July 26, 2006

Verbose options and logging levels/priority

Today I did something really cool with the 7DS logging system that enabled me to turn the "verbose" option off and instead introduce something more advanced: a level system that outputs log messages to the log file or screen based on the user setting.

To set this up, I basically added two functions like this, one for on-screen output and one for the log file:

int SdsLogSetPrintLevel (int level)
{
    print_level = level; /* Remember the new screen threshold */
    return 0;
}

and then changed the code so that it prints only the higher priority messages to log file or screen, depending on user settings.

/* If level is at or above the print level,
 * then print to stderr. */
if (level >= print_level)
    fprintf(stderr, "%s", logstatement);

/* If level is at or above the log level, write to logfile */
if (level >= log_level)
    fprintf(logfp, "%s", logstatement);

Of course, I had to set the default values appropriately, so that the log file would get all messages.

int log_level = SDS_LOG_DEBUG, /* File should have debug messages */
print_level = SDS_LOG_INFO; /* Screen should have info messages */
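
Here is a rough, self-contained sketch of how the pieces above might fit together. The numeric values of the level constants, the SdsLogSetLogLevel() setter and the SdsLog() function are my assumptions for illustration; the real 7DS code may look different.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed numeric values: a larger number means a higher priority. */
enum { SDS_LOG_DEBUG = 0, SDS_LOG_INFO = 1, SDS_LOG_ERROR = 2 };

static int log_level = SDS_LOG_DEBUG;  /* File should have debug messages */
static int print_level = SDS_LOG_INFO; /* Screen should have info messages */
static FILE *logfp = NULL;             /* Log file, once one has been opened */

int SdsLogSetPrintLevel (int level) { print_level = level; return 0; }
int SdsLogSetLogLevel (int level) { log_level = level; return 0; }

/* Hypothetical helper: emit one message and return how many
 * destinations (screen, log file) actually received it. */
int SdsLog (int level, const char *logstatement)
{
    int sinks = 0;

    assert (logstatement != NULL);

    /* Screen: only messages at or above the print level */
    if (level >= print_level) {
        fprintf (stderr, "%s\n", logstatement);
        sinks++;
    }

    /* Log file: only if one is open, and at or above the log level */
    if (logfp != NULL && level >= log_level) {
        fprintf (logfp, "%s\n", logstatement);
        sinks++;
    }
    return sinks;
}
```

With these defaults, a call like SdsLog (SDS_LOG_DEBUG, "cache miss") goes only to the log file, while an SDS_LOG_INFO message goes to both the file and the screen.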

Tuesday, July 25, 2006

7DS code cleaning

I'm continuing to clean up my 7DS code. I have to follow my advisor's instructions on formatting code, which are very helpful.

I'm also finding that, in addition to configuration files and logging, I will have to start using getopt() for command line options. Well, better late than never...

Tuesday, July 18, 2006

7DS software engineering and wikis

Sorry I haven't been active for a while on this blog ... well, a few days ago, I realized I was running into a brick wall with the 7DS project, not sure of what to do, and faced with an overwhelming array of choices that I didn't know how to handle. :(

Luckily, I e-mailed Henning and met him today (with Se Gi and Andy) to get an idea of how to come to grips with the situation and proceed further with the project.

It looks like:
  1. We will release a version that has only the basic features, but with all of them working.
    I think this can be released as source code with configure/make/make install, as well as installation binaries.
  2. In parallel, we will draft guidelines for future versions and see how to make 7DS more modular so that it is "future-compatible". Things like the community extensions and file synchronization will be part of the next release.
In addition, Henning suggested that we use a wiki to document our work. That's a great idea! The only thing now is to find a place to host it ... unless we have a CS-wide one.

One great wiki I've heard a lot about is DokuWiki...

Wednesday, June 28, 2006

Installing Eclipse with PHP and C

I decided to install Eclipse today to use as my primary development kit. I've used vi, gEdit, jExt (I really like jExt!) but life might be simpler with an IDE.

So I will try Eclipse with the C development and maybe PHP development environments and see how it works out.

Tuesday, June 27, 2006

Installed DarwinPorts, sqlite, swish-e, and all other required libraries

OK - I finally managed to get DarwinPorts installed on AirTrain (the iMac). I couldn't get it installed under OS X 10.2 - it complained that TCL was built without threading enabled.

Now that I have OS X 10.4 and the Xcode tools, DarwinPorts installed without a hitch from source.

I am now able to successfully build all the other required libraries that I need to get 7DS running on the Mac, including:
- sqlite3
- swish-e
- libconfuse

(and maybe some more? I forget.)

Upgraded AirTrain iMac to Mac OS 10.4

I upgraded AirTrain (the IRT lab iMac) from 10.2 to Mac OS X 10.4. Surprisingly, the install DVD had an upgrade option - so I didn't have to format the disk at all! So much for my warnings to my colleagues to back up all the stuff they needed...

Thursday, June 08, 2006

Porting 7DS from Unix to Mac OS and Windows

Right now, I'm involved in porting the 7DS system from the Linux system to run on Windows and the Mac OS X. It is really fun (!) though it might take a while to complete...

For Windows, I am currently trying Cygwin. I heard that it uses a compatibility layer and a cygwin1.dll file though - meaning that Cygwin will have to be installed. I have heard about MinGW and will see if that avoids this issue.

For Mac OS X, I first heard that we might need to install the Xcode development tools, but now it looks like it is not necessary. The Mac OS X 10.2 system I am working with already has the GCC environment, and except for dynamic library handling, it seems to compile most apps. Dynamic library loading is a problem though: I even installed dlcompat from Fink, but it still has problems! Fink, btw, is a build environment for Darwin that promises to make it easier for people to port open source apps to Mac OS.

Wednesday, April 19, 2006

Classification of XML data

As part of the 7DS community extension, I will have to classify shared community XML objects using some sort of XML schema, RDF or RDF Schemas. We looked at OWL today, but that looks fairly complicated, so we may just go ahead and use RDF and RDF schemas.

Friday, April 14, 2006

PHP and web (http) servers on Windows Mobile / PocketPC

As the scope of the 7DS project expands, I'm coming up against building more involved 7DS web applications. The search and multicast engines were C programs that produced binary CGI executables, but building more involved community-based web systems is going to be hard. :(

In looking to build the new applications in PHP, I was asked to see if PHP might be supported on Windows Mobile (one of our near-future development platforms) and I found it interesting that Windows Mobile SDK has its own HTTP/web server.

Even more interesting, there have been problems porting the Zend PHP engine to Windows Mobile, so there is an alpha version of a ground-up PHP engine built for Windows Mobile with its own web server.

Friday, March 31, 2006

Always use sizeof() when malloc()ing

I've had some really big problems implementing my webpage retriever, and I fixed them once I realized that I had to include the sizeof() every time I malloc()d.

So remember to use this every time you malloc, say, a string:

newstring = malloc ((strlen(oldstring)+1) * sizeof(char));

Remember to add 1 as well, just as I did: C strings terminate with a '\0', which is an extra character.
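
One caveat: sizeof(char) is 1 by definition in C, so for plain strings it is really the +1 that does the work, and sizeof() only becomes essential for wider element types. A small sketch of both cases, where the helper names copy_string() and copy_ints() are made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of oldstring, or NULL if allocation fails.
 * The caller must free() the result. */
char *copy_string (const char *oldstring)
{
    char *newstring;

    assert (oldstring != NULL);
    /* +1 leaves room for the terminating '\0' */
    newstring = malloc ((strlen (oldstring) + 1) * sizeof (char));
    if (newstring != NULL)
        strcpy (newstring, oldstring);
    return newstring;
}

/* Return a heap copy of count ints, or NULL if allocation fails.
 * Here sizeof matters: an int is usually wider than one byte. */
int *copy_ints (const int *values, size_t count)
{
    int *copy = malloc (count * sizeof *copy);

    if (copy != NULL)
        memcpy (copy, values, count * sizeof *copy);
    return copy;
}
```

Writing sizeof *copy instead of sizeof (int) keeps the allocation correct even if the element type changes later.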

Thursday, March 30, 2006

Memory leak for null string assignment

This is really weird... the following piece of code in my 7DS system results in a huge memory leak, gobbling up memory really fast.

if (0 == hits) {
    // No results, empty xmlResult and return
    sprintf (xmlResult, "");
    return 0;
}

Disabling it solves the memory problem - I wonder why?

Tuesday, March 28, 2006

malloc() and free() error for dynamic strings solved

OK, I am a newbie at dynamic strings in C, so please forgive my silliness.

I have been getting "*** glibc detected *** free(): invalid next size (fast)" errors in my application that has to create dynamic path names and couldn't figure it out.

Finally I did a Google search and found the solution here:

Guess what I had done? malloc()d the string to use one less character than needed like this:
escapedURL = malloc (strlen (URL));

This is CORRECT:
escapedURL = malloc (strlen (URL) + 1);

Because C strings end with a '\0' character.
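
If escapedURL is meant to hold a percent-encoded copy of URL (my assumption here, since the escaping code itself isn't shown above), then even strlen (URL) + 1 can be too small: every escaped character grows into three. A sketch with a hypothetical escape_url() helper, sized for the worst case:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated, percent-encoded copy of URL, or NULL if
 * allocation fails. The set of characters left unescaped is an
 * illustrative choice. The caller must free() the result. */
char *escape_url (const char *URL)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *in;
    char *escapedURL, *out;

    assert (URL != NULL);

    /* Worst case: every character becomes "%XX", plus the '\0' */
    escapedURL = malloc (3 * strlen (URL) + 1);
    if (escapedURL == NULL)
        return NULL;

    out = escapedURL;
    for (in = (const unsigned char *) URL; *in != '\0'; in++) {
        if (isalnum (*in) || strchr ("-._~/:", *in) != NULL) {
            *out++ = (char) *in;      /* Safe character: copy as-is */
        } else {
            *out++ = '%';             /* Anything else: %XX */
            *out++ = hex[*in >> 4];
            *out++ = hex[*in & 0x0F];
        }
    }
    *out = '\0';
    return escapedURL;
}
```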

Monday, March 27, 2006

Parsing filename and directories out of given path in C

Sample code for how to parse directory structure and path names for a given path string in C. I will use this for the webpage retriever project I am working on.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* This program parses the path given in the argument into directories
 * and filename, creates the directory structure and creates an
 * empty file as well */
int main (int argc, char **argv)
{
    const char delimiters[] = "/\\"; /* File delimiters */
    char *token, *oldtoken, *cp;
    char *program_directory; /* Program directory */

    if (argc < 2) {
        fprintf (stderr, "Usage: %s path\n", argv[0]);
        return 1;
    }
    program_directory = getcwd (NULL, 0);

    /* Create copy of path string (+1 for the terminating '\0') */
    cp = malloc (strlen (argv[1]) + 1);
    if (cp == NULL || program_directory == NULL)
        return 1;
    strcpy (cp, argv[1]);

    /* Split string into tokens */
    token = strtok (cp, delimiters);

    printf ("Directory = ");

    /* While token is not NULL */
    while (token != NULL) {
        /* Tokens point into cp and stay valid, so no copy is needed */
        oldtoken = token;
        token = strtok (NULL, delimiters); /* Go to the next token */
        if (token == NULL) {
            /* If next token is NULL, it is the last part and assumed to
             * be a filename */
            printf ("\nFilename = %s\n", oldtoken);
            /* Create an empty file of that name */
            FILE *fp = fopen (oldtoken, "w");
            if (fp != NULL)
                fclose (fp);
        } else {
            /* Otherwise it is a directory */
            printf ("%s ", oldtoken);
            mkdir (oldtoken, 0755); /* Create the directory */
            if (chdir (oldtoken) != 0) { /* Go there */
                perror (oldtoken);
                break;
            }
        }
    }
    printf ("\n");

    /* Free any memory elements */
    free (cp);

    /* Go back to where we started */
    if (chdir (program_directory) != 0)
        perror (program_directory);
    free (program_directory);
    return 0;
}

Friday, March 17, 2006

URL or URI parsing in libxml and c

I found out that libxml has URI functions that will allow you to parse URIs using libxml.

Here's an example program:

#include <stdio.h>
#include <libxml/uri.h>

/* Passing NULL to printf's %s is undefined, so print a placeholder */
#define STR(s) ((s) ? (s) : "(null)")

int main(int argc, char **argv) {

    /* Parse the user input URI (xmlParseURI allocates the structure) */
    xmlURIPtr url = xmlParseURI ( (argc <= 1) ? "" : argv[1]);
    if (url == NULL) {
        fprintf (stderr, "Could not parse the URI\n");
        return 1;
    }

    /* Print all the respective information */
    printf ("scheme = %s\n", STR (url->scheme));
    printf ("opaque = %s\n", STR (url->opaque));
    printf ("authority = %s\n", STR (url->authority));
    printf ("server = %s\n", STR (url->server));
    printf ("user = %s\n", STR (url->user));
    printf ("port = %d\n", url->port);
    printf ("path = %s\n", STR (url->path));
    printf ("query = %s\n", STR (url->query));
    printf ("fragment = %s\n", STR (url->fragment));
    printf ("cleanup = %d\n", url->cleanup);

    xmlFreeURI (url);
    return 0;
}

Parsing HTML using tidy and tidylib

It's so hard to find a C program on the web that can parse HTML! Yes, you can find parsers written in Perl and other languages, but not C!

So I might as well share what I've learnt so far. I am making the 7DS HTML parser in libxml, but I experimented using tidy and tidylib as well, and here's how the code for that looks:

#include <tidy.h>
#include <buffio.h> /* Newer tidy releases call this tidybuffio.h */
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <errno.h>

/* Dump the list of nodes and their attributes
 * Modified from tidylib documentation */
void dumpNode( TidyNode tnod, int indent )
{
    TidyNode child;

    for ( child = tidyGetChild(tnod); child; child = tidyGetNext(child) )
    {
        ctmbstr name = tidyNodeGetName( child );
        if ( !name )
        {
            switch ( tidyNodeGetType(child) )
            {
            case TidyNode_Root: name = "Root"; break;
            case TidyNode_DocType: name = "DOCTYPE"; break;
            case TidyNode_Comment: name = "Comment"; break;
            case TidyNode_ProcIns: name = "Processing Instruction"; break;
            case TidyNode_Text: name = "Text"; break;
            case TidyNode_CDATA: name = "CDATA"; break;
            case TidyNode_Section: name = "XML Section"; break;
            case TidyNode_Asp: name = "ASP"; break;
            case TidyNode_Jste: name = "JSTE"; break;
            case TidyNode_Php: name = "PHP"; break;
            case TidyNode_XmlDecl: name = "XML Declaration"; break;

            case TidyNode_Start:
            case TidyNode_End:
            case TidyNode_StartEnd:
            default:
                assert( name != NULL ); // Shouldn't get here
                break;
            }
        }
        assert( name != NULL );

        /* Build the indentation string: indent spaces plus the '\0' */
        char whitespace[indent + 1];
        memset (whitespace, ' ', indent);
        whitespace[indent] = '\0';
        // printf( "%sNode: %s\n", whitespace, name );

        /* Get the first attribute for all nodes */
        TidyAttr tattr = tidyAttrFirst (child);
        while (tattr != NULL) {
            /* Print the node and its attribute (the value may be missing) */
            ctmbstr value = tidyAttrValue (tattr);
            printf ("%s %s %s= %s\n", whitespace, name, tidyAttrName (tattr), value ? value : "");
            /* Get the next attribute */
            tattr = tidyAttrNext (tattr);
        }
        dumpNode( child, indent + 4 );
    }
}

/* Dump the whole document */
void dumpDoc( TidyDoc tdoc )
{
    dumpNode( tidyGetRoot(tdoc), 0 );
}

/* Dump only the body */
void dumpBody( TidyDoc tdoc )
{
    dumpNode( tidyGetBody(tdoc), 0 );
}

int main(int argc, char **argv )
{
    /* Input file: Either the first argument or "../test.html" */
    const char* input = (argc > 1) ? argv[1] : "../test.html";
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate(); // Initialize "document"
    printf( "Tidying:\t%s\n", input );

    ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes ); // Convert to XHTML
    if ( ok )
        rc = tidySetErrorBuffer( tdoc, &errbuf ); // Capture diagnostics
    if ( rc >= 0 )
        /* Read from the HTML file */
        rc = tidyParseFile( tdoc, input ); // Parse the input
    if ( rc >= 0 )
        rc = tidyCleanAndRepair( tdoc ); // Tidy it up!
    if ( rc >= 0 )
        rc = tidyRunDiagnostics( tdoc ); // Kvetch
    if ( rc > 1 ) // If error, force output.
        rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
    if ( rc >= 0 )
        rc = tidySaveBuffer( tdoc, &output ); // Pretty Print

    if ( rc >= 0 )
    {
        if ( rc > 0 )
            printf( "\nDiagnostics:\n\n%s", errbuf.bp );
        printf( "\nAnd here is the result:\n\n%s", output.bp );
    }
    else
        printf( "A severe error (%d) occurred.\n", rc );

    tidyBufFree( &output );
    tidyBufFree( &errbuf );

    /* Now parse and print the tags in the HTML document */
    if ( rc >= 0 )
        dumpDoc (tdoc);

    tidyRelease( tdoc );
    return rc;
}

Tuesday, March 14, 2006

Webcrawler using libxml, libcurl and tidy

Contrary to my writeup in the last post about how wget might be the best way to webcrawl and fetch files to a local cache, my thoughts now are different.

You can use the following libraries to build a decent webcrawler:

1. Tidy: Use tidylib to clean up your HTML pages and make them XHTML. tidylib's webpage has sample code that is good enough for converting HTML to XHTML - just make sure you save to a file using tidySaveFile().

libxml has problems parsing HTML, even if used with xmlRecoverFile() rather than xmlParseFile().

2. libxml: Parse the XHTML, get all elements' attributes (and any other URLs you need) and pass on the URLs to libcurl to download. Need I say more?

Well, actually I should. libxml is a little hard to understand from the API, and sample code to do what you want is hard to find. I had to do quite a bit of searching, looking up sample programs, and then reading the API to figure out how things worked.

3. curl: Or rather libcurl. To retrieve files from the Net. Again, need I say more?

Life would have been simpler if curl had a recursive download function ... or wget had a library I could use ... but then, that's why we computer engineers and students have a life!

Tuesday, February 28, 2006

wget to create local cache of webpage

Here's how to use wget to create a local cache of a webpage:
wget -r -l 1 -p --convert-links

For more information:

Tuesday, February 14, 2006

Very nice and detailed documentation on creating RPMs:

It includes information not present in the official RPM FAQ at

Monday, February 06, 2006

Processes in C: /dev/urandom C code, forking several child processes...


Sorry, I've been busy with courses AND research lately; it's given me less time to post on the blog.

For an OS course, we have to do the Miller-Rabin test for primality, and it involves forking at least 3 child processes, getting random numbers from /dev/urandom, etc.

Here is some code for having 2 or more child processes:
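
As a rough illustration of the idea (a sketch of my own, not the code originally linked here), the parent can fork in a loop and then wait for every child. The run_children() name and the choice of exit status are made up for the example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork up to count children; each child exits immediately with its
 * index as the exit status. Returns the number of children that were
 * reaped after exiting normally. */
int run_children (int count)
{
    int i, status, started = 0, finished = 0;

    assert (count >= 0);

    for (i = 0; i < count; i++) {
        pid_t pid = fork ();
        if (pid < 0)
            break;      /* fork failed: stop starting new children */
        if (pid == 0) {
            /* Child: real work (say, one Miller-Rabin round) goes here.
             * _exit() keeps the child from running the parent's code. */
            _exit (i);
        }
        started++;      /* Parent: go on to start the next child */
    }

    /* Parent: wait for every child so that none is left as a zombie */
    for (i = 0; i < started; i++) {
        if (wait (&status) > 0 && WIFEXITED (status))
            finished++;
    }
    return finished;
}
```

The important detail is that the child must exit (or exec) inside the pid == 0 branch; otherwise it falls through into the loop and starts forking children of its own.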

For reading from /dev/urandom (and why that's a good idea):
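
A minimal sketch of reading from /dev/urandom (again my own illustration, with made-up function names). One reason it is a good idea in this setting: children forked after a single srand() call all inherit the same generator state and so get the same rand() sequence, while /dev/urandom hands each process fresh bytes from the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Fill *value with random bits from /dev/urandom.
 * Returns 0 on success, -1 on failure. */
int random_uint (unsigned int *value)
{
    FILE *fp;
    size_t got;

    assert (value != NULL);
    fp = fopen ("/dev/urandom", "rb");
    if (fp == NULL)
        return -1;
    got = fread (value, sizeof *value, 1, fp);
    fclose (fp);
    return (got == 1) ? 0 : -1;
}

/* Pick a random base a with 2 <= a <= n - 2, as a Miller-Rabin round
 * needs. Requires n > 4. The modulo adds a slight bias, which should
 * be acceptable for a sketch. Returns 0 on success, -1 on failure. */
int random_base (unsigned int n, unsigned int *a)
{
    unsigned int r;

    assert (a != NULL && n > 4);
    if (random_uint (&r) != 0)
        return -1;
    *a = 2 + r % (n - 3);
    return 0;
}
```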