I’m building a system that adds spatial query capabilities on top of RDF-3X, a storage and query engine for RDF data. I want to write a layer on top of RDF-3x that takes RDF queries involving spatial relationships, rewrites them, and submits the rewritten queries to the underlying RDF-3x query interface. In this design, the RDF-3x query interface runs as one process, and my layer runs as another process that calls the query interface as needed.

I need some mechanism for communication between my process and the RDF-3x query interface process. RDF-3x reads a query from stdin and writes the query results to stdout, so my layer needs to redirect both the stdin and the stdout of RDF-3x to itself in order to submit queries and get the results back for processing.

I decided to use two pipes for the inter-process communication: one for writing queries to the child process (RDF-3x), and the other for reading results back. I spent about two days figuring out how to do it, and here is my Stack Overflow post: http://stackoverflow.com/questions/9318125/two-way-parent-child-communication-using-2-pipes-in-c-on-linux. The point to remember when using a pipe is:

  • pipe[0] is the reading end: the end of the pipe that data comes out of
  • pipe[1] is the writing end: the end of the pipe that data goes into
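A minimal sketch of the two-pipe setup in C, assuming a POSIX system. `spawn_with_pipes` is a hypothetical helper written for illustration, not the code from the Stack Overflow post; in the real layer, `argv` would hold the rdf3xquery command line (its arguments are not shown here). Each process closes the pipe ends it does not use, so that a reader can see end-of-file once the writer closes its end.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] as a child process whose stdin and
 * stdout are connected to the parent through two pipes.
 * On success, *to_child is the write end feeding the child's stdin and
 * *from_child is the read end carrying the child's stdout. */
int spawn_with_pipes(char *const argv[], pid_t *pid,
                     int *to_child, int *from_child)
{
    int in_pipe[2];   /* parent -> child: [1] is written, [0] is read */
    int out_pipe[2];  /* child -> parent: [1] is written, [0] is read */

    if (pipe(in_pipe) == -1)
        return -1;
    if (pipe(out_pipe) == -1) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }

    pid_t child = fork();
    if (child == -1) {
        close(in_pipe[0]);  close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: the read end of in_pipe becomes stdin,
         * the write end of out_pipe becomes stdout. */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        /* The duplicated stdin/stdout keep the pipes open,
         * so the original descriptors can all be closed. */
        close(in_pipe[0]);  close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec fails */
    }

    /* Parent: close the ends that belong to the child. */
    close(in_pipe[0]);
    close(out_pipe[1]);
    *pid = child;
    *to_child = in_pipe[1];
    *from_child = out_pipe[0];
    return 0;
}
```

The parent then writes a query to `to_child` and reads results from `from_child`. If the parent forgot to close `out_pipe[1]`, its own reads on `from_child` could never reach end-of-file, even after the child exits.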

The other issue is reading the results from RDF-3x without blocking. The problem arises because, after the query interface has returned all the results, it doesn’t close its output file descriptor. Since the pipe stays open, my process can’t tell when the child process has returned all the results, and reads in my process block waiting for more data. I tried to find a way to perform a non-blocking read, so that my process simply stops when it sees no more data on the stream but keeps reading while more data arrives.

I spent some time on this, and I believe it can definitely be done, but after thinking it over, a blocking read isn’t such a bad thing. The reason is that RDF-3x sometimes takes a while to return any result. With a non-blocking read, my process might end before the results arrive, wrongly concluding that there are no more results.
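For reference, a non-blocking read on the results pipe could look like the following sketch, assuming POSIX `fcntl` and `read`; `set_nonblocking`, `try_read`, and `enum read_status` are hypothetical names. It illustrates the ambiguity described above: while rdf3xquery keeps its stdout open, an empty pipe only ever reports "nothing yet" (`EAGAIN`), never end-of-file, so a slow query and a finished one look the same to the reader.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor (e.g. the read end of the results pipe) to
 * non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical classification of one non-blocking read attempt. */
enum read_status { READ_DATA, READ_NOTHING_YET, READ_EOF, READ_ERROR };

enum read_status try_read(int fd, char *buf, size_t len, ssize_t *got)
{
    ssize_t n = read(fd, buf, len);
    *got = n;
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;          /* every write end has been closed */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_NOTHING_YET;  /* pipe still open but currently empty:
                                     says nothing about whether more
                                     results are coming */
    return READ_ERROR;
}
```

Since the child never closes its stdout between queries, `READ_EOF` never happens here, and the only way to stop would be to treat some run of `READ_NOTHING_YET` as "done", which is exactly the guess that fails on slow queries.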

Instead of using a non-blocking read, I modified the source code of the RDF-3x query interface (rdf3xquery) so that the last query result is always followed by a '\n'. When my process sees this newline character, it knows that there are no more results to read.
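A sketch of the reading side under this scheme, in C. It assumes each result row is printed on its own line, so the end-of-results marker shows up as an empty line (the extra '\n' right after the last row's own newline); that interpretation, and the names `read_results` and `on_row`, are assumptions for illustration. It uses POSIX `getline` on a `FILE *` obtained with `fdopen` on the read end of the results pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read result rows from the child's stdout until the
 * end-of-results marker, assumed here to be an empty line. Calls on_row
 * for every row and returns the number of rows seen, or -1 if the stream
 * ended or failed before the marker arrived. */
long read_results(FILE *from_child, void (*on_row)(const char *row))
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long rows = 0;

    /* getline blocks until a full line (or EOF) is available, which is
     * fine: a slow query simply makes us wait longer. */
    while ((len = getline(&line, &cap, from_child)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';   /* strip the trailing newline */
        if (len == 0) {           /* empty line: no more results */
            free(line);
            return rows;
        }
        on_row(line);
        rows++;
    }
    free(line);
    return -1; /* EOF or read error before the marker */
}
```

Because stdio buffers input, the same `FILE *` should be reused for every query: bytes belonging to the next query's output may already sit in its buffer, and a raw `read` on the underlying descriptor would skip them.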