Linux Fu: Databases are top-level file systems


It’s funny how exotic computer technology ends up failing or becoming mainstream. At one time, having more than one user on a computer at a time was high tech, for example. Then there are things that haven’t made a big splash, like vector display or content-addressable memory. The use of mass storage – especially disk drives – in computers, however, has become very widespread. But at one time it was an exotic technique and it was not as simple as it is today.

However, I’m surprised that the file system as we know it hasn’t changed much over the years. Of course, compared to the 1960s, for example, we have much better features. And we have a lot of improvements regarding speed, encoding, encryption, compression, etc. But the fundamental nature of how we store and access files in computer programs is stagnant. But it doesn’t have to be. We know of better ways to organize data, but for some reason most of us don’t use it in our programs. It turns out, however, that it’s pretty straightforward and I’ll show you how, with a toy app, it could be the start of a database for electronic components in my lab.

You can store a database like this in a comma delimited file or by using something like JSON. But I’m going to use a full SQLite database to avoid having a heavy database server and all the pain that goes with it. Will it replace the database behind the airline reservation system? No. But will it work for most of what you’re likely to do? You bet.

Abstraction

If you think about it, the file system is nothing more than an abstraction on the disk drive. Normally we don’t know or care where exactly hello.c is stored. We don’t even care if it’s encrypted or compressed. It can be retrieved over a network or all of its items can be scattered randomly on the disk. We usually don’t care. How about leaving out the file system itself?

It’s pretty much the idea of ​​a database. If I have a list of, say, electronic components, I could store them in a comma delimited file and read it with a spreadsheet. Or I could use a full database. The problem with databases is that they traditionally require server software like MySQL, SQLServer, or Oracle, for example. You can ignore the database interface, but it’s a pretty cumbersome solution compared to just opening a file and using it normally.

However, there is a frequently used library called SQLite which provides a fairly robust database that can live in a single file without an external server or maintenance. There are of course limitations, but for many simple programs it can provide the benefits of a database without the overhead and expense.

The right tool for the right job

Of course, there are limits. However, if you are using your own file format for something, you might want to consider switching to SQLite and managing it like a database. According to the project website, this could in fact save space and increase access speed. Plus, once you get the hang of it, it’s easier. It’s also easier to scale up later if you decide to switch to a real database.

If you are storing huge databases (like terabyte scale) or need a lot of concurrent users, especially to write to the database, this might not be for you. The SQLite site has a good page on which uses are good and which are not optimal for the library.

Another plus: there’s a command line program (and a few GUI variations like the browser in the accompanying image) that lets you work with SQLite databases without writing any code. So you can do things like populate your data or examine your database without having to write SQL at all. For a custom file format, you’ll probably have to do everything yourself or populate and debug the data with a generic tool that doesn’t know your specific data.

My task

I don’t want to develop an entire application in one article, nor teach SQL – the structured query language that most databases include using SQLite. But I want to show you how easy it is to start a simple electronic database using C. The C code will turn out to be the least of our problems. The two things you’ll want to understand the most are how to structure the data – the database schema – and how to populate the initial data. While you might want your program to add data eventually, it’s best to start with a bit of data at the start for your program to work.

Database Basics

A modern relational database has one or more tables. Each table contains rows of data. A row has one or more columns and each column has a data type. For example, you can have a text column for the serial number, an actual numeric value for the test point voltage, and a Boolean for pass / fail.

Each table has a unique identifier per row. The database will provide you with one if you don’t, but generally you will want to provide this unique identifier yourself. The database will help you by automatically incrementing the number and making sure it is unique for each row.

If that was all there was to do, there wouldn’t be much benefit over a comma delimited file. But there are a lot of things that we can do better once we have this organizational structure. For example, it is easy to ask the database to sort the elements or select the three highest voltages from the array.

However, one of the main advantages of a database is the ability to perform joins. Suppose I have a list of components: a PC board, resistor, battery holder, and an LED. I have an array which has a row corresponding to each of them. Now suppose I want to have an assembly table made up of components.

I could take a simple approach:

Table Component
ID    Name
===========
1     PCB
2     Resistor
3     LED
4     Battery Holder

Table Assembly
ID    Name       Components
============================
1     Blink1     PCB, Resistor, LED, Battery Holder
2     Blink2     PCB, Resistor, LED, Resistor, LED, Battery Holder



That's ugly and wasteful. A better approach would be to use three tables:

Table Component
ID Name
===========
1 PCB
2 Resistor
3 LED
4 Battery Holder

Table Assembly
ID Name 
=========
1 Blink1 
2 Blink2 

Table Assembly_Parts
ID    Component    Quan
=======================
1     1            1
1     2            1
1     3            1
1     4            1
2     1            1
2     2            2
2     3            2
2     4            1

Using a join operation, you can link these tables together to generate what is equivalent to the first table without duplicating a lot of data.

So for my toy database, I’m going to create three tables: part will contain the parts I have. the partnums table will contain part types (for example a 7805 vs a 2N2222 or a CDP1802. Finally, a locations table will tell me where I put things. There are other ways to structure this. For example, there might be a table to store types of footprints: a 2N2222 might be in a TO92 or a surface mount. In addition, I will create a view that shows everything unfolded like in the first example. A view is something that is not stored but acts like a table for convenience. In reality, it is just a query on the database that you can work with.

There is much more, of course. There are inner and outer joins and a lot of other details and nuances. Fortunately, there is a lot of database material on the web to read, including SQLite documentation.

Just enough SQL

For our purposes, we’re only going to use a handful of SQL statements: create, insert, and select. There is an executable, sqlite3, where you can enter database commands. You can provide the database name on the command line and this is the easiest way to do it. Use .exit when you want to go out.

You can probably understand the SQL syntax because it is quite detailed:

create table part ( id integer not null primary key, name text, partnum integer, value text, 
   units text, quantity integer, photo blob, data text, location integer, footprint text);
create table partnums (id integer not null primary key, partnum text, desc text);

create table locations (id integer not null primary key, location text, desc text);

create view full as select part.id, name, partnums.partnum as part_number, value, units, 
   quantity, data, locations.location as location, footprint from part 
   inner join partnums on part.partnum = partnums.id inner join locations on locations.id=part.location

I just made these calls in the sqlite3 command line program although I could have used the GUI or – if I wanted – could have my C program run these commands. I also used the command line to insert some test records. For example:

insert into locations (location,desc) values ("Shop - storage II","Storage over computer desk in shop");
insert into partnums(partnum,desc) values("R.25W","Quarter Watt Resistor");
insert into part(partnum,quantity,location,value,units) values (2,111,1,"10K","ohms");

To retrieve the data, you will use the select command:

select * from part;

select partnum, quantity from part where quantity<5;

If you want to know more, there are plenty of SQL tutorials on the Web.

Programming!

So far, none of this has required programming. Assuming you have the libsqlite3-dev package or its equivalent, you don’t need much to add database functions to your C program. You will need to include sqlite3.h. If you can’t find it, you probably haven’t installed the development files. You will also need to establish a link with libsqlite3. For a simple single file project, this makefile will probably help you get started:

CC=gcc
CFLAGS+=-std=c99 -g
LDFLAGS=-g
LDLIBS+=-lsqlite3

edatabase : main

main : main.c

The code itself is straightforward. You need to open the database file (sqllite3_open). Instead of a file, you can pass “: memory” to get an in-memory database that will not last beyond the life of your program. The call will give you a handle back to your database. Next, you need to analyze or prepare the SQL statement that you want to execute. It could be any SQL that we executed through the interface or many other SQL statements. In my case, I want to extract the data from the full view and display it, so I’m going to analyze:

select * from full;

Finally you will call sqlite3_step and while he comes back SQLITE_ROW, you can handle the line using calls such as sqlite3_column_text. At the end, you finalize the database and close it. Here is the code with the error handling removed:

#include 
#include 

int main(int argc, char *argv[])
   {
   sqlite3 *db;
   sqlite3_stmt *sql;
   int rv;

   rv=sqlite3_open("parts.db",&db);
   rv=sqlite3_prepare_v2(db, "SELECT * from full", -1, &sql, NULL);
   do
     {
     rv=sqlite3_step(sql);
     if (rv==SQLITE_ROW)
        {
        printf("%s,",sqlite3_column_text(sql,0));
        printf("%sn",sqlite3_column_text(sql,2));
        }
     } while (rv==SQLITE_ROW); 
   sqlite3_finalize(sql);
   sqlite3_close(db);
   return 0;
}

Or, take a look to the full code. In a case where you didn’t care to go through the rows, you could have called sqlite3_exec. Even the documentation admits that this is just a wrap around preparation, step, and finalization so you can just skip a chain and expect it to work.

Of course there is many more calls. For example, you can call sqlite_column_int or other calls to get particular types. You can bind parameters to SQL calls to set values ​​instead of creating a string. But it shows you how easy it can be to make a simple SQLite program.

So the next time you find yourself inventing a new file format, consider using SQLite instead. You will get free tools and once you learn SQL you will find that there is a lot you can do without writing actual code other than various SQL commands. You can even use Git-like branches to maintain versions of your database. Again, some people use git as their database, but we don’t suggest it.


About Jon Moses

Check Also

Check department, course name and other details here

NSIC Recruitment 2022: Check Department, Course Name and Other Details Here 2022 NSIC recruitment: National …

Leave a Reply

Your email address will not be published.