Flipper-IRDB to SQLite3

I played a little with the ‘Infrared’ feature of the Flipper Zero. For a long time my plan has been to put all .ir files into a searchable database to perform more analysis.

Now I’ve done it - at least the first step. There is a lot of room for improvement, but today I was able to make my first queries.

The Project: FlipperMgmt/irfiles_import.py at main · LupusE/FlipperMgmt · GitHub

Because I’ve organized all my git clones in one subfolder, the expected path is …/Flipper-IRDB/ and the result will be written to …/.

You can open the database with any SQLite (or SQLite-able) Client.

If you’d like to know which ‘Power’ address/command pairs are used most:

SELECT name,address,command, COUNT(name)
FROM irbutton
WHERE name like 'Power' AND Type like 'parsed'
GROUP BY address,command
ORDER BY COUNT(name) DESC

or if you’d like to just find the TV power commands:

SELECT name,address,command, COUNT(name)
FROM irbutton
	LEFT JOIN irfile ON irbutton.md5hash = irfile.md5hash
	
WHERE name like 'power' AND category = 'TVs' AND Type like 'parsed'
GROUP BY address,command
ORDER BY COUNT(name) DESC

Todo:

  • Write a file exporter
    – Write an Android tool to generate files …
    – … or a web frontend?
  • Write documentation
  • Normalize the data
  • Write a check (based on MD5) to update the DB
    – Write an output to MySQL/MariaDB/MS SQL/PostgreSQL/…
  • Clean up the code

As said before, the files are not very uniform. So a next step is to analyze ‘irbutton.name’ and write a translation table… but not today.


Very nice!


Today I added a small documentation and a TV ‘universal remote’ file generator.

For the last one I’ve started a quick ‘button translation table’ inside the SQLite.
There can be issues/mismatches, because it is an annoying manual task and it is really hard to find the right meaning/function of a button from the Name: field alone.


That’s better than trying to use regex to match. That would be very error-prone.


The last few days I spent on the topic ‘convert RAW to a comparable string’.

My solution is a bit clumsy right now, but it worked at first sight.
I take a look at the RAW string:

name: 8h
type: raw
frequency: 38000
duty_cycle: 0.330000
data: 9692 4496 626 546 626 545 627 545 627 544 628 543 629 543 629 542 630 542 630 1646 627 1651 632 1646 627 1651 632 1646 627 1651 632 1645 628 1649 624 548 624 1653 630 541 631 1646 627 544 628 544 628 543 629 542 630 1646 627 545 627 1650 623 549 623 1654 629 1648 625 1653 630 1647 626 39811 9688 2229 630 96004 9679 2229 630 96005 9676 2228 631

I search for the smallest value, divide every value by it, and cast the result to INT.
Then I look for repeats.
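The normalization described here can be sketched roughly like this (my own illustration, not the project’s exact code; the sample string is a shortened fragment of the capture above):

```python
# Rough sketch of the normalization idea: divide every timing by the
# smallest one and truncate to int, so captures become comparable.
raw = "9692 4496 626 546 626 545"   # shortened fragment of the capture
timings = [int(v) for v in raw.split()]
divisor = min(timings)              # 545 in this fragment
normalized = [t // divisor for t in timings]
print(divisor, normalized)          # 545 [17, 8, 1, 1, 1, 1]
```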

As a result I get something like:

Name SplitValue SplitPart Sequence Repeats Md5Hash
8h 73 [17, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 3, 1, 3, 1, 3, 1] 0 0 a1d2d51cf447cd0ffe0cf62d655f60cb
8h 177 [17, 4, 1] 1 0 a1d2d51cf447cd0ffe0cf62d655f60cb
8h 177 [17, 4, 1] 3 1 a1d2d51cf447cd0ffe0cf62d655f60cb

The signal would be something like Seq*([Rpt*](Value + Part)). I have not implemented the reassembly yet, because we have the source files anyway.
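A reassembly might look roughly like this (a hedged sketch with example values, not part of the project): scale the normalized sequence back up by its divisor to approximate the original timings.

```python
# Hedged sketch: multiply each normalized item by the divisor to get
# approximate timings back. Values are examples, not real data.
divisor = 541
sequence = [17, 8, 1, 1, 1]
approx = [n * divisor for n in sequence]
print(approx)   # [9197, 4328, 541, 541, 541]
```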


I’m guessing your logic is to assume the lowest value is the smallest possible chunk. That’s a good place to start. I’ve come up with a few possible solutions, but they all come back to precision and whether to be more or less inclusive. I suspect the final product should be somewhat less inclusive than what you have, but by how much? In my alterations of your example the number I assigned to x seems arbitrary, but it could be dialed in.

x=10
int(working_number / int(lowest_number/x))

Following that logic though we could set x to an arbitrary integer.

x=50
int(working_number / x)

I think x=15 is the absolute lowest value I would try, based on your sample data.
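A small sketch comparing the two quantization variants above on sample timings from the captured RAW signal (my own illustration; x is the arbitrary tuning knob):

```python
# Compare the two ideas: divisor derived from the lowest timing vs. a
# fixed divisor of x. Smaller divisors keep more precision.
timings = [9692, 4496, 626, 546, 627, 1651]
lowest = min(timings)                           # 546

x = 10
coarse = [t // (lowest // x) for t in timings]  # divisor = 54
x = 50
fixed = [t // x for t in timings]               # divisor = 50
print(coarse)   # [179, 83, 11, 10, 11, 30]
print(fixed)    # [193, 89, 12, 10, 12, 33]
```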

Another thought I had is more complex but it would assign a level of confidence to each match. I’m thinking this solution over more so I can better express it.

I wonder if it should look for a mirrored code sequence?


You’re referring to the NEC protocol we were analyzing a few days back?
The idea is good, but two circumstances kept me from following it:

  1. There is an encoder. If it is a NEC code, the Flipper should be able to encode it. If not, the RAW signal is captured.
    Thinking a little ahead, maybe some of the files from the _Converted_ folder are still unencoded. My program needs the encoder implemented → TODO
  2. One of my goals is to find pattern and maybe ‘define’ (or better find) new protocols.
    This will be included in further analysis steps. In SQL, not in the python part.

For your formula:
Your math looks good, but I don’t understand the advantage.
My approach was to get an easily plottable string: 1 as the smallest item and then uniform values.
_-_-_-_- = 11111111
_--__--__- = 122221
_--___---- = 1234
and so on.
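The plot strings above follow a simple run-length idea; a minimal sketch (my illustration, not the project’s code):

```python
# Count consecutive repeats of each level to get the compact string.
from itertools import groupby

def run_lengths(s):
    return ''.join(str(len(list(g))) for _, g in groupby(s))

print(run_lengths('_-_-_-_-'))      # 11111111
print(run_lengths('_--__--__-'))    # 122221
print(run_lengths('_--___----'))    # 1234
```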

In my simple understanding, without any filtering, I take all found divisors and ask for the average:

SELECT AVG(divisor)
	FROM rawheader

→ 537.824694234129

To optimize my approach, I would assume the factor would be anywhere around x=500 (up to 550). How would 15 improve the result?
The smaller x, the bigger the factor. Each item gets more precise in itself, but is it still comparable?

I really don’t want to question your solution. I want to understand the advantages I can’t see right now. Maybe I am somewhat blind after working on this every night for a few weeks :wink:
Regenerating the db takes 42.842 s on my small dev laptop (Intel(R) Core™2 Duo CPU L9400 @ 1.86GHz, 4GB RAM). That is fast enough to test some approaches. The result file is 123.1 MB right now. I’ll try to reduce that as well.


Perhaps I misunderstand your end goals.
I think at some point you plan to compare different raw files against each other. If you do that, additional accuracy may be useful.

Goals :slight_smile: this is only the preparation of what we (not I!) can do with all the data.
Simplicity is the enemy of functionality. But here we have the ability to get both. Maybe at the cost of storage.

To use your approach, I’ll take a look at:

def convert_raw(btndata, btnname, md5hash, md5count):
    btndata = btndata.split(' ')
    try:
        btndint = [int(numeric_string) for numeric_string in btndata]
    except ValueError:
        cur_execute = ["Non-numeric value. No import for {} {}".format(md5hash, btnname)]
        return cur_execute

    divisor = min(btndint)
    if divisor == 0:
        cur_execute = ["Prevent dividing by zero. No import for {} {}".format(md5hash, btnname)]
        return cur_execute

    maxdividend = 15000 / divisor
    converted = []
    splitlist = []
    cur_execute = []

    # Convert values into int by dividing through the smallest value.
    # Factors above maxdividend (raw timing > 15000) mark split points.
    for dividend in btndint:
        factor = dividend // divisor
        if factor > maxdividend:
            splitlist.append(factor)
        converted.append(factor)
Instead of divisor = min(btndint), you would use divisor = 15

The result would be

(Fields: ButtonName, SplitValue, Chunk, Sequence, Repeats, md5Hash)

'8h', '2654', '[646, 299, 41, 36, 41, 36, 41, 36, 41, 36, 41, 36, 41, 36, 41, 36, 42, 36, 42, 109, 41, 110, 42, 109, 41, 110, 42, 109, 41, 110, 42, 109, 41, 109, 41, 36, 41, 110, 42, 36, 42, 109, 41, 36, 41, 36, 41, 36, 41, 36, 42, 109, 41, 36, 41, 110, 41, 36, 41, 110, 41, 109, 41, 110, 42, 109, 41]', '0', '0', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '6400', '[645, 148, 42]', '1', '0', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '6400', '[645, 148, 42]', '3', '1', 'a1d2d51cf447cd0ffe0cf62d655f60cb'

'15', '1000.0', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '[2654, 6400, 6400]', 'a1d2d51cf447cd0ffe0cf62d655f60cb'

compared to

'8h', '73', '[17, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 3, 1, 3, 1, 3, 1]', '0', '0', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '177', '[17, 4, 1]', '1', '0', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '177', '[17, 4, 1]', '3', '1', 'a1d2d51cf447cd0ffe0cf62d655f60cb'

'541', '27.726432532347506', 'a1d2d51cf447cd0ffe0cf62d655f60cb'
'8h', '[73, 177, 177]', 'a1d2d51cf447cd0ffe0cf62d655f60cb'

In the end, I could perform the division in SQL before output as well.
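That idea could look roughly like this (a hypothetical sketch; schema and values are invented for illustration, using Python’s sqlite3 module): store the raw timings and divide at query time instead of storing pre-divided values.

```python
# Hypothetical sketch: division happens in the SELECT, so the stored
# data stays raw and the granularity can be changed per query.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE rawdata (name TEXT, value INTEGER)')
con.executemany('INSERT INTO rawdata VALUES (?, ?)',
                [('8h', 9692), ('8h', 4496), ('8h', 626)])
divisor = 545   # smallest timing, computed beforehand
rows = con.execute('SELECT name, value / ? FROM rawdata',
                   (divisor,)).fetchall()
print(rows)     # [('8h', 17), ('8h', 8), ('8h', 1)]
```

SQLite performs integer division when both operands are integers, which matches the INT cast in the Python version.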


Perhaps I’m just overthinking the problem as I often do. I really should look at the data some more.

No, I really appreciate your input. I would not say it sounds overthought. Because of so many unfinished ideas in my head, I need to rethink this approach as well.

Maybe I should not divide the values in the DB. If that happened during processing, the output would be more flexible: I could go diffuse or accurate, just as needed.
But then I lose the calculation power of the database. Right now it seems to be a good start, but it needs some cutting/processing, because raw data is … not very sorted. Shocking insight :wink:
Maybe at some point a middleware layer makes sense, but that is too complex at this stage.

Next Milestone: Drinking a glass of wine with my wife, look into the stars and reorder the following tasks.
One open task is to get in touch with more CIR protocols besides NEC.


A good idea!

Indeed! Ideas in my head, on paper, on the desk, in the mail, and in boxes. I have an eInk screen arriving Friday. What to do with it, though? Connect it to the Flipper, an ESP32, a Raspberry Pi? Never enough time!