Cvičenie zo C# a platformy .NET – Huffmanovo kódovanie

Pred nejakým časom sme strávili niekoľko hodín cvičenia zo C# a platformy .NET snahou vytvoriť program implementujúci Huffmanovo kódovanie. A ja som si dnes povedal, že svoju verziu zverejním.

 

Huffmanovo kódovanie je algoritmus na bezstratovú kmpresiu textu. Nebudem to tu teraz rozvádzať do detailov, stačí sa opýtať Googla alebo Wikipedie – niečo určite nájdete.

 

Moja verzia kódu je relatívne dlhá, ale kľúčové časti by mali byť jasné. Tu uvediem len neokomentovanú veriu (kvôli prehľadnosti) a okomentovanú verziu si môžete stiahnuť pomocou odkazu nižšie.

 

Program využíva väčšinou len základné monosti jazyka C#, výnimkou je binárna serializácia. Tento aspekt jazyka je však použitý len jednoducho a nemal by byť problém pochopiť, o čo ide.

 

A tu je už sľúbený kód:

#define ShowEncodingTable
#define SaveWholeFile

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

namespace Huffman
{
    class Program
    {
        static SortedDictionary<char, string> codingtable;

        static void Main(string[] args)
        {
            try
            {
                while (true)
                {
                    Console.Write("Do you want to encode, decode, compare or exit? [e/d/c/x] ");
                    string response = Console.ReadLine();
                    string output = "";
                    string input = "";
                    switch (response.ToLower())
                    {
                        case "e":
                            Console.Write("\n\rDo you want to encode file of text [f/t]? ");
                            response = Console.ReadLine();
                            switch (response.ToLower())
                            {
                                case "f":
                                    Console.WriteLine("\n\rSource file:");
                                    input = Console.ReadLine();
                                    if (File.Exists(input))
                                    {
                                        input = File.ReadAllText(input);
                                    }
                                    else
                                    {
                                        Console.WriteLine("The specified file does not exist! [Press Enter]");
                                        Console.ReadLine();
                                        goto endloop;
                                    }

                                    Console.WriteLine("\n\rDestination (input valid path or the word 'console'):");
                                    output = Console.ReadLine();
                                    break;
                                case "t":
                                    Console.WriteLine("\n\rInput the text to encode (without line breaks):");
                                    input = Console.ReadLine();
                                    Console.WriteLine("\n\rDestination (input valid path or the word 'console'):");
                                    output = Console.ReadLine();
                                    break;
                                default:
                                    goto endloop;
                            }
                            Encode(input, output);
                            Console.WriteLine("\n\rEncoding completed, file saved. [Press Enter]");
                            break;
                        case "d":
                            Console.WriteLine("\n\rSource file:");
                            input = Console.ReadLine();
                            if (!File.Exists(input))
                            {
                                Console.WriteLine("The specified file does not exist! [Press Enter]");
                                Console.ReadLine();
                                goto endloop;
                            }

                            Console.WriteLine("\n\rDestination (input valid path or the word 'console'):");
                            output = Console.ReadLine();
                            switch (output)
                            {
                                case "console":
                                    Console.WriteLine("\n\rDecoded text:");
                                    Console.WriteLine(Decode(input));
                                    Console.WriteLine("\n\rDecoding completed, output displayed. [Press Enter]");
                                    break;
                                default:
                                    File.WriteAllText(output, Decode(input));
                                    break;
                            }

                            break;
                        case "c":
                            Console.WriteLine("\n\rOriginal file:");
                            input = Console.ReadLine();
                            if (!File.Exists(input))
                            {
                                Console.WriteLine("The specified file does not exist! [Press Enter]");
                                Console.ReadLine();
                                goto endloop;
                            }
                            Console.WriteLine("\n\rEncoded file:");
                            output = Console.ReadLine();
                            if (!File.Exists(output))
                            {
                                Console.WriteLine("The specified file does not exist! [Press Enter]");
                                Console.ReadLine();
                                goto endloop;
                            }
                            string origtext = File.ReadAllText(input);
                            int[] compared = Compare(origtext, output);
                            Console.WriteLine(String.Format("\n\rLength of the original text: {0} bytes",compared[0]));
                            Console.WriteLine(String.Format("Length after encoding: {0} bytes\n\r", compared[1]));
                            Console.WriteLine(String.Format("Text compressed by {0} bytes ({1}%)", compared[2], 100 - Math.Round(((double)compared[2] / (double)compared[0]) * 100, 2)));
                            Console.WriteLine("\n\rComparison completed, output displayed. [Press Enter]");
                            break;
                        case "x":
                            Environment.Exit(0);
                            break;
                        default:
                            goto endloop;
                    }
                    Console.ReadLine();
                endloop:
                    Console.Clear();
                }
            }
            catch (IOException ex)
            {
                Console.Clear();
                Console.WriteLine("File IO exception ecnountered:");
                Console.WriteLine(ex.Message);
                Console.ReadLine();
                Environment.Exit(0);
                throw;
            }
            catch (Exception ex)
            {
                Console.Clear();
                Console.WriteLine("Unexpected exception encountered:");
                Console.WriteLine(ex.Message);
                Console.ReadLine();
                Environment.Exit(0);
            }
        }

        private static string Decode(string input)
        {
            FileStream fs = new FileStream(input, FileMode.Open);
            HuffmanReader hr = new HuffmanReader(fs);
            string decoded = hr.Decode();
            fs.Close();
            return decoded;
        }

        private static void Encode(string InputText, string OutputDestination)
        {
            Queue<Uzol> fronta = BuildTree(InputText);
            codingtable = new SortedDictionary<char, string>();
            Uzol rootnode = fronta.Dequeue();
            GetCharCodes(rootnode, "");

#if ShowEncodingTable
            Console.WriteLine("\n\rEncoding table:");
            foreach (char c in codingtable.Keys)
            {
                string s;
                codingtable.TryGetValue(c, out s);
                Console.WriteLine(string.Format("{0} {1}", c, s));
            }
#endif

            HuffmanWriter hw;
            switch (OutputDestination)
            {
                case "console":
                    MemoryStream ms = new MemoryStream();
                    hw = new HuffmanWriter(ms, codingtable, rootnode);
                    foreach (char c in InputText)
                    {
                        hw.Write(c);
                    }
                    hw.Close();
                    Console.WriteLine(string.Format("Encoded text:\n\r{0}", Encoding.Default.GetString(ms.ToArray())));
                    break;
                default:
                    FileStream fs = new FileStream(OutputDestination, FileMode.Create);
                    hw = new HuffmanWriter(fs, codingtable, rootnode);
                    foreach (char c in InputText)
                    {
                        hw.Write(c);
                    }
                    hw.Close();
                    fs.Close();
                    break;
            }
        }

        private static int[] Compare(string OriginalText, string EncodedFilePath)
        {
            FileStream fs = new FileStream(EncodedFilePath, FileMode.Open);
            HuffmanReader hr = new HuffmanReader(fs);
            int[] compared = hr.Compare(OriginalText);
            fs.Close();
            return compared;
        }

        private static Queue<Uzol> BuildTree(string InputText)
        {
            List<Znak> CharList = new List<Znak>();

            int[] Weights = new int[256];

            foreach (Char ch in InputText)
                Weights[(int)ch]++;

            for (int i = 0; i < 256; i++)
            {
                if (Weights[i] > 0)
                    CharList.Add(new Znak((char)i, Weights[i]));
            }

            CharList.Sort(delegate(Znak z1, Znak z2)
            {
                return Comparer<int>.Default.Compare(z1.Count, z2.Count);
            });

            Queue<Uzol> Leaves = new Queue<Uzol>();
            Queue<Uzol> InnerNodes = new Queue<Uzol>();

            for (int j = 0; j < CharList.Count; j++)
            {
                Leaves.Enqueue(new Uzol(CharList[j]));
            }

            Uzol u11 = Leaves.Dequeue();
            Uzol u12 = Leaves.Dequeue();
            Uzol u21 = null;
            Uzol u22 = null;

            InnerNodes.Enqueue(new Uzol(u11, u12));

            while (Leaves.Count + InnerNodes.Count > 1)
            {
                u11 = null;
                u12 = null;
                u21 = null;
                u22 = null;

                if (Leaves.Count > 0)
                {
                    u11 = Leaves.ElementAt(0);
                }
                if (Leaves.Count > 1)
                {
                    u12 = Leaves.ElementAt(1);
                }

                u21 = InnerNodes.ElementAt(0);

                if (InnerNodes.Count > 1)
                {
                    u22 = InnerNodes.ElementAt(1);
                }

                if ((Leaves.Count == 0))
                {
                    u21 = InnerNodes.Dequeue();
                    u22 = InnerNodes.Dequeue();
                    InnerNodes.Enqueue(new Uzol(u21, u22));
                }
                else
                {
                    if (u11.Weigth <= u21.Weigth)
                    {
                        if (u12 != null)
                        {
                            if (u12.Weigth <= u21.Weigth)
                            {
                                u11 = Leaves.Dequeue();
                                u12 = Leaves.Dequeue();
                                InnerNodes.Enqueue(new Uzol(u11, u12));
                            }
                            else
                            {
                                u11 = Leaves.Dequeue();
                                u21 = InnerNodes.Dequeue();
                                InnerNodes.Enqueue(new Uzol(u11, u21));
                            }
                        }
                        else
                        {
                            u11 = Leaves.Dequeue();
                            u21 = InnerNodes.Dequeue();
                            InnerNodes.Enqueue(new Uzol(u11, u21));
                        }
                    }
                    else
                    {
                        if (u22 != null)
                        {
                            if (u11.Weigth <= u22.Weigth)
                            {
                                u21 = InnerNodes.Dequeue();
                                u11 = Leaves.Dequeue();
                                InnerNodes.Enqueue(new Uzol(u21, u11));
                            }
                            else
                            {
                                u21 = InnerNodes.Dequeue();
                                u22 = InnerNodes.Dequeue();
                                InnerNodes.Enqueue(new Uzol(u21, u22));
                            }
                        }
                        else
                        {
                            u21 = InnerNodes.Dequeue();
                            u11 = Leaves.Dequeue();
                            InnerNodes.Enqueue(new Uzol(u21, u11));
                        }
                    }
                }
            }
            return InnerNodes;
        }

        private class HuffmanWriter
        {
            HuffmanFile Result;
            Uzol EncodingTree;
            SortedDictionary<char, string> EncodingTable;
            StringBuilder WriteBuffer = new StringBuilder("");
            List<Byte> OutputBuffer = new List<byte>();
            Stream OutputStream;

            public HuffmanWriter(Stream path, SortedDictionary<char,string> ct, Uzol treeroot)
            {
                OutputStream = path;
                EncodingTree = treeroot;
                EncodingTable = ct;
            }

            public void Write(char c)
            {
                string cd;
                EncodingTable.TryGetValue(c, out cd);
                WriteBuffer.Append(cd);
            }

            public void Close()
            {
                int i = 0;
                if (WriteBuffer.Length > 0)
                {
                    int ByteResult;
                    while (WriteBuffer.Length >= 8 )
                    {
                        ByteResult = 0;
                        for (i = 0; i < 8; i++)
                        {
                            ByteResult = ByteResult + (int)Math.Pow(2, i) * int.Parse(WriteBuffer[7 - i].ToString());
                        }
                        OutputBuffer.Add((byte)ByteResult);
                        WriteBuffer.Remove(0, 8);
                    }
                    ByteResult = 0;
                    for (i = 0; i < WriteBuffer.Length; i++)
                    {
                        ByteResult = ByteResult + (int)Math.Pow(2, i) * int.Parse(WriteBuffer[WriteBuffer.Length - 1 - i].ToString());
                    }
                    if (ByteResult != 0) OutputBuffer.Add((byte)ByteResult);
                }

#if SaveWholeFile

                Result = new HuffmanFile(OutputBuffer.ToArray(), EncodingTree, i);
                System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
                bf.Serialize(OutputStream, Result);
                OutputStream.Close();
#else
                OutputStream.Write(OutputBuffer.ToArray(), 0, OutputBuffer.Count);
                OutputStream.Close();
#endif
            }
        }

        private class HuffmanReader
        {
            StringBuilder ReadBuffer;
            StringBuilder WriteBuffer;
            Uzol EncodingTreeRoot;
            byte[] EncodedBytes;
            int LastByteLength;

            public HuffmanReader(Stream InputStream)
            {
                System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
                MemoryStream ms = new System.IO.MemoryStream();
                HuffmanFile hf = (HuffmanFile)bf.Deserialize(InputStream);
                EncodingTreeRoot = hf.EncodingTreeRoot;
                LastByteLength = hf.LastBytePad;
                EncodedBytes = hf.EncodedText;
            }

            public string Decode()
            {
                ReadBuffer = new StringBuilder("");

                for (int i = 0; i < EncodedBytes.Length - 1; i++)
                {
                    ReadBuffer.Append(ToBin(EncodedBytes[i], 8));
                }
                ReadBuffer.Append(ToBin(EncodedBytes[EncodedBytes.Length - 1], LastByteLength));

                WriteBuffer = new StringBuilder("");

                int index = 0;
                Uzol current = EncodingTreeRoot;

                for (index = 0; index < ReadBuffer.Length; index++)
                {
                    if (!current.IsList)
                    {
                        if (ReadBuffer[index] == '0')
                        {
                            current = current.Left;
                        }
                        else
                        {
                            current = current.Right;
                        }
                    }
                    if (current.IsList)
                    {
                        WriteBuffer.Append(current.Character.Character);
                        current = EncodingTreeRoot;
                    }
                }
                return WriteBuffer.ToString();
            }

            public int[] Compare(string original)
            {
                int[] result = new int[3];

                result[0] = original.Length;
                result[1] = EncodedBytes.Length;
                result[2] = original.Length - EncodedBytes.Length;

                return result;
            }
        }

        static void GetCharCodes(Uzol u, string code)
        {
            if (!u.IsList)
            {
                GetCharCodes(u.Left, code + "0");
                GetCharCodes(u.Right, code + "1");
            }
            else
                codingtable.Add(u.Character.Character, code);
        }

        static string ToBin(int input, int padwidth)
        {
            String result = "";

            int zvysok = 0;
            int cislo = input;

            while (cislo > 0)
            {
                zvysok = cislo % 2;
                cislo = cislo / 2;
                result = zvysok.ToString() + result;
            }

            result = result.PadLeft(padwidth, '0');
            return result;
        }

        [Serializable]
        private class Znak : ISerializable
        {
            char icharacter;

            public char Character
            {
                get { return icharacter; }
                set { icharacter = value; }
            }

            int icount;

            public int Count
            {
                get { return icount; }
                set { icount = value; }
            }

            public Znak(char ch, int c)
            {
                icharacter = ch;
                icount = c;
            }

            protected Znak(SerializationInfo info, StreamingContext context)
            {
                icharacter = info.GetChar("character");
                icount = info.GetInt32("count");
            }

            public  void GetObjectData(SerializationInfo info, StreamingContext context)
            {
                info.AddValue("character", icharacter);
                info.AddValue("count", icount);
            }

        }

        [Serializable]
        class Uzol : ISerializable
        {
            Uzol iLeft;
            Znak iZ;
            int iW;

            public int Weigth
            {
                get { return iW; }
                set { iW = value; }
            }

            public Znak Character
            {
                get { return iZ; }
                set { iZ = value; }
            }

            public Uzol Left
            {
                get { return iLeft; }
                set { iLeft = value; }
            }
            Uzol iRight;

            public Uzol Right
            {
                get { return iRight; }
                set { iRight = value; }
            }
            bool iList;

            public bool IsList
            {
                get { return iList; }
                set { iList = value; }
            }

            public Uzol(Uzol l, Uzol r)
            {
                iLeft = l;
                iRight = r;
                iList = false;
                iW = l.Weigth + r.Weigth;
            }

            public Uzol(Znak z)
            {
                iZ = z;
                iList = true;
                iW = z.Count;
            }

            protected Uzol(SerializationInfo info, StreamingContext context)
            {
                Znak z = new Znak('a',0);
                Uzol u = new Uzol(z);

                iLeft = (Uzol)info.GetValue("left", u.GetType());
                iRight = (Uzol)info.GetValue("right", u.GetType());
                iList = info.GetBoolean("islist");
                iW = info.GetInt32("weight");
                iZ = (Znak)info.GetValue("char", z.GetType());

            }

            public  void GetObjectData(SerializationInfo info, StreamingContext context)
            {
                info.AddValue("left", iLeft);
                info.AddValue("right", iRight);
                info.AddValue("islist", iList);
                info.AddValue("weight", iW);
                info.AddValue("char", iZ);
            }
        }

        [Serializable]
        class HuffmanFile : ISerializable
        {

            byte[] text;

            public byte[] EncodedText
            {
                get { return text; }
                set { text = value; }
            }
            Uzol root;

            public Uzol EncodingTreeRoot
            {
                get { return root; }
                set { root = value; }
            }
            int lastpad;

            public int LastBytePad
            {
                get { return lastpad; }
                set { lastpad = value; }
            }

            public HuffmanFile(byte[] t, Uzol r, int l)
            {
                text = t;
                root = r;
                lastpad = l;
            }

            protected HuffmanFile(SerializationInfo info, StreamingContext context)
            {
                byte[] b = new byte[0];
                Uzol u = new Uzol(new Znak('a', 0));
                text = (byte[])info.GetValue("text", b.GetType());
                root = (Uzol)info.GetValue("root", u.GetType());
                lastpad = info.GetInt32("lastpad");
            }

            public  void GetObjectData(SerializationInfo info, StreamingContext context)
            {
                info.AddValue("text", text);
                info.AddValue("root", root);
                info.AddValue("lastpad", lastpad);
            }
        }
    }
}

Stiahnuť si môžete okomentovanú verziu kódu.

 

(Edit)

 

Po opätovnej kontrole kódu som si všimol malej nepresnosti – pri zápise zakódovaného textu dochádzalo v niektorých prípadoch (keď bola dĺžka zakódovaného textu v bitoch rovná nejakému násobku 8 ) k zápisu nulového znaku na koniec súboru. Odkaz na stiahnutie aj prepis zdrojového kódu som aktualizoval.

Reklamy

2 thoughts on “Cvičenie zo C# a platformy .NET – Huffmanovo kódovanie

  1. Ľuba

    Ahoj,
    našla som tvoj blog o Huffmanovom kódovaní. Mám na teba jednu prosbu. Som tiež študentka na univerzite ŽU, ale študujem humanitný smer na FPV. Na jednom teste nás čaká otázka:
    JE DANE SLOVO “REDUNDANTNY”, KTORE JE ZAKODOVANE V 8-BITOVEJ ANSII TABULKE. O KOLKO BUDE SLOVO KRATSIE, KEBY BOLO ZAKODOVANE HUFFMANOVYM KODOVANIM?
    Verím, že pre teba nebude problém odpovedať na túto otázku. Pomohol by si nám?
    Dakujem aj za svojich spolužiakov.
    Ľuba

Pridaj komentár

Zadajte svoje údaje, alebo kliknite na ikonu pre prihlásenie:

WordPress.com Logo

Na komentovanie používate váš WordPress.com účet. Odhlásiť sa / Zmeniť )

Twitter picture

Na komentovanie používate váš Twitter účet. Odhlásiť sa / Zmeniť )

Facebook photo

Na komentovanie používate váš Facebook účet. Odhlásiť sa / Zmeniť )

Google+ photo

Na komentovanie používate váš Google+ účet. Odhlásiť sa / Zmeniť )

Connecting to %s