ChunkedList

A new type of collection is available in a new library: ChunkedList<T>. It was created as a result of searching for processes that could be optimized, and the realization of how poorly a LinkedList<T> performs on modern computers. At the same time, the List<T> structure, while often outperforming LinkedList<T> in many cases, is not desireable. The conclusion was to create a new type of collection that supports both the benefits of using a LinkedList<T>, with the performance of the List<T>, or, as shown below, outperforming both.

ChunkedList Performance

The ChunkedList<T> is a generic data structure that is optimized for performance. It combines features from the LinkedList<T> structure, with features of the List<T> structure.

The LinkedList<T> structure generates nodes for each element, and arranges them in a linked list. It is very straightforward to add and remove elements. But the structure requires a lot of memory overhead, especially if storing small elements. It is also inefficient to process, as elements are not stored closely together, not allowing processor caches to be utilized efficiently.

The List<T> structure on the other hand, stores all elements in a single array internally. This makes access much faster. But adding elements can require the internal buffer to be resized, which can be a costly operation. To avoid this, the size of the increase grows as elements are added (i.e. size is doubled).

The ChunkedList<T> combines the two structures. Instead of attempting to have a single internal array, which requires resizing, the list creates chunks, and links the chunks together in a linked list of chunks. This allows for elements within a chunk to be stored closely together, while avoiding the need to resize the internal buffer, avoiding the necessary copying of data each time that happens.

Following are some performance benchmarks done, comparing the performance of the ChunkedList<T> structure, with the LinkedList<T> and List<T> structures. The ChunkedList<T> structure has comparable interfaces, allowing it to fit in code that uses either of the LinkedList<T> or List<T> structures. You can find the code of the ChunkedList<T> structure in the IoTGateway repository. The same repository also contains unit tests and the following benchmarking tests. You can also use the collection via its nuget.

The following benchmarks were done on a Windows 11 Intel i9 laptop machine, in Release mode. Different performance results may occur on different types of machines, and in different build modes, and for different processors. The benchmarks tests are done first on a small number of elements, to test how the structure works for small number of elements, and then does the same thing for a larger set of elements. The times are then compared to compute a relative performance (in percent) between the ChunkedList<T> and the LinkedList<T> and List<T> structures.

For all graphs, the x-axis represents the number of elements, and the y-axis represents the relative performance, in percent. An additional black line parallel to the x-axis marks the 100% level, where the lists perform equally. If the colored graph lies above this line, the ChunkedList<T> structure performs better than the structure represented by the graph. If the colored graph lies below the line, the ChunkedList<T> structure performs worse. The hidden implementations explain the variations of the curves, as memory exceed internal caches, or buffers have to be resized. The graphs are colored as follows:

Legend
Legend

Adding elements (last)

Simple addition of elements (last in the list):

Adding small amount of elements Adding large amount of elements

Enumerating elements using an enumerator

Enumerating elements in a list sequentially:

Enumerating small amount of elements Enumerating large amount of elements

Enumerating elements using ForEach

Enumerating elements in a list sequentially using ForEach() (not available in LinkedList, so enumeration is performed using an enumerator):

Enumerating small amount of elements using ForEach Enumerating large amount of elements using ForEach

Enumerating elements using ForEachChunk

Enumerating elements in a list sequentially using ForEachChunk(), giving the ChunkedList<T> an edge. (Not available in LinkedList, so enumeration is performed using an enumerator, or List, where enumeration is done using ForEach()):

Enumerating small amount of elements using ForEachChunk Enumerating large amount of elements using ForEachChunk

Enumerating elements using nodes

The LinkedList<T> and ChunkedList<T> permit the enumeration of content using a linked list of node objects that point to each individual element (LinkedList<T>) or each chunk (ChunkedList<T>). This iteration is compared to the ForEach() method of List<T>:

Enumerating small amount of elements using nodes Enumerating large amount of elements using nodes

Checking if list contains en element

The Contains() method is used to check if a list contains an element:

Checking if list contains an element Checking if list contains an element

Removing elements randomly

Removing elements randomly using the Remove() method:

Removing elements randomly Removing elements randomly

Removing first elements

Removing first elements using the RemoveFirst() method (or RemoveAt() for List):

Removing first elements Removing first elements

Note: The ChunkedList<T> structure outperforms the LinkedList<T> structure, even for this type of linked-list operation.

Note 2: The List<T> structure quickly diverges in performance for this operation. It is not suitable for implementing a FIFO queue, for instance.

Removing last elements

Removing last elements using the RemoveLast() method (or RemoveAt() for List):

Removing last elements Removing last elements

Note: The ChunkedList<T> structure outperforms the LinkedList<T> structure, even for this type of linked-list operation.

Adding elements (first)

Simple addition of elements first in the list (or using Insert() for List):

Adding small amount of elements first Adding large amount of elements first

Note: The ChunkedList<T> structure outperforms the LinkedList<T> structure, even for this type of linked-list operation.

Note 2: The List<T> structure quickly diverges in performance for this operation. It is not suitable for implementing queues or priority queues, for instance, where you process items from either end of the list.

Index operations (get/set)

Index operations using this[Index] is not available in the LinkedList<T> structure, so the benchmark comparison is only done between the ChunkedList<T> and List<T> structures:

Index operations Index operations

Finding elements

Finding elements uses the IndexOf method for for ChunkedList<T> and List<T>, and the Find method for LinkedList<T>:

IndexOf operations IndexOf operations

Finding last elements

Finding last elements uses the LastIndexOf method for for ChunkedList<T> and List<T>, and the FindLast method for LinkedList<T>:

LastIndexOf operations LastIndexOf operations

Removing items by index

Removing items from their indices (RemoveAt() method) is not available in LinkedList<T>:

RemoveAt operations RemoveAt operations

Inserting items by index

Inserting items by index (Insert() method) is not available in LinkedList<T>:

Insert operations Insert operations

Adding a range of items

Adding a range of items (AddRange() method) is not available in LinkedList<T>. Both List<T> and ChunkedList<T> structures are optimized for adding ranges originating from arrays. A benchmark of this operation follows first:

Adding a range of items Adding a range of items

A second benchmark is performed when the range is an enumeration not based on an array:

Adding a range of items Adding a range of items

Copying elements to an array

Copying elements to an array using the CopyTo() method:

Copying elements Copying elements

Creating an array of elements

Creating an array containing the elements of a list, using the ToArray() method (CopyTo method for LinkedList):

Creating Array Creating Array

#new, #library


Runtime Language library

The Neuron®, IoT Gateway, and services hosted on them, as well as the TAG Digital ID Apps (including the Neuro-Access App), use the Waher.Runtime.Language library to dynamically publish content using multiple languages. Following is a comparison of this library, compared to the intrinsic language translation method provided by Visual Studio:

  • Translations are seen as dynamic content that can be provided at runtime, by the operator, or by services themselves (which may run in a distributed environment), not embedded static resources created at compile time.
  • New translations can be created at runtime.
  • Languges from multiple services in a distributed environment can be processed collectively.
  • Languages can be extended as new strings appear, as services or network components evolve.
  • The library keeps track of the translation level of each string, making it possible to distinguish between automatically added or translated content, and manually revised translations. This simplifies automated and manually corrected content, to avoid overwriting better translations with newer, but worse translations by mistake.

Basic model

The basic object model of language content is based on a static Translator class, that gives applications access to a set of Languages, Namespaces and Strings. Languages are defined by their ISO Language code, and a displayable name. Namespaces are simply strings identifying a collection of strings. These namespaces can be URLs, or .NET Namespaces, or any other federated identifier you can think of. The strings is a collection of key-value-pairs, where the keys are positive integers, and the values are strings.

Language Object Model, as seen in the Type Explorer
Language Object Model, as seen in the Type Explorer

Translator

The static Translator class gives you access to available languages. You can also dynamically create new languages using this class. It also keeps track of the default language (i.e. language selected as the default language for the service; this is not the same as the language selected by a user using services on the system).

Translator
Translator

Language

Once you have a Language object reference, you can use this object to get access to namespaces or strings directly, using available methods. You can also create new namespaces. To create new strings however, you need access to the corresponding Namespace object reference first.

Language
Language

Namespace

The Namespace object, gives you access to strings in the namespace. You can get, set, create or delete strings, dynamically.

Namespace.png
Namespace.png

LanguageString

Each localized string is implemented as a LanguageString object. Apart from the string ID and Value, it also contains information about its translation level, to be able to distinguish strings that have to be translated, have been automatically translated, and have been manually translated or checked.

LanguageString
LanguageString

TranslationLevel

Each string has a translation level that represents the quality of the string. It is an enumerated valid with the following properties:

TranslationLevel
TranslationLevel

Waher.Utility.Translate

The IoT Gateway, or the TAG Neuron® contains a utility for processing language content, called Waher.Utility.Translate. It is available in the installation folder of the gateway or the Neuron®. This utility can be used to process language information in a Neuron®, or in Visual Studio resource files, including automatically translating strings between languages.

Further reading

A more thorough description of this library, with examples, is provided in Mastering Internet of Things.

#tutorial, #library, #language


Posts tagged #library

No more posts with the given tag could be found. You can go back to the main view by selecting Home in the menu above.