Logisland - Extensions
Think of Logisland extensions as your project dependencies. Extensions configure, boot, and integrate a framework or technology into your Logisland application. Make sure you've read a component's extension guide before installing it.
processing
AddFields
Adds one or more fields to records
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.AddFields
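As a sketch of how an extension such as AddFields is wired into a job, here is a minimal Logisland YAML fragment. Only the processor class comes from the listing above; the stream component and the dynamic-property behaviour shown in the comment are illustrative assumptions, not authoritative configuration.

```yaml
# Hypothetical job fragment -- property names are illustrative
streamConfigurations:
  - stream: parsing_stream
    component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
    processorConfigurations:
      - processor: add_fields
        component: com.hurence.logisland.processor.AddFields
        configuration:
          # assumed: each dynamic property adds a field of that
          # name with the given value to every record
          environment: production
```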
ApplyRegexp
This processor is used to create a new set of fields from one field (using regexp).
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ApplyRegexp
ExpandMapFields
Expands the content of a MAP field to the root.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ExpandMapFields
FlatMap
Converts each record's fields into a single flattened record...
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.FlatMap
GenerateRandomRecord
A processor that makes random records given an Avro schema
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.GenerateRandomRecord
ModifyId
Modifies the id of records, or generates it following defined rules
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ModifyId
NormalizeFields
Changes the name of a field according to a provided name mapping...
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.NormalizeFields
SelectDistinctRecords
Keeps only distinct records based on a given field
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SelectDistinctRecords
parsing
EvaluateJsonPath
Evaluates one or more JsonPath expressions against the content of a record. The results of those expressions are assigned to Record Fields depending on the configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Field Name into which the result will be placed. The value of the property must be a valid JsonPath expression. A Return Type of 'auto-detect' will make a determination based on the configured destination. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to 'scalar', the Record will be routed to error. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value. If the expression matches nothing, Fields will be created with empty strings as the value.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.EvaluateJsonPath
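The dynamic-property mechanism described above might look like the following sketch; the field names and JsonPath expressions are made-up examples, not part of the processor's documented configuration:

```yaml
# Hypothetical fragment: user-defined properties map a target Field Name
# to the JsonPath expression evaluated against the record content
- processor: evaluate_json_path
  component: com.hurence.logisland.processor.EvaluateJsonPath
  configuration:
    user_name: $.user.name       # illustrative field / expression
    order_total: $.order.total   # illustrative field / expression
```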
ParseProperties
Parses a field made of key=value pairs separated by spaces. For instance, a string like "a=1 b=2 c=3" will add fields a, b and c, with values 1, 2 and 3 respectively, to the current Record
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ParseProperties
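A minimal sketch of the behaviour described above; the `properties.field` property name is an assumption used here only to indicate which field holds the key=value string:

```yaml
# Hypothetical fragment: a record whose "message" field contains
# "a=1 b=2 c=3" would gain fields a=1, b=2 and c=3
- processor: parse_properties
  component: com.hurence.logisland.processor.ParseProperties
  configuration:
    properties.field: message   # assumed property name
```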
SplitField
This processor is used to create a new set of fields from one field (using split).
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitField
SplitText
Splits a String into fields according to a given Record mapping
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitText
SplitTextMultiline
No description provided.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitTextMultiline
SplitTextWithProperties
Splits a String into fields according to a given Record mapping
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitTextWithProperties
EvaluateXPath
Evaluates one or more XPaths against the content of a record. The results of those XPaths are assigned to new attributes in the records, depending on the configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The value of the property must be a valid XPath expression. If the expression matches nothing, no attributes are added.
- com.hurence.logisland:logisland-processor-xml:1.2.0
- com.hurence.logisland.processor.xml.EvaluateXPath
datastore
BulkPut
Indexes the content of a Record in a datastore using a bulk processor
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.BulkPut
MultiGet
Retrieves content from a datastore using multiget queries. Each incoming record contains information regarding the multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below):
- collection (String) : name of the datastore collection on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- type (String) : name of the datastore type on which the multiget query will be performed. This field is not mandatory.
- ids (String) : comma-separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- includes (String) : comma-separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (String) : comma-separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds the data of one retrieved document, stored in these fields:
- collection (same field name as the incoming record) : name of the datastore collection.
- type (same field name as the incoming record) : name of the datastore type.
- id (same field name as the incoming record) : retrieved document id.
- a list of String fields containing the retrieved field names and values.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.MultiGet
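The per-record query fields listed above are mapped through plugin properties. This fragment is a sketch: the property names (`collection.field`, `ids.field`, ...) and the `datastore.client.service` entry are assumptions chosen to mirror the description, not documented configuration keys.

```yaml
# Hypothetical fragment -- property names are assumptions
- processor: multiget
  component: com.hurence.logisland.processor.datastore.MultiGet
  configuration:
    datastore.client.service: datastore_service  # assumed id of a datastore service
    collection.field: collection   # record field holding the collection name
    type.field: type               # record field holding the type (optional)
    ids.field: ids                 # comma-separated document ids
    includes.field: includes       # include patterns (optional)
    excludes.field: excludes       # exclude patterns (optional)
```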
BulkAddElasticsearch
Indexes the content of a Record in Elasticsearch using elasticsearch's bulk processor
- com.hurence.logisland:logisland-processor-elasticsearch:1.2.0
- com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch
FetchHBaseRow
Fetches a row from an HBase table. The Destination property controls whether the cells are added as flow file attributes, or the row is written to the flow file content as JSON. This processor may be used to fetch a fixed row on an interval by specifying the table and row id directly in the processor, or it may be used to dynamically fetch rows by referencing the table and row id from incoming flow files.
- com.hurence.logisland:logisland-processor-hbase:1.2.0
- com.hurence.logisland.processor.hbase.FetchHBaseRow
MultiGetElasticsearch
Retrieves content indexed in elasticsearch using elasticsearch multiget queries. Each incoming record contains information regarding the elasticsearch multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below):
- index (String) : name of the elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- type (String) : name of the elasticsearch type on which the multiget query will be performed. This field is not mandatory.
- ids (String) : comma-separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- includes (String) : comma-separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (String) : comma-separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds the data of one elasticsearch retrieved document, stored in these fields:
- index (same field name as the incoming record) : name of the elasticsearch index.
- type (same field name as the incoming record) : name of the elasticsearch type.
- id (same field name as the incoming record) : retrieved document id.
- a list of String fields containing the retrieved field names and values.
- com.hurence.logisland:logisland-processor-elasticsearch:1.2.0
- com.hurence.logisland.processor.elasticsearch.MultiGetElasticsearch
PutHBaseCell
Adds the Contents of a Record to HBase as the value of a single cell
- com.hurence.logisland:logisland-processor-hbase:1.2.0
- com.hurence.logisland.processor.hbase.PutHBaseCell
CSVKeyValueCacheService
A cache that stores CSV lines as records loaded from a file
- com.hurence.logisland:logisland-service-inmemory-cache:1.2.0
- com.hurence.logisland.service.cache.CSVKeyValueCacheService
CassandraControllerService
Provides a controller service that currently only supports bulk put of records into Cassandra.
- com.hurence.logisland:logisland-service-cassandra-client:1.2.0
- com.hurence.logisland.service.cassandra.CassandraControllerService
Elasticsearch_6_6_2_ClientService
Implementation of ElasticsearchClientService for Elasticsearch 6.6.2.
- com.hurence.logisland:logisland-service-elasticsearch_6_6_2-client:1.2.0
- com.hurence.logisland.service.elasticsearch.Elasticsearch_6_6_2_ClientService
HBase_1_1_2_ClientService
Implementation of HBaseClientService for HBase 1.1.2. This service can be configured by providing a comma-separated list of configuration files, or by specifying values for the other properties. If configuration files are provided, they will be loaded first, and the values of the additional properties will override the values from the configuration files. In addition, any user-defined properties on the processor will also be passed to the HBase configuration.
- com.hurence.logisland:logisland-service-hbase_1_1_2-client:1.2.0
- com.hurence.logisland.service.hbase.HBase_1_1_2_ClientService
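Controller services are declared alongside streams in the job configuration. This sketch shows the layered configuration described above; the `hadoop.configuration.files` and `zookeeper.quorum` property names are assumptions standing in for whatever the service actually exposes.

```yaml
# Hypothetical fragment -- property names are assumptions
controllerServiceConfigurations:
  - controllerService: hbase_service
    component: com.hurence.logisland.service.hbase.HBase_1_1_2_ClientService
    configuration:
      # configuration files are loaded first, then overridden by
      # the values of any additional properties
      hadoop.configuration.files: /etc/hbase/conf/hbase-site.xml,/etc/hadoop/conf/core-site.xml
      zookeeper.quorum: zk1,zk2,zk3
```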
InfluxDBControllerService
Provides a controller service that currently only supports bulk put of records into InfluxDB.
- com.hurence.logisland:logisland-service-influxdb-client:1.2.0
- com.hurence.logisland.service.influxdb.InfluxDBControllerService
LRUKeyValueCacheService
A controller service for caching data by key-value pair with an LRU (least recently used) strategy, using a LinkedHashMap
- com.hurence.logisland:logisland-service-inmemory-cache:1.2.0
- com.hurence.logisland.service.cache.LRUKeyValueCacheService
MongoDBControllerService
Provides a controller service that wraps most of the functionality of the MongoDB driver.
- com.hurence.logisland:logisland-service-mongodb-client:1.2.0
- com.hurence.logisland.service.mongodb.MongoDBControllerService
RedisKeyValueCacheService
A controller service for caching records by key-value pair with an LRU (least recently used) strategy, backed by Redis
- com.hurence.logisland:logisland-service-redis:1.2.0
- com.hurence.logisland.redis.service.RedisKeyValueCacheService
Solr_6_6_2_ClientService
Implementation of SolrClientService for Solr 6.6.2.
- com.hurence.logisland:logisland-service-solr_6_6_2-client:1.2.0
- com.hurence.logisland.service.solr.Solr_6_6_2_ClientService
alerting
CheckAlerts
Adds one or more records representing alerts, using a datastore.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.CheckAlerts
CheckThresholds
Computes threshold crossings from given formulas:
- each dynamic property will return a new record according to the formula definition
- the record name will be set to the property name
- the record time will be set to the current timestamp
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.CheckThresholds
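The dynamic-property formulas could be declared as in this sketch. Only the rule that the property name becomes the record name comes from the description; the `cache.client.service` entry and the formula syntax shown are assumptions.

```yaml
# Hypothetical fragment -- formula syntax is an assumption
- processor: check_thresholds
  component: com.hurence.logisland.processor.alerting.CheckThresholds
  configuration:
    cache.client.service: cache_service          # assumed service id
    # each dynamic property is a formula; the emitted record is named
    # after the property and timestamped at evaluation time
    cpu_too_high: cache("cpu_usage").value > 90.0
```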
security
ParseNetworkPacket
The ParseNetworkPacket processor is the Logisland entry point to parse network packets captured either off-the-wire (stream mode) or in pcap format (batch mode). In batch mode, the processor decodes the bytes of the incoming pcap record, where a global header followed by a sequence of [packet header, packet data] pairs is stored. Then, each incoming pcap event is parsed into n packet records. The fields of the packet headers are then extracted and made available in dedicated record fields. See the Capturing Network Packets tutorial for details.
- com.hurence.logisland:logisland-processor-cyber-security:1.2.0
- com.hurence.logisland.processor.networkpacket.ParseNetworkPacket
enrichment
ComputeTags
Computes tags from given formulas:
- each dynamic property will return a new record according to the formula definition
- the record name will be set to the property name
- the record time will be set to the current timestamp
A threshold_cross has the following properties: count, sum, avg, time, duration, value.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.ComputeTags
EnrichRecords
Enriches input records with content indexed in a datastore, using multiget queries. Each incoming record may be enriched with information stored in the datastore. The plugin properties are:
- es.index (String) : name of the datastore index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- record.key (String) : name of the field in the input record containing the id used to look up the document. This field is mandatory.
- es.key (String) : name of the datastore key on which the multiget query will be performed. This field is mandatory.
- includes (ArrayList
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.EnrichRecords
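Using the properties named in the description (es.index, record.key, es.key), a job fragment might look like this sketch; the `datastore.client.service` entry and all values are illustrative:

```yaml
# Hypothetical fragment -- values are illustrative
- processor: enrich_records
  component: com.hurence.logisland.processor.datastore.EnrichRecords
  configuration:
    datastore.client.service: datastore_service  # assumed service id
    es.index: users        # index to query (mandatory)
    record.key: user_id    # input field holding the lookup id (mandatory)
    es.key: id             # datastore key to match against (mandatory)
```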
ParseUserAgent
The user-agent processor decomposes a User-Agent value from an HTTP header into several attributes of interest. There is no standard format for User-Agent strings, so they cannot easily be handled with regular expressions. This processor relies on the YAUAA library.
- com.hurence.logisland:logisland-processor-useragent:1.2.0
- com.hurence.logisland.processor.useragent.ParseUserAgent
IpToFqdn
Translates an IP address into an FQDN (Fully Qualified Domain Name). An input field from the record has the IP as value. A new field is created and its value is the FQDN matching the IP address. The resolution mechanism is based on the underlying operating system. The resolution request may take some time, especially if the IP address cannot be translated into an FQDN. For these reasons this processor relies on the Logisland cache service so that once a resolution occurs (or fails), the result is put into the cache. That way, the real request for the same IP is not re-triggered during a certain period of time, until the cache entry expires. This timeout is configurable, but by default a request for the same IP is not triggered before 24 hours, to give the underlying DNS system time to be potentially updated.
- com.hurence.logisland:logisland-processor-enrichment:1.2.0
- com.hurence.logisland.processor.enrichment.IpToFqdn
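A sketch combining the processor with the cache service it relies on; all property names here (`ip.address.field`, `fqdn.field`, `cache.service`) are assumptions used to illustrate the wiring described above:

```yaml
# Hypothetical fragment -- property names are assumptions
- processor: ip_to_fqdn
  component: com.hurence.logisland.processor.enrichment.IpToFqdn
  configuration:
    ip.address.field: src_ip   # field holding the IP to resolve
    fqdn.field: src_fqdn       # field receiving the resolved FQDN
    cache.service: lru_cache   # id of an LRUKeyValueCacheService instance
```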
IpToGeo
Looks up geolocation information for an IP address. The attribute that contains the IP address to look up must be provided in the **ip.address.field** property. By default, the geo information is put in a hierarchical structure. That is, if the name of the IP field is 'X', then the geo attributes added by enrichment are placed under a parent field named X_geo. "_geo" is the default hierarchical suffix, which may be changed with the **geo.hierarchical.suffix** property. If one wants to put the geo fields at the same level as the IP field, then the **geo.hierarchical** property should be set to false; the geo attributes are then created at the same level as the IP field with the naming pattern X_geo_
- com.hurence.logisland:logisland-processor-enrichment:1.2.0
- com.hurence.logisland.processor.enrichment.IpToGeo
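The hierarchical-output behaviour described above might be configured as follows. The **ip.address.field**, **geo.hierarchical** and **geo.hierarchical.suffix** properties come from the description; the `iptogeo.service` key is an assumed property pointing at a MaxmindIpToGeoService instance (listed just above).

```yaml
# Hypothetical fragment -- iptogeo.service key and values are assumptions
- processor: ip_to_geo
  component: com.hurence.logisland.processor.enrichment.IpToGeo
  configuration:
    iptogeo.service: maxmind_service  # assumed id of a MaxmindIpToGeoService
    ip.address.field: src_ip
    geo.hierarchical: true            # geo fields nested under src_ip_geo
    geo.hierarchical.suffix: _geo     # default suffix per the description
```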
MaxmindIpToGeoService
Implementation of the IP-to-Geo service using the Maxmind Lite db file
- com.hurence.logisland:logisland-service-ip-to-geo-maxmind:1.2.0
- com.hurence.logisland.service.iptogeo.maxmind.MaxmindIpToGeoService
analytics
IncrementalWebSession
This processor creates and updates web-sessions based on incoming web-events. Note that both web-sessions and web-events are stored in elasticsearch. Firstly, web-events are grouped by their session identifier and processed in chronological order. Then each web-session associated with each group is retrieved from elasticsearch. In case none exists yet, a new web-session is created based on the first web-event; the following fields of the newly created web-session are set based on the associated web-event: session identifier, first timestamp, first visited page. Secondly, once created or retrieved, the web-session is updated by the remaining web-events. Updates impact fields of the web-session such as event counter, last visited page, session duration, etc. Before updates are actually applied, checks are performed to detect rules that would trigger the creation of a new session: the duration between the web-session and the web-event must not exceed the specified time-out; the web-session and the web-event must have timestamps within the same day (at midnight a new web-session is created); the source of traffic (campaign, ...) must be the same on the web-session and the web-event. When a breaking rule is detected, a new web-session is created with a new session identifier, whereas remaining web-events still have the original session identifier. The new session identifier is the original session identifier suffixed with the character '#' followed by an incremented counter. This new session identifier is also set on the remaining web-events. Finally, when all web-events have been applied, all web-events (potentially modified with a new session identifier) are saved in elasticsearch, and web-sessions are passed to the next processor.
Web-session information includes:
- first and last visited page
- first and last timestamp of processed events
- total number of processed events
- the userId
- a boolean denoting whether the web-session is still active
- an integer denoting the duration of the web-session
- optional fields that may be retrieved from the processed events
- com.hurence.logisland:logisland-processor-web-analytics:1.2.0
- com.hurence.logisland.processor.webAnalytics.IncrementalWebSession
SetSourceOfTraffic
Computes the source of traffic of a web session. Users arrive at a website or application through a variety of sources, including advertising/paying campaigns, search engines, social networks, referring sites or direct access. When analysing user experience on a webshop, it is crucial to collect, process, and report the campaign and traffic-source data. To compute the source of traffic of a web session, the user has to provide the utm_* related properties if available, i.e. **utm_source.field**, **utm_medium.field**, **utm_campaign.field**, **utm_content.field**, **utm_term.field**, the referer (**referer.field** property) and the first visited page of the session (**first.visited.page.field** property). By default the source of traffic information is placed in a flat structure (specified by the **source_of_traffic.suffix** property, with a default value of source_of_traffic). To work properly the SetSourceOfTraffic processor needs access to an Elasticsearch index containing a list of the most popular search engines and social networks. The ES index (specified by the **es.index** property) should be structured such that the _id of an ES document MUST be the name of the domain. If the domain is a search engine, the related ES doc MUST have a boolean field (default being search_engine) specified by the property **es.search_engine.field** with a value set to true. If the domain is a social network, the related ES doc MUST have a boolean field (default being social_network) specified by the property **es.social_network.field** with a value set to true.
- com.hurence.logisland:logisland-processor-web-analytics:1.2.0
- com.hurence.logisland.processor.webAnalytics.SetSourceOfTraffic
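The utm_* and referer properties named in the description could be wired as in this sketch; the `elasticsearch.client.service` entry and all the values are illustrative assumptions.

```yaml
# Hypothetical fragment -- values are illustrative
- processor: set_source_of_traffic
  component: com.hurence.logisland.processor.webAnalytics.SetSourceOfTraffic
  configuration:
    elasticsearch.client.service: es_service  # assumed service id
    es.index: search_engines_and_social_networks
    referer.field: referer
    first.visited.page.field: firstVisitedPage
    utm_source.field: utm_source
    utm_medium.field: utm_medium
    utm_campaign.field: utm_campaign
    source_of_traffic.suffix: source_of_traffic
```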