Logisland - Extensions
Think of Logisland extensions as your project dependencies. Extensions configure, boot, and integrate a framework or technology into your Logisland application. Make sure you've read a component's extension guide before installing it.
processing
AddFields
Adds one or more fields to records
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.AddFields
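As a sketch of how an extension such as AddFields is wired into a job, here is a minimal Logisland YAML fragment. Only the processor class comes from the listing above; the stream component and the dynamic-property behaviour shown in the comment are illustrative assumptions, not authoritative configuration.

```yaml
# Hypothetical job fragment -- property names are illustrative
streamConfigurations:
  - stream: parsing_stream
    component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
    processorConfigurations:
      - processor: add_fields
        component: com.hurence.logisland.processor.AddFields
        configuration:
          # assumed: each dynamic property adds a field of that
          # name with the given value to every record
          environment: production
```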
ApplyRegexp
This processor is used to create a new set of fields from one field (using regexp).
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ApplyRegexp
ExpandMapFields
Expands the content of a MAP field to the root.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ExpandMapFields
FlatMap
Converts each record's fields into a single flattened record...
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.FlatMap
GenerateRandomRecord
A processor that makes random records given an Avro schema
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.GenerateRandomRecord
ModifyId
Modifies the id of records, or generates it following defined rules
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ModifyId
NormalizeFields
Changes the name of a field according to a provided name mapping...
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.NormalizeFields
SelectDistinctRecords
Keeps only distinct records based on a given field
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SelectDistinctRecords
parsing
EvaluateJsonPath
Evaluates one or more JsonPath expressions against the content of a record. The results of those expressions are assigned to Record Fields depending on the configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Field Name into which the result will be placed. The value of the property must be a valid JsonPath expression. A Return Type of 'auto-detect' will make a determination based on the configured destination. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to 'scalar', the Record will be routed to error. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value. If the expression matches nothing, Fields will be created with empty strings as the value.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.EvaluateJsonPath
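The dynamic-property mechanism described above might look like the following sketch; the field names and JsonPath expressions are made-up examples, not part of the processor's documented configuration:

```yaml
# Hypothetical fragment: user-defined properties map a target Field Name
# to the JsonPath expression evaluated against the record content
- processor: evaluate_json_path
  component: com.hurence.logisland.processor.EvaluateJsonPath
  configuration:
    user_name: $.user.name       # illustrative field / expression
    order_total: $.order.total   # illustrative field / expression
```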
ParseProperties
Parses a field made of key=value pairs separated by spaces. For instance, a string like "a=1 b=2 c=3" will add fields a, b and c, with values 1, 2 and 3 respectively, to the current Record
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.ParseProperties
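A minimal sketch of the behaviour described above; the `properties.field` property name is an assumption used here only to indicate which field holds the key=value string:

```yaml
# Hypothetical fragment: a record whose "message" field contains
# "a=1 b=2 c=3" would gain fields a=1, b=2 and c=3
- processor: parse_properties
  component: com.hurence.logisland.processor.ParseProperties
  configuration:
    properties.field: message   # assumed property name
```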
SplitField
This processor is used to create a new set of fields from one field (using split).
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitField
SplitText
Splits a String into fields according to a given Record mapping
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitText
SplitTextMultiline
No description provided.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitTextMultiline
SplitTextWithProperties
Splits a String into fields according to a given Record mapping
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.SplitTextWithProperties
EvaluateXPath
Evaluates one or more XPaths against the content of a record. The results of those XPaths are assigned to new attributes in the records, depending on the configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The value of the property must be a valid XPath expression. If the expression matches nothing, no attributes are added.
- com.hurence.logisland:logisland-processor-xml:1.2.0
- com.hurence.logisland.processor.xml.EvaluateXPath
datastore
BulkPut
Indexes the content of a Record in a datastore using a bulk processor
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.BulkPut
MultiGet
Retrieves content from a datastore using multiget queries. Each incoming record contains information regarding the multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below):
- collection (String) : name of the datastore collection on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- type (String) : name of the datastore type on which the multiget query will be performed. This field is not mandatory.
- ids (String) : comma-separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- includes (String) : comma-separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (String) : comma-separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds the data of one retrieved document, stored in these fields:
- collection (same field name as the incoming record) : name of the datastore collection.
- type (same field name as the incoming record) : name of the datastore type.
- id (same field name as the incoming record) : retrieved document id.
- a list of String fields containing the retrieved field names and values.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.MultiGet
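The per-record query fields listed above are mapped through plugin properties. This fragment is a sketch: the property names (`collection.field`, `ids.field`, ...) and the `datastore.client.service` entry are assumptions chosen to mirror the description, not documented configuration keys.

```yaml
# Hypothetical fragment -- property names are assumptions
- processor: multiget
  component: com.hurence.logisland.processor.datastore.MultiGet
  configuration:
    datastore.client.service: datastore_service  # assumed id of a datastore service
    collection.field: collection   # record field holding the collection name
    type.field: type               # record field holding the type (optional)
    ids.field: ids                 # comma-separated document ids
    includes.field: includes       # include patterns (optional)
    excludes.field: excludes       # exclude patterns (optional)
```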
BulkAddElasticsearch
Indexes the content of a Record in Elasticsearch using elasticsearch's bulk processor
- com.hurence.logisland:logisland-processor-elasticsearch:1.2.0
- com.hurence.logisland.processor.elasticsearch.BulkAddElasticsearch
FetchHBaseRow
Fetches a row from an HBase table. The Destination property controls whether the cells are added as flow file attributes, or the row is written to the flow file content as JSON. This processor may be used to fetch a fixed row on an interval by specifying the table and row id directly in the processor, or it may be used to dynamically fetch rows by referencing the table and row id from incoming flow files.
- com.hurence.logisland:logisland-processor-hbase:1.2.0
- com.hurence.logisland.processor.hbase.FetchHBaseRow
MultiGetElasticsearch
Retrieves content indexed in elasticsearch using elasticsearch multiget queries. Each incoming record contains information regarding the elasticsearch multiget query that will be performed. This information is stored in record fields whose names are configured in the plugin properties (see below):
- index (String) : name of the elasticsearch index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- type (String) : name of the elasticsearch type on which the multiget query will be performed. This field is not mandatory.
- ids (String) : comma-separated list of document ids to fetch. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- includes (String) : comma-separated list of patterns to filter in (include) fields to retrieve. Supports wildcards. This field is not mandatory.
- excludes (String) : comma-separated list of patterns to filter out (exclude) fields to retrieve. Supports wildcards. This field is not mandatory.
Each outgoing record holds the data of one elasticsearch retrieved document, stored in these fields:
- index (same field name as the incoming record) : name of the elasticsearch index.
- type (same field name as the incoming record) : name of the elasticsearch type.
- id (same field name as the incoming record) : retrieved document id.
- a list of String fields containing the retrieved field names and values.
- com.hurence.logisland:logisland-processor-elasticsearch:1.2.0
- com.hurence.logisland.processor.elasticsearch.MultiGetElasticsearch
PutHBaseCell
Adds the Contents of a Record to HBase as the value of a single cell
- com.hurence.logisland:logisland-processor-hbase:1.2.0
- com.hurence.logisland.processor.hbase.PutHBaseCell
CSVKeyValueCacheService
A cache that stores CSV lines as records loaded from a file
- com.hurence.logisland:logisland-service-inmemory-cache:1.2.0
- com.hurence.logisland.service.cache.CSVKeyValueCacheService
CassandraControllerService
Provides a controller service that currently only supports bulk put of records into Cassandra.
- com.hurence.logisland:logisland-service-cassandra-client:1.2.0
- com.hurence.logisland.service.cassandra.CassandraControllerService
Elasticsearch_6_6_2_ClientService
Implementation of ElasticsearchClientService for Elasticsearch 6.6.2.
- com.hurence.logisland:logisland-service-elasticsearch_6_6_2-client:1.2.0
- com.hurence.logisland.service.elasticsearch.Elasticsearch_6_6_2_ClientService
HBase_1_1_2_ClientService
Implementation of HBaseClientService for HBase 1.1.2. This service can be configured by providing a comma-separated list of configuration files, or by specifying values for the other properties. If configuration files are provided, they will be loaded first, and the values of the additional properties will override the values from the configuration files. In addition, any user-defined properties on the processor will also be passed to the HBase configuration.
- com.hurence.logisland:logisland-service-hbase_1_1_2-client:1.2.0
- com.hurence.logisland.service.hbase.HBase_1_1_2_ClientService
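Controller services are declared alongside streams in the job configuration. This sketch shows the layered configuration described above; the `hadoop.configuration.files` and `zookeeper.quorum` property names are assumptions standing in for whatever the service actually exposes.

```yaml
# Hypothetical fragment -- property names are assumptions
controllerServiceConfigurations:
  - controllerService: hbase_service
    component: com.hurence.logisland.service.hbase.HBase_1_1_2_ClientService
    configuration:
      # configuration files are loaded first, then overridden by
      # the values of any additional properties
      hadoop.configuration.files: /etc/hbase/conf/hbase-site.xml,/etc/hadoop/conf/core-site.xml
      zookeeper.quorum: zk1,zk2,zk3
```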
InfluxDBControllerService
Provides a controller service that currently only supports bulk put of records into InfluxDB.
- com.hurence.logisland:logisland-service-influxdb-client:1.2.0
- com.hurence.logisland.service.influxdb.InfluxDBControllerService
LRUKeyValueCacheService
A controller service for caching data by key-value pair with an LRU (least recently used) strategy, using a LinkedHashMap
- com.hurence.logisland:logisland-service-inmemory-cache:1.2.0
- com.hurence.logisland.service.cache.LRUKeyValueCacheService
MongoDBControllerService
Provides a controller service that wraps most of the functionality of the MongoDB driver.
- com.hurence.logisland:logisland-service-mongodb-client:1.2.0
- com.hurence.logisland.service.mongodb.MongoDBControllerService
RedisKeyValueCacheService
A controller service for caching records by key-value pair with an LRU (least recently used) strategy, backed by Redis
- com.hurence.logisland:logisland-service-redis:1.2.0
- com.hurence.logisland.redis.service.RedisKeyValueCacheService
Solr_6_6_2_ClientService
Implementation of SolrClientService for Solr 6.6.2.
- com.hurence.logisland:logisland-service-solr_6_6_2-client:1.2.0
- com.hurence.logisland.service.solr.Solr_6_6_2_ClientService
alerting
CheckAlerts
Adds one or more records representing alerts, using a datastore.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.CheckAlerts
CheckThresholds
Computes threshold crossings from given formulas:
- each dynamic property will return a new record according to the formula definition
- the record name will be set to the property name
- the record time will be set to the current timestamp
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.CheckThresholds
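The dynamic-property formulas could be declared as in this sketch. Only the rule that the property name becomes the record name comes from the description; the `cache.client.service` entry and the formula syntax shown are assumptions.

```yaml
# Hypothetical fragment -- formula syntax is an assumption
- processor: check_thresholds
  component: com.hurence.logisland.processor.alerting.CheckThresholds
  configuration:
    cache.client.service: cache_service          # assumed service id
    # each dynamic property is a formula; the emitted record is named
    # after the property and timestamped at evaluation time
    cpu_too_high: cache("cpu_usage").value > 90.0
```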
security
ParseNetworkPacket
The ParseNetworkPacket processor is the Logisland entry point to parse network packets captured either off-the-wire (stream mode) or in pcap format (batch mode). In batch mode, the processor decodes the bytes of the incoming pcap record, where a global header followed by a sequence of [packet header, packet data] pairs is stored. Then, each incoming pcap event is parsed into n packet records. The fields of the packet headers are then extracted and made available in dedicated record fields. See the Capturing Network Packets tutorial for details.
- com.hurence.logisland:logisland-processor-cyber-security:1.2.0
- com.hurence.logisland.processor.networkpacket.ParseNetworkPacket
enrichment
ComputeTags
Computes tags from given formulas:
- each dynamic property will return a new record according to the formula definition
- the record name will be set to the property name
- the record time will be set to the current timestamp
A threshold_cross has the following properties: count, sum, avg, time, duration, value.
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.alerting.ComputeTags
EnrichRecords
Enriches input records with content indexed in a datastore, using multiget queries. Each incoming record may be enriched with information stored in the datastore. The plugin properties are:
- es.index (String) : name of the datastore index on which the multiget query will be performed. This field is mandatory and should not be empty, otherwise an error output record is sent for this specific incoming record.
- record.key (String) : name of the field in the input record containing the id used to look up the document. This field is mandatory.
- es.key (String) : name of the datastore key on which the multiget query will be performed. This field is mandatory.
- includes (ArrayList
- com.hurence.logisland:logisland-processor-common:1.2.0
- com.hurence.logisland.processor.datastore.EnrichRecords
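Using the properties named in the description (es.index, record.key, es.key), a job fragment might look like this sketch; the `datastore.client.service` entry and all values are illustrative:

```yaml
# Hypothetical fragment -- values are illustrative
- processor: enrich_records
  component: com.hurence.logisland.processor.datastore.EnrichRecords
  configuration:
    datastore.client.service: datastore_service  # assumed service id
    es.index: users        # index to query (mandatory)
    record.key: user_id    # input field holding the lookup id (mandatory)
    es.key: id             # datastore key to match against (mandatory)
```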
ParseUserAgent
The user-agent processor decomposes a User-Agent value from an HTTP header into several attributes of interest. There is no standard format for User-Agent strings, so they cannot easily be handled with regular expressions. This processor relies on the YAUAA library.
- com.hurence.logisland:logisland-processor-useragent:1.2.0
- com.hurence.logisland.processor.useragent.ParseUserAgent
IpToFqdn
Translates an IP address into an FQDN (Fully Qualified Domain Name). An input field from the record has the IP as value. A new field is created and its value is the FQDN matching the IP address. The resolution mechanism is based on the underlying operating system. The resolution request may take some time, especially if the IP address cannot be translated into an FQDN. For these reasons this processor relies on the Logisland cache service so that once a resolution occurs (or fails), the result is put into the cache. That way, the real request for the same IP is not re-triggered during a certain period of time, until the cache entry expires. This timeout is configurable, but by default a request for the same IP is not triggered before 24 hours, to give the underlying DNS system time to be potentially updated.
- com.hurence.logisland:logisland-processor-enrichment:1.2.0
- com.hurence.logisland.processor.enrichment.IpToFqdn
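A sketch combining the processor with the cache service it relies on; all property names here (`ip.address.field`, `fqdn.field`, `cache.service`) are assumptions used to illustrate the wiring described above:

```yaml
# Hypothetical fragment -- property names are assumptions
- processor: ip_to_fqdn
  component: com.hurence.logisland.processor.enrichment.IpToFqdn
  configuration:
    ip.address.field: src_ip   # field holding the IP to resolve
    fqdn.field: src_fqdn       # field receiving the resolved FQDN
    cache.service: lru_cache   # id of an LRUKeyValueCacheService instance
```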
IpToGeo
Looks up geolocation information for an IP address. The attribute that contains the IP address to look up must be provided in the **ip.address.field** property. By default, the geo information is put in a hierarchical structure. That is, if the name of the IP field is 'X', then the geo attributes added by enrichment are placed under a parent field named X_geo. "_geo" is the default hierarchical suffix, which may be changed with the **geo.hierarchical.suffix** property. If one wants to put the geo fields at the same level as the IP field, then the **geo.hierarchical** property should be set to false; the geo attributes are then created at the same level as the IP field with the naming pattern X_geo_
- com.hurence.logisland:logisland-processor-enrichment:1.2.0
- com.hurence.logisland.processor.enrichment.IpToGeo
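The hierarchical-output behaviour described above might be configured as follows. The **ip.address.field**, **geo.hierarchical** and **geo.hierarchical.suffix** properties come from the description; the `iptogeo.service` key is an assumed property pointing at a MaxmindIpToGeoService instance (listed just above).

```yaml
# Hypothetical fragment -- iptogeo.service key and values are assumptions
- processor: ip_to_geo
  component: com.hurence.logisland.processor.enrichment.IpToGeo
  configuration:
    iptogeo.service: maxmind_service  # assumed id of a MaxmindIpToGeoService
    ip.address.field: src_ip
    geo.hierarchical: true            # geo fields nested under src_ip_geo
    geo.hierarchical.suffix: _geo     # default suffix per the description
```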
MaxmindIpToGeoService
Implementation of the IP-to-Geo service using the Maxmind Lite db file
- com.hurence.logisland:logisland-service-ip-to-geo-maxmind:1.2.0
- com.hurence.logisland.service.iptogeo.maxmind.MaxmindIpToGeoService
analytics
IncrementalWebSession
This processor creates and updates web-sessions based on incoming web-events. Note that both web-sessions and web-events are stored in elasticsearch. Firstly, web-events are grouped by their session identifier and processed in chronological order. Then each web-session associated with each group is retrieved from elasticsearch. In case none exists yet, a new web-session is created based on the first web-event; the following fields of the newly created web-session are set based on the associated web-event: session identifier, first timestamp, first visited page. Secondly, once created or retrieved, the web-session is updated by the remaining web-events. Updates impact fields of the web-session such as event counter, last visited page, session duration, etc. Before updates are actually applied, checks are performed to detect rules that would trigger the creation of a new session: the duration between the web-session and the web-event must not exceed the specified time-out; the web-session and the web-event must have timestamps within the same day (at midnight a new web-session is created); the source of traffic (campaign, ...) must be the same on the web-session and the web-event. When a breaking rule is detected, a new web-session is created with a new session identifier, whereas remaining web-events still have the original session identifier. The new session identifier is the original session identifier suffixed with the character '#' followed by an incremented counter. This new session identifier is also set on the remaining web-events. Finally, when all web-events have been applied, all web-events (potentially modified with a new session identifier) are saved in elasticsearch, and web-sessions are passed to the next processor.
Web-session information includes:
- first and last visited page
- first and last timestamp of processed events
- total number of processed events
- the userId
- a boolean denoting whether the web-session is still active
- an integer denoting the duration of the web-session
- optional fields that may be retrieved from the processed events
- com.hurence.logisland:logisland-processor-web-analytics:1.2.0
- com.hurence.logisland.processor.webAnalytics.IncrementalWebSession
SetSourceOfTraffic
Computes the source of traffic of a web session. Users arrive at a website or application through a variety of sources, including advertising/paying campaigns, search engines, social networks, referring sites or direct access. When analysing user experience on a webshop, it is crucial to collect, process, and report the campaign and traffic-source data. To compute the source of traffic of a web session, the user has to provide the utm_* related properties if available, i.e. **utm_source.field**, **utm_medium.field**, **utm_campaign.field**, **utm_content.field**, **utm_term.field**, the referer (**referer.field** property) and the first visited page of the session (**first.visited.page.field** property). By default the source of traffic information is placed in a flat structure (specified by the **source_of_traffic.suffix** property, with a default value of source_of_traffic). To work properly the SetSourceOfTraffic processor needs access to an Elasticsearch index containing a list of the most popular search engines and social networks. The ES index (specified by the **es.index** property) should be structured such that the _id of an ES document MUST be the name of the domain. If the domain is a search engine, the related ES doc MUST have a boolean field (default being search_engine) specified by the property **es.search_engine.field** with a value set to true. If the domain is a social network, the related ES doc MUST have a boolean field (default being social_network) specified by the property **es.social_network.field** with a value set to true.
- com.hurence.logisland:logisland-processor-web-analytics:1.2.0
- com.hurence.logisland.processor.webAnalytics.SetSourceOfTraffic
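The utm_* and referer properties named in the description could be wired as in this sketch; the `elasticsearch.client.service` entry and all the values are illustrative assumptions.

```yaml
# Hypothetical fragment -- values are illustrative
- processor: set_source_of_traffic
  component: com.hurence.logisland.processor.webAnalytics.SetSourceOfTraffic
  configuration:
    elasticsearch.client.service: es_service  # assumed service id
    es.index: search_engines_and_social_networks
    referer.field: referer
    first.visited.page.field: firstVisitedPage
    utm_source.field: utm_source
    utm_medium.field: utm_medium
    utm_campaign.field: utm_campaign
    source_of_traffic.suffix: source_of_traffic
```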