Searchly offers a simple yet efficient crawler that can fetch data from your website or database and integrate it into a search index on Searchly. Using the crawler requires no extra coding or integration, though in the future it will be exposed as an API as well. All you need to do is log in to the Searchly dashboard, navigate to the Crawlers pane and fill in the form as necessary.
The crawler currently supports web pages and MongoDB data sources; support for SQL databases will be available very soon.
You need to specify your website address and select an index. The crawled content will be extracted, cleaned and saved to this index.
Excluding pages: On the Advanced Settings tab, you can create filters to exclude or include specific URLs.
For example, suppose your website is located at http://www.example.com and you do not want Searchly to index pages under http://www.example.com/Catalog/… All you need to do is add “Catalog” to the URL Exclude Filters on the Advanced Settings tab.
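To illustrate how a substring filter of this kind behaves, here is a minimal Python sketch; the filter list and the matching rule are assumptions for illustration, not Searchly's exact implementation:

```python
# Hypothetical illustration of a substring-based URL exclude filter.
# Searchly's actual matching rules may differ.
EXCLUDE_FILTERS = ["Catalog"]

def should_crawl(url: str) -> bool:
    """Return False when the URL matches any exclude filter."""
    return not any(pattern in url for pattern in EXCLUDE_FILTERS)

print(should_crawl("http://www.example.com/About"))          # True
print(should_crawl("http://www.example.com/Catalog/item-1")) # False
```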
Content Extraction: The web crawler uses boilerpipe for content extraction. By default it will try to locate the main article in your web page. Customisation options for this behaviour will be available soon.
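This article-locating behaviour resembles boilerpipe's ArticleExtractor. As a rough sketch using the open-source python-boilerpipe wrapper (an assumption on our part; Searchly runs extraction server-side, so you never need to do this yourself):

```python
from boilerpipe.extract import Extractor

# ArticleExtractor is boilerpipe's heuristic for locating the main
# article body while discarding navigation, ads and other boilerplate.
extractor = Extractor(extractor='ArticleExtractor',
                      url='http://www.example.com/some-article')
print(extractor.getText())
```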
Searchly can extract content from HTML pages and from common file types such as PDF and DOC.
Crawler Fields: The crawler indexes three fields: ‘url’, ‘title’ and ‘text’, holding the URL of the crawled page, its title, and the extracted text, respectively.
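As an illustration, a crawled page therefore ends up as a document shaped roughly like this (the values are invented):

```python
# Invented example of a crawled document with the three indexed fields.
crawled_document = {
    "url": "http://www.example.com/about",
    "title": "About Us | Example",
    "text": "Example Inc. builds widgets for ...",  # cleaned article text
}
```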
ElasticSearch Mappings: By default, Searchly will create a mapping with a randomly generated name. If you create the index via our dashboard, the index settings will include an autocomplete analyser (currently based on edge n-grams).
Crawled content will additionally have title.autocomplete and text.autocomplete fields.
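For reference, an edge n-gram autocomplete setup typically looks something like the sketch below; the analyser, filter and type names are assumptions, the exact syntax depends on your ElasticSearch version, and Searchly's generated mapping may differ in detail:

```python
# Sketch of index settings with an edge n-gram autocomplete analyser
# and .autocomplete subfields; names and syntax are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "page": {  # hypothetical type name; Searchly generates one for you
            "properties": {
                "url": {"type": "string", "index": "not_analyzed"},
                "title": {
                    "type": "string",
                    "fields": {
                        "autocomplete": {
                            "type": "string",
                            "index_analyzer": "autocomplete",
                            "search_analyzer": "standard",
                        }
                    },
                },
                "text": {
                    "type": "string",
                    "fields": {
                        "autocomplete": {
                            "type": "string",
                            "index_analyzer": "autocomplete",
                            "search_analyzer": "standard",
                        }
                    },
                },
            }
        }
    },
}
```

With such an analyser, a query for “sea” against title.autocomplete matches a document titled “Searchly”, because edge n-grams index every prefix of each token.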
Limitations: Your crawler’s frequency and capacity vary with your subscription plan.
Searchly currently supports MongoDB integration; SQL support is on the way and will be available soon.
To sync your MongoDB collection to Searchly, you need to provide the connection URL, collection name and ElasticSearch type, and select the destination index. If possible, use a read-only account that is limited to this collection. Keep in mind that all content in the collection will be synced to your ElasticSearch index, so do not keep sensitive information such as passwords or account details in it. Customization options for syncing a specific subset of the data will be available soon.
You also need to provide an ElasticSearch mapping name; if the mapping does not exist, it will be defined automatically from the structure of the MongoDB data.
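Conceptually, the sync resembles the following Python sketch using pymongo and the ElasticSearch bulk helper; the connection details, index, type and collection names are placeholder assumptions, and Searchly performs the actual sync for you:

```python
from pymongo import MongoClient
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# All names below are illustrative placeholders.
MONGO_URL = "mongodb://readonly_user:secret@mongo.example.com/shopdb"
COLLECTION = "products"
ES_INDEX = "products-index"
ES_TYPE = "product"

mongo = MongoClient(MONGO_URL)
es = Elasticsearch(["http://search.example.com:9200"])

def actions():
    # Stream every document in the collection into the index,
    # reusing the MongoDB _id as the ElasticSearch document id.
    for doc in mongo.get_default_database()[COLLECTION].find():
        doc_id = str(doc.pop("_id"))
        yield {
            "_index": ES_INDEX,
            "_type": ES_TYPE,
            "_id": doc_id,
            "_source": doc,
        }

bulk(es, actions())
```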