Deprecated: import from "@langchain/community/document_loaders/web/sitemap" instead. This entrypoint will be removed in 0.3.0.

Interface representing the parameters for initializing a SitemapLoader.

interface SitemapLoaderParams {
    chunkSize?: number;
    filterUrls?: (string | RegExp)[];
    selector?: SelectorType;
    textDecoder?: TextDecoder;
    timeout?: number;
}

Properties

chunkSize?: number

The number of sitemap URLs to scrape per chunk. Defaults to 300.

filterUrls?: (string | RegExp)[]

selector?: SelectorType

The selector to use to extract the text from the document. Defaults to "body".

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.

timeout?: number

The timeout in milliseconds for the fetch request. Defaults to 10000 (10 seconds).
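
To make the roles of chunkSize, filterUrls, and timeout more concrete, here is a minimal, self-contained sketch. It does not use the library: SitemapParamsSketch, withDefaults, matchesFilter, and chunk are hypothetical names introduced only for illustration, and the matching rule (substring match for strings, test() for regular expressions) is an assumption, not the documented behavior of SitemapLoader. The defaults applied are the ones stated above.

```typescript
// Local mirror of some documented fields. In real code, import
// SitemapLoaderParams from "@langchain/community/document_loaders/web/sitemap".
interface SitemapParamsSketch {
  chunkSize?: number;
  filterUrls?: (string | RegExp)[];
  timeout?: number;
}

// Hypothetical helper: fill in the defaults stated in this reference.
function withDefaults(params: SitemapParamsSketch) {
  return {
    chunkSize: params.chunkSize ?? 300, // URLs per chunk
    filterUrls: params.filterUrls ?? [], // empty list keeps every URL
    timeout: params.timeout ?? 10000, // milliseconds
  };
}

// Hypothetical helper with ASSUMED semantics: a URL is kept if it contains
// a string filter or matches a RegExp filter. Avoid the "g" flag on the
// regular expressions, since test() is stateful for global patterns.
function matchesFilter(url: string, filters: (string | RegExp)[]): boolean {
  if (filters.length === 0) return true;
  return filters.some((f) =>
    typeof f === "string" ? url.includes(f) : f.test(url)
  );
}

// Hypothetical helper: split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const params = withDefaults({
  chunkSize: 2,
  filterUrls: ["/docs/", /\/api\//],
});

const urls = [
  "https://example.com/docs/a",
  "https://example.com/blog/b",
  "https://example.com/api/c",
  "https://example.com/docs/d",
];

// Filter first, then group the surviving URLs into chunks for scraping.
const batches = chunk(
  urls.filter((u) => matchesFilter(u, params.filterUrls)),
  params.chunkSize
);
```

With the real loader, an object of the same shape would be passed when constructing a SitemapLoader; consult the library source for how filterUrls entries are actually matched.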